VoIP encryption means wrapping both the VoIP signaling packets and the accompanying audio packets in some form of encryption, such as TLS (Transport Layer Security). In our case, we use the most common VoIP signaling protocol, SIP (Session Initiation Protocol), and the usual media protocol, RTP (Real-time Transport Protocol).

(To view the video version of this walk-through, visit https://youtu.be/XMjXixv7h28)

To be fair, one can be encrypted without the other, but doing so leaves the unencrypted half useless or vulnerable beyond what you might deem to be “a good idea”. For example:

An admin decides to encrypt the SIP packets but not the audio. A malicious network user can now sniff the audio packets from all of your conversations and play them back. Just as bad, the attacker can also capture DTMF (touch-tone) sounds over the network and recover credit card and account data.

Although they wouldn’t be able to view the details of the phone calls from the packets themselves, what’s the point really?

In reverse: our admin encrypts the audio packets but not the SIP packets. You might be saying, “Well, the conversation is more important, isn’t it?” Although I would tend to agree, there is still information the attacker can obtain in order to carry out other types of attacks.

These would include:

Discovering the IP of the PBX: The PBX could now be targeted for entry or DDoS.

Discovering the user device IP: The handset could be targeted for DDoS.
If the IP phone uses PoE, it might be daisy-chained with the user’s PC, and that PC would also be vulnerable. The IP could also be used for future identification of the user.

Last and most important: The phone extension data could be used to spoof calls, and the SIP authentication exchange (the username plus the digest response) can be sniffed and cracked offline for a complete device hijack!
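To illustrate why a sniffed registration is dangerous: SIP authenticates with HTTP Digest, so the password itself never crosses the wire, but every other input to the hash does. The sketch below assumes the simplest form of the digest calculation (MD5, no qop) and uses made-up values for the extension, realm, nonce, and password. The function names are hypothetical and this is not taken from any particular PBX.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest response in its simplest form (MD5, no qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def guess_password(captured, wordlist):
    """Try each candidate password against a captured digest exchange."""
    for candidate in wordlist:
        attempt = digest_response(
            captured["username"], captured["realm"], candidate,
            captured["method"], captured["uri"], captured["nonce"],
        )
        if attempt == captured["response"]:
            return candidate
    return None


# Hypothetical values, standing in for fields read from a cleartext REGISTER.
captured = {
    "username": "1001",
    "realm": "pbx.example.com",
    "method": "REGISTER",
    "uri": "sip:pbx.example.com",
    "nonce": "4f1c2a9be0",
}
# Simulate the phone's reply; a real attacker would read this off the wire.
captured["response"] = digest_response(password="hunter2", **captured)

print(guess_password(captured, ["123456", "password", "hunter2"]))  # hunter2
```

Everything in `captured` is visible in an unencrypted SIP exchange, so the only thing protecting the account is how hard the password is to guess. With SIP/TLS, none of these fields are visible to someone sniffing the network.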

As mentioned above, the common encryption used for SIP is the TLS protocol (SIP/TLS). As such, the encryption has the benefits and limitations of TLS, along with any security vulnerabilities that may come with it. This is also why it’s important to stay up to date on TLS issues such as the Heartbleed bug in OpenSSL (https://xkcd.com/1354/), and to move from TLS 1.0/1.1 to 1.2 or later.
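As a rough illustration of refusing the older TLS versions on the client side, here is a minimal sketch using Python's standard `ssl` module. The function name `make_sip_tls_context` is made up for this example, and a real SIP stack or PBX will have its own setting for this.

```python
import ssl


def make_sip_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses TLS 1.0 and 1.1."""
    # The default context verifies certificates and hostnames.
    ctx = ssl.create_default_context()
    # Explicitly require TLS 1.2 or newer.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_sip_tls_context()
print(ctx.minimum_version.name)  # TLSv1_2
```

A context like this would then wrap the TCP connection to the PBX (SIP over TLS conventionally uses port 5061) with `ctx.wrap_socket(sock, server_hostname=...)`.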

For the audio packets, we use SRTP, the Secure Real-time Transport Protocol (as you may have guessed), with keys negotiated via what’s called DTLS-SRTP (https://en.wikipedia.org/wiki/Datagram_Transport_Layer_Security). DTLS is the datagram variant of TLS, so this key exchange is also based on the TLS protocol.

There is also another project called ZRTP (https://en.wikipedia.org/wiki/ZRTP), where the Z comes from “Zimmermann”, who also created PGP. Although this method was created in 2006, it isn’t as widely adopted as SRTP, likely due to the lack of endpoints that support it.

Let’s look at some packet comparisons from Wireshark:

Un-encrypted SIP Call Packet
[Image: Insecure SIP packet. Notice the full call details.]
[Image: Un-encrypted SIP call flow in Wireshark.]

Encrypted Call using SIP/TLS
[Image: Secure SIP call packet. Notice the absence of the call details.]
[Image: Secure SIP call flow. The call details can’t be captured.]
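To make “full call details” concrete, here is a small sketch that pulls the interesting headers out of a cleartext SIP INVITE, much as Wireshark does when it dissects one. The sample message is invented for illustration (documentation IP ranges, made-up extensions and device name), and the parser is deliberately simplified: it ignores folded headers, compact header names, and repeated headers.

```python
# Hypothetical INVITE, standing in for what a sniffer would see on the wire.
SAMPLE_INVITE = (
    "INVITE sip:1002@192.0.2.10 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 198.51.100.25:5060;branch=z9hG4bK776asdhds\r\n"
    'From: "Alice" <sip:1001@192.0.2.10>;tag=1928301774\r\n'
    "To: <sip:1002@192.0.2.10>\r\n"
    "Call-ID: a84b4c76e66710@198.51.100.25\r\n"
    "CSeq: 1 INVITE\r\n"
    "Contact: <sip:1001@198.51.100.25:5060>\r\n"
    "User-Agent: ExamplePhone 1.0\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_sip_headers(message: str) -> dict:
    """Split a SIP message into its request line and a header dictionary."""
    head = message.split("\r\n\r\n", 1)[0]  # drop the body, if any
    lines = head.split("\r\n")
    headers = {"request-line": lines[0]}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_sip_headers(SAMPLE_INVITE)
for field in ("request-line", "via", "from", "contact", "user-agent"):
    print(f"{field}: {headers[field]}")
```

Even this handful of fields gives away the PBX address, the handset address, both extensions, and the device model. With SIP/TLS the same bytes are encrypted, and a sniffer sees only that a TLS session exists between the two endpoints.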

Now what about audio (RTP)?

Un-encrypted Audio Capture
[Image: Un-encrypted RTP audio. Fully captured and re-playable.]

Encrypted Audio with SRTP
[Image: Secure SRTP packet. The audio can’t be deciphered.]
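For a sense of how little effort plain RTP takes to read, the sketch below unpacks the fixed 12-byte RTP header and slices out the audio payload. The packet is built by hand with made-up sequence, timestamp, and SSRC values, and the function name is hypothetical. A real capture would supply the bytes instead.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,  # 0 is PCMU (G.711 u-law)
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: RTP version 2, payload type 0, 160 bytes of audio
# (20 ms of G.711 at 8 kHz).
packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x1234ABCD) + b"\xff" * 160

header = parse_rtp_header(packet)
# Skip the fixed header plus any CSRC entries (header extensions ignored here).
payload = packet[12 + 4 * header["csrc_count"]:]
print(header["payload_type"], header["sequence"], len(payload))  # 0 1 160
```

With plain RTP, `payload` is the raw codec data, ready to be decoded and played back. With SRTP the header is still readable, but the payload is ciphertext, which is why the capture can’t be replayed.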

Given that SIP/TLS and SRTP are natively built into almost all SIP devices I have seen in the last 10 years, and are even ready to go in projects such as Asterisk now, there is little to no excuse not to use them. For added security, you can also choose a SIP provider that offers encrypted calling as well.