
VoIP: An In-Depth Analysis

Chapter Description

This chapter explains many of the issues facing Voice over IP (VoIP) and ways in which Cisco addresses these issues.

Pulse Code Modulation

Although analog communication is ideal for human interaction, analog transmission is neither robust against line noise nor efficient at recovering from it. In the early telephony network, when an analog signal passed through amplifiers to boost it, the amplifiers boosted the line noise along with the voice. The result was often an unusable connection.

It is much easier to separate digital samples, which are composed of 1 and 0 bits, from line noise. Therefore, when analog signals are regenerated as digital samples, a clean sound is maintained. When the benefits of this digital representation became evident, the telephony network migrated to pulse code modulation (PCM).

What Is PCM?

As covered in Chapter 1, PCM converts analog sound into digital form by sampling the analog sound 8000 times per second and converting each sample into a numeric code. The Nyquist theorem states that if you sample an analog signal at a rate of at least twice the highest frequency of interest, you can accurately reconstruct that signal back into its analog form. Because most speech content is below 4000 Hz (4 kHz), a sampling rate of 8000 times per second (125 microseconds between samples) is required.
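The sampling arithmetic above can be checked with a short Python sketch. The function names are illustrative, and the 8-bit sample size is an assumption (it matches G.711 PCM) that this section does not state.

```python
# Sketch of the PCM sampling arithmetic described above.
# Assumption: 8 bits per sample (as in G.711); not stated in this section.

def nyquist_rate_hz(highest_frequency_hz):
    """Minimum sampling rate: twice the highest frequency of interest."""
    return 2 * highest_frequency_hz

def sample_interval_us(sampling_rate_hz):
    """Time between consecutive samples, in microseconds."""
    return 1_000_000 / sampling_rate_hz

def pcm_bit_rate_bps(sampling_rate_hz, bits_per_sample=8):
    """Uncompressed bit rate of the sampled stream."""
    return sampling_rate_hz * bits_per_sample

rate = nyquist_rate_hz(4000)          # speech content below 4 kHz
print(rate)                           # 8000 samples per second
print(sample_interval_us(rate))       # 125.0 microseconds
print(pcm_bit_rate_bps(rate))         # 64000 bps under the 8-bit assumption
```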

A Sampling Example for Satellite Networks

Satellite networks have an inherent delay of around 500 ms. This includes 250 ms for the trip up to the satellite, and another 250 ms for the trip back to Earth. In this type of network, packet loss is highly controlled due to the expense of bandwidth. Also, if some type of voice application is already running through the satellite, the users of this service are accustomed to a quality of voice that has excessive delays.

Cisco IOS, by default, sends two 10-ms G.729 speech frames in every packet. Although this is acceptable for most applications, it might not make the best use of the expensive bandwidth on a satellite link. Bandwidth is wasted because every packet carries a header: the more speech frames you put into a packet, the fewer headers you require.

If you take the satellite example and use four 10-ms G.729 speech frames per packet, you halve the number of headers you send. Table 7-1 shows the difference between the various frames-per-packet settings. With only a 20-byte increase in packet size (20 extra bytes equals two 10-ms G.729 samples), each packet carries twice as much speech.

Table 7-1. Frames per Packet (G.729)

| G.729 Samples per Frame | IP/RTP/UDP Header | Bandwidth Consumed | Latency [*] |
|---|---|---|---|
| Default (two samples per frame) | 40 bytes | 24,000 bps | 25 ms |
| Satellite (four samples per frame) | 40 bytes | 16,000 bps | 45 ms |
| Low Latency (one sample per frame) | 40 bytes | 40,000 bps | 15 ms |
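The bandwidth figures in Table 7-1 follow from the packet size and the packet rate, and the sketch below reproduces them. It assumes a 10-byte payload per 10-ms G.729 frame (consistent with "20 extra bytes equals two 10 ms G.729 samples") and a 40-byte IP/UDP/RTP header. The 5-ms look-ahead used in `latency_ms` is an assumption that happens to match the latency column; the table's own footnote is not reproduced here.

```python
# Sketch: reproduce the Table 7-1 figures for G.729.
G729_FRAME_MS = 10       # one G.729 speech frame covers 10 ms
G729_FRAME_BYTES = 10    # 8 kbps * 10 ms = 80 bits = 10 bytes
HEADER_BYTES = 40        # IP (20) + UDP (8) + RTP (12)
G729_LOOKAHEAD_MS = 5    # assumption: codec look-ahead included in latency

def bandwidth_bps(frames_per_packet):
    """Bits per second on the wire for a given number of frames per packet."""
    packet_bytes = HEADER_BYTES + frames_per_packet * G729_FRAME_BYTES
    packets_per_second = 1000 / (frames_per_packet * G729_FRAME_MS)
    return packet_bytes * 8 * packets_per_second

def latency_ms(frames_per_packet):
    """Packetization delay plus the assumed look-ahead."""
    return frames_per_packet * G729_FRAME_MS + G729_LOOKAHEAD_MS

for label, frames in (("Default", 2), ("Satellite", 4), ("Low Latency", 1)):
    print(f"{label}: {bandwidth_bps(frames):,.0f} bps, {latency_ms(frames)} ms")
```

Doubling the frames per packet from two to four halves the packet rate, which is where the header savings come from.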

To reduce the overhead of the 54-byte header (40 bytes of IP/UDP/RTP headers plus a 14-byte Ethernet header), multiple voice samples can be packed into a single Ethernet frame. Although this increases the voice delay, a higher count can improve overall voice quality, especially when bandwidth is constrained.

How many voice samples to send per frame depends on the codec you choose and on the balance between bandwidth efficiency and the impact of packet loss. The larger this value, the more efficiently bandwidth is used, because more voice samples are packed into the payload of each UDP/RTP packet and the header overhead per sample is lower. The impact of a lost packet on perceived voice quality is greater, however, because each loss discards more speech. Table 7-2 lists the values for some of the commonly used codec types.
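The trade-off described above can be put in numbers with a small sketch. It assumes G.729 (10 bytes per 10-ms voice sample) and the 54-byte header mentioned earlier; the function names are illustrative only.

```python
# Sketch: header overhead versus speech lost per dropped frame.
HEADER_BYTES = 54    # per the text: IP/UDP/RTP plus the Ethernet header
SAMPLE_BYTES = 10    # assumption: G.729, 10 bytes per voice sample
SAMPLE_MS = 10       # assumption: G.729, 10 ms of speech per voice sample

def header_overhead(samples_per_frame):
    """Fraction of each frame that is header rather than voice payload."""
    payload = samples_per_frame * SAMPLE_BYTES
    return HEADER_BYTES / (HEADER_BYTES + payload)

def speech_lost_per_drop_ms(samples_per_frame):
    """Milliseconds of speech discarded when a single frame is lost."""
    return samples_per_frame * SAMPLE_MS

for n in (1, 2, 4, 6):
    print(f"{n} samples per frame: {header_overhead(n):.0%} header, "
          f"{speech_lost_per_drop_ms(n)} ms lost per dropped frame")
```

As the count rises, the header share falls while the speech lost with each dropped frame grows in step.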

Table 7-2. Voice Samples per Frame for VoIP Codecs

| Codec Type | Voice Samples per Frame (Default) | Voice Samples per Frame (Maximum) |
|---|---|---|