Quality of Service (QoS) in VoIP Networks
As discussed earlier, VoIP is most commonly deployed over converged IP networks carrying data, voice, and video traffic. When network resources are congested, the quality of VoIP traffic can be severely affected, resulting in a poor user experience for subscribers. This can lead to increased customer calls (trouble tickets) for the Voice SP and loss of revenue due to customer turnover.
Therefore, it is very important for a Voice SP or an Enterprise to implement QoS for VoIP traffic in its network. Doing so helps guarantee good voice quality when network resources are congested.
There are a number of factors that can affect the quality of VoIP traffic as perceived by the end user. Some of the common factors include delay, jitter, and packet loss. These factors can be key indicators of the overall health of the voice network and are defined as follows (a short computational sketch follows the list):
- Delay: The time it takes VoIP traffic to travel from one endpoint to another, typically referred to as the end-to-end delay. Delay can be measured as either one-way or round-trip delay. The ITU-T G.114 recommendation states that the acceptable one-way delay for voice is 150 ms. Any delay greater than 150 ms can result in degraded voice quality and a poor user experience.
- Jitter: The variation in delay over time from one endpoint to another. If the delay varies too widely during a VoIP call, the call quality is greatly degraded. VoIP networks typically compensate for this by using jitter buffers at the endpoints to deliver the VoIP traffic to the end user at a constant rate. If the jitter is too high, it can overflow the jitter buffers at the endpoints, resulting in packet loss and poor voice quality.
- Packet loss: The number of packets dropped in the data path while carrying VoIP traffic from one endpoint to another. A 3 percent packet loss is typically regarded as the maximum tolerable limit for good voice quality. The VoIP network should be designed for less than 1.5 percent packet loss in order to guarantee good voice quality.
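The following Python sketch shows how these three indicators could be checked against the thresholds above. It assumes that per-packet transit times and packet counts are already available from a measurement source such as RTCP reports; the 30 ms jitter budget is an illustrative assumption, while the delay and loss limits come from the definitions above.

ONE_WAY_DELAY_LIMIT_MS = 150.0   # ITU-T G.114 one-way delay limit
PACKET_LOSS_LIMIT_PCT = 1.5      # design target for packet loss
JITTER_LIMIT_MS = 30.0           # assumed jitter budget, illustrative only

def interarrival_jitter(transit_times_ms):
    # RFC 3550-style smoothed jitter estimate from per-packet transit times.
    jitter = 0.0
    for prev, curr in zip(transit_times_ms, transit_times_ms[1:]):
        jitter += (abs(curr - prev) - jitter) / 16.0
    return jitter

def evaluate_call(one_way_delay_ms, transit_times_ms, sent, received):
    # Return a list of threshold violations for a single call leg.
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    jitter_ms = interarrival_jitter(transit_times_ms)
    issues = []
    if one_way_delay_ms > ONE_WAY_DELAY_LIMIT_MS:
        issues.append("one-way delay %.1f ms exceeds %.0f ms" %
                      (one_way_delay_ms, ONE_WAY_DELAY_LIMIT_MS))
    if jitter_ms > JITTER_LIMIT_MS:
        issues.append("jitter %.1f ms exceeds %.0f ms budget" % (jitter_ms, JITTER_LIMIT_MS))
    if loss_pct > PACKET_LOSS_LIMIT_PCT:
        issues.append("packet loss %.2f%% exceeds %.1f%%" % (loss_pct, PACKET_LOSS_LIMIT_PCT))
    return issues or ["within thresholds"]

print(evaluate_call(170.0, [20.0, 24.0, 19.0, 35.0], sent=1000, received=985))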
This section does not cover the various methods of configuring and troubleshooting QoS in order to prevent delay, jitter, and packet loss in VoIP networks. Instead, it describes, at a high level, how to use these key indicators to implement and manage a QoS policy in the network. This can help the Voice SP or an Enterprise isolate problems in the network more effectively and prevent them from recurring.
Defining a QoS Methodology
The QoS policy implemented for VoIP traffic should encompass the end-to-end voice network. A layered QoS approach is recommended because it makes the QoS policy for VoIP easier to implement and manage.
The QoS policy for VoIP traffic should cover Layer 2, Layer 3, and the application layer. This helps guarantee that the VoIP traffic is given preferential treatment as it is transported from one endpoint to another. QoS at the application layer is especially useful when end users place and receive voice calls with PC-based VoIP applications. In this case, the VoIP traffic may receive the desired QoS as it traverses the network, but the end user’s PC-based application may not prioritize VoIP over other applications demanding CPU resources. This can result in poor voice quality due to delay, jitter, or packet loss, as described above.
One thing to keep in mind is that QoS helps only when resources are congested. If there is no contention for bandwidth or other network resources, applying QoS may not provide any additional benefit.
Differentiated Services (Diff Serv) for Applying QoS
A good QoS policy involves marking or classifying the VoIP traffic at the edge of the network so that intermediate devices can differentiate voice traffic from other traffic and process it according to the defined policy. This marking or classification can be done using Differentiated Services Code Point (DSCP) values or the IP Precedence bits in the Type of Service (ToS) byte of the IP header.
Diff Serv defines the required behavior in the forwarding path to provide quality of service for different classes of traffic. A very important aspect of defining forwarding path behavior for QoS is the method of packet classification. Packet classification determines which treatment a particular packet receives when shared resources are allocated.
The Diff Serv model also defines boundaries of trust in a network and the associated functions that occur at the edges of a region of trust. A DSCP specifies a Per Hop Behavior (PHB) for forwarding treatment. A PHB specifies the scheduling treatment that packets marked with the DSCP will receive. A PHB can also include a specification for traffic conditioning. Traffic conditioning functions include traffic shaping and policing. Traffic shaping conditions traffic to meet a particular average rate and burst requirement. Policing enforces an average rate and burst requirement; actions to take when traffic exceeds the policing specification can include remarking or dropping the traffic.
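As a concrete illustration of the policing behavior just described, the following Python sketch implements a simplified single-rate token-bucket policer: conforming packets are transmitted, and exceeding packets are remarked or dropped. The committed rate, burst size, and exceed action are illustrative assumptions, not a specific vendor implementation.

import time

class TokenBucketPolicer:
    def __init__(self, cir_bps, burst_bytes, exceed_action="remark"):
        self.rate = cir_bps / 8.0          # committed rate in bytes per second
        self.burst = float(burst_bytes)    # committed burst size in bytes
        self.tokens = float(burst_bytes)   # bucket starts full
        self.exceed_action = exceed_action # "remark" or "drop"
        self.last = time.monotonic()

    def police(self, packet_len):
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return "transmit"              # conforming traffic
        return self.exceed_action          # exceeding traffic

policer = TokenBucketPolicer(cir_bps=128000, burst_bytes=8000, exceed_action="drop")
print(policer.police(200))                 # "transmit" while tokens remain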
The DSCP value commonly used for voice bearer traffic is 46, which maps to the Expedited Forwarding (EF) PHB. The DSCP value commonly used for call signaling is 26, which maps to the Assured Forwarding 31 (AF31) PHB.
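At the application layer, marking can be done by writing the ToS byte on the socket, since the DSCP occupies the upper six bits of that byte (so the value is shifted left by two). The Python sketch below sets the EF and AF31 values mentioned above on UDP sockets; it assumes an operating system that exposes the IP_TOS socket option and a network that honors host marking.

import socket

DSCP_EF = 46    # Expedited Forwarding, voice bearer (ToS byte 0xB8)
DSCP_AF31 = 26  # Assured Forwarding 31, call signaling (ToS byte 0x68)

def marked_udp_socket(dscp):
    # Create a UDP socket and write the DSCP into the ToS byte.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock

bearer_sock = marked_udp_socket(DSCP_EF)
signaling_sock = marked_udp_socket(DSCP_AF31)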
Figure 6-11 illustrates a Differentiated Services-based QoS model.
In SP environments, endpoints are typically untrusted devices. This means that endpoints may not mark or classify the VoIP traffic correctly; therefore, this traffic needs to be re-marked and re-classified at the edge of the network. Once the VoIP traffic is re-classified at the edge, it can be scheduled into the appropriate queues and receive the desired QoS.
In Enterprise networks, endpoints such as IP Phones are considered trusted devices, while PC-based soft clients are conditionally trusted. Trusted devices are expected to classify the VoIP traffic correctly, while traffic from conditionally trusted devices is trusted only if it meets defined criteria. These criteria are typically enforced at the access-layer switches that are directly connected to the conditionally trusted devices. If a device is compromised and starts sending mis-classified VoIP traffic, it can be policed at the edge of the trust boundary and put into a scavenger queue. This queue can be monitored periodically to discover any undesired network activity, and the data can be used for trending to predict failures as well as linked to the trouble ticketing system to correlate with any network issues.
Using Bandwidth / Resource Reservation and Call Admission Control (CAC) for Providing QoS
Another approach for providing QoS to VoIP traffic is to reserve the required network resources before setting up the voice call and to use CAC to reject calls that may not be able to receive the desired QoS due to congestion or high utilization of network resources. While this approach can definitely guarantee QoS to VoIP traffic, it does have its disadvantages.
One of the problems with this approach is that network resources need to be reserved end-to-end to guarantee QoS to VoIP traffic from one endpoint to another. This can be very challenging because resources may not be available on certain network segments due to congestion, which causes the call setup to fail. It also means that once the resources are reserved they cannot be used for any other traffic, so network resources may not be efficiently utilized.
Even with the downsides mentioned above, this approach is still used in some deployment models in SP environments. The approach is slightly modified, though, to make better use of network resources. Instead of reserving network resources ahead of time, they are reserved only when a voice call needs to be set up and are released once the voice call is torn down. This enables more efficient use of network resources because they can be used for other traffic when not being utilized for VoIP traffic. This approach is preferred especially in cases where VoIP is deployed in converged networks.
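The sketch below shows the basic idea of this call-time reservation in Python: bandwidth is reserved against a link's voice capacity when the call is admitted and released when the call is torn down. The per-call bandwidth and link capacity figures are illustrative assumptions.

PER_CALL_KBPS = 90   # assumed per-call bandwidth (for example, G.711 with IP overhead)

class CacLink:
    def __init__(self, voice_capacity_kbps):
        self.capacity = voice_capacity_kbps
        self.reserved = 0
        self.active_calls = set()

    def admit(self, call_id):
        # Reserve bandwidth at call setup; reject the call if none is left.
        if self.reserved + PER_CALL_KBPS > self.capacity:
            return False                     # CAC rejects the call
        self.reserved += PER_CALL_KBPS
        self.active_calls.add(call_id)
        return True

    def release(self, call_id):
        # Release the reservation when the call is torn down.
        if call_id in self.active_calls:
            self.active_calls.remove(call_id)
            self.reserved -= PER_CALL_KBPS

link = CacLink(voice_capacity_kbps=900)      # room for ten concurrent calls
print(link.admit("call-1"))                  # True while capacity remains
link.release("call-1")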
Managing QoS
QoS management helps to set and evaluate QoS policies and goals. A common methodology entails the following steps:
- Establishing a network baseline. This helps in determining the traffic characteristics of the network.
- Deploying QoS techniques once the traffic characteristics have been obtained and one or more applications have been targeted for QoS.
- Evaluating the results by testing the response of the targeted applications to see whether the QoS goals have been reached.
In order to effectively manage QoS policies in a VoIP network, it is important to use a layered approach. Information needs to be gathered from different points in the network and at various layers (Layer 1, Layer 2, Layer 3, and the application layer). This information needs to be correlated to different events occurring in the network, such as degraded voice service in certain network segments or a complete voice outage in a specific location.
It is very important to establish a baseline for the voice endpoints as well. For instance, a baseline can be established for PacketCable MTAs based on their state (In-Service, Out-of-Service, and so on), registration status (registered, unregistered), and so on. If a mass de-registration occurs, it can be correlated to a provisioning server failure; if a large number of MTAs go into the Out-of-Service state, the event can be correlated to a CMS failure.
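A baseline of this kind can be checked programmatically. The Python sketch below compares current MTA state counts against baseline counts and flags large deviations for correlation with provisioning or CMS alarms; the baseline figures, state names, and the 20 percent threshold are illustrative assumptions.

DEVIATION_THRESHOLD = 0.20   # flag a state that moves more than 20% from baseline

baseline = {"registered": 9800, "unregistered": 200,
            "in_service": 9750, "out_of_service": 250}

def deviations(current, baseline=baseline, threshold=DEVIATION_THRESHOLD):
    events = []
    for state, base_count in baseline.items():
        delta = current.get(state, 0) - base_count
        if base_count and abs(delta) / base_count > threshold:
            events.append("%s count changed by %+d from baseline %d; "
                          "correlate with provisioning/CMS events" %
                          (state, delta, base_count))
    return events

# Example: a mass de-registration shows up as a large swing in both states.
print(deviations({"registered": 6200, "unregistered": 3800,
                  "in_service": 9700, "out_of_service": 300}))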
For monitoring QoS, look at the PHBs defined in the Diff Serv model. Look for QoS policy violations, queue drops, interface statistics, errors, and resource over-utilization (memory, CPU) on routers, switches, voice gateways, and endpoints. This information can be correlated to alarms and syslog messages stored on management servers.
The information mentioned above can be gathered using the command-line interface (CLI) or by polling via SNMP or XML, as mentioned in earlier sections of this chapter. In order to poll information from various network devices, different MIBs can be used. An example of the QoS MIB is given below, followed by a short polling sketch:
CISCO-CLASS-BASED-QOS-MIB
- cbQosPoliceExceededBitRate (1.3.6.1.4.1.9.9.166.1.17.1.1.14): The bit rate of the non-conforming traffic.
- cbQosQueueingDiscardByteOverflow (1.3.6.1.4.1.9.9.166.1.18.1.1.3): The upper 32-bit count of octets, associated with this class, that were dropped by queueing.
- cbQosQueueingDiscardPkt (1.3.6.1.4.1.9.9.166.1.18.1.1.7): The number of packets, associated with this class, that were dropped by queueing.
- cbQosTSStatsDropPktOverflow (1.3.6.1.4.1.9.9.166.1.19.1.1.10): The upper 32-bit count of packets that have been dropped during shaping.
- cbQosTSStatsDropPkt (1.3.6.1.4.1.9.9.166.1.19.1.1.11): The lower 32-bit count of packets that have been dropped during shaping.
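The sketch below shows one way to poll the shaping drop counters listed above and combine the upper and lower 32-bit halves into a single 64-bit total. It uses the pysnmp library as an example SNMP toolkit; the device address, community string, and the instance suffix (cbQosPolicyIndex.cbQosObjectsIndex) are placeholders that would come from the actual device configuration.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

SHAPING_DROP_HI = "1.3.6.1.4.1.9.9.166.1.19.1.1.10"  # upper 32 bits
SHAPING_DROP_LO = "1.3.6.1.4.1.9.9.166.1.19.1.1.11"  # lower 32 bits

def snmp_get(host, community, oid):
    # Fetch a single OID instance and return its integer value.
    error_ind, error_status, _, var_binds = next(getCmd(
        SnmpEngine(), CommunityData(community),
        UdpTransportTarget((host, 161)), ContextData(),
        ObjectType(ObjectIdentity(oid))))
    if error_ind or error_status:
        raise RuntimeError(str(error_ind or error_status.prettyPrint()))
    return int(var_binds[0][1])

def shaping_drops(host, community, instance=".7.1"):
    # Combine the upper and lower 32-bit counters into one 64-bit value.
    hi = snmp_get(host, community, SHAPING_DROP_HI + instance)
    lo = snmp_get(host, community, SHAPING_DROP_LO + instance)
    return (hi << 32) | lo

# Example (placeholder address, community string, and instance):
# print(shaping_drops("192.0.2.1", "public"))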
If a problem is occurring due to network congestion, it can be diagnosed by monitoring QoS at different network elements and at different layers. In order to explain this concept, we take the example of a PacketCable network, as discussed in chapter 3.
PacketCable Use Case
In a PacketCable environment, quality of service is provided using the DQoS architecture, which focuses on the access part of the network between the MTA and the CMTS. Resources are assigned to the MTA at the time of call setup after performing admission control, and QoS is assigned based on the information received from the CMS (via gate messaging). If the call setup fails, it can be caused by any of the following reasons:
- Lack of resources on the MTA.
- Layer 2 messaging getting dropped between the MTA and the CMTS. This can be caused by Layer 1 events such as noise on the cable plant or Layer 2 events such as DOCSIS queues filling up.
- Lack of resources on the CMTS. This can be at the DOCSIS layer (Layer 2), at the IP layer (Layer 3), or at upper-layer protocols such as COPS (used for carrying DQoS messages between the CMTS and the CMS).
- Call signaling failure due to network congestion, causing delayed or dropped packets by intermediate devices between the MTA and the CMS.
Similarly, if the quality of the voice call is degraded after being set up, the problem could be related to the following issues:
- Packet drops between the MTA and the CMTS due to physical layer (Layer 1) issues (degraded SNR, uncorrectable errors, and so on).
- Proper QoS not assigned to the voice call. The voice call may be set up over Best Effort service flows instead of a dedicated service flow with guaranteed QoS for voice.
- The voice service flows may be impacted by resource over-utilization (high CPU utilization, DOCSIS scheduler issues, and so on) on the CMTS. This can cause voice packets to be delayed or dropped on the service flows.
- Packets getting dropped by intermediate devices between the two VoIP endpoints (Layer 3).
The layered approach for monitoring the issues mentioned above is illustrated in Figure 6-12.
In the approach mentioned above, we start at Layer 1 by monitoring the physical parameters of the cable plant, such as Signal-to-Noise Ratio (SNR), power levels, and correctable and uncorrectable errors caused by noise. These parameters can be monitored using the DOCS-IF-MIB. If there are issues at the physical layer that can affect VoIP traffic (degraded SNR, power levels, errors, and so on), we correlate this data to network events or alarms to see whether they are causing any VoIP-related issues.
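For example, once the SNR values have been polled (via the DOCS-IF-MIB, as noted above), a simple check such as the following Python sketch can flag channels that have degraded below a threshold so the event can be correlated with VoIP alarms. The 25 dB threshold, interface names, and sample readings are illustrative assumptions.

SNR_THRESHOLD_DB = 25.0

def degraded_channels(snr_by_channel_db, threshold=SNR_THRESHOLD_DB):
    # Return (channel, snr) pairs below the threshold, worst first.
    degraded = [(ch, snr) for ch, snr in snr_by_channel_db.items() if snr < threshold]
    return sorted(degraded, key=lambda item: item[1])

readings = {"Cable3/0/U0": 31.2, "Cable3/0/U1": 22.4, "Cable3/0/U2": 27.9}
for channel, snr in degraded_channels(readings):
    print("correlate: %s SNR %.1f dB is below the %.0f dB threshold" %
          (channel, snr, SNR_THRESHOLD_DB))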
Next we look at the DOCSIS layer (Layer 2) to see whether the DOCSIS layer messaging between the MTA and the CMTS is working as expected. We need to look at the DSX messages (Dynamic Service Add [DSA], Dynamic Service Change [DSC], and Dynamic Service Delete [DSD]) to ensure that requests sent by the MTA are not being rejected or dropped by the CMTS. Failures in DSX messaging also need to be correlated to any VoIP events in the network to make sure service is not being impacted. The DSX messaging on the CMTS can be monitored using the DOCS-QOS-MIB.
Next we look at the DOCSIS QoS parameters on the CMTS to make sure the VoIP traffic is getting the appropriate QoS when it is transported over the cable network. This information can be monitored using the DOCS-QOS-MIB.
Next we look at the IP layer (Layer 3) information to make sure that packet drops on the cable (RF) interfaces or the WAN links are not affecting VoIP traffic. We also look at the queues on these interfaces to make sure packets are not backing up, which can cause delay and jitter for VoIP traffic. The interface statistics can be monitored using the IF-MIB.
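Because IF-MIB discard counters (such as ifInDiscards and ifOutDiscards) are cumulative, it is the change between polls that indicates active queue drops. The Python sketch below turns two successive snapshots into per-interval deltas; the interface indexes and counter values are illustrative, and counter wrap handling is omitted for brevity.

def discard_deltas(previous, current):
    # previous/current: {ifIndex: (in_discards, out_discards)} snapshots.
    deltas = {}
    for if_index, (in_now, out_now) in current.items():
        in_prev, out_prev = previous.get(if_index, (in_now, out_now))
        deltas[if_index] = (in_now - in_prev, out_now - out_prev)
    return deltas

poll_1 = {17: (1200, 4510), 18: (90, 112)}
poll_2 = {17: (1200, 9870), 18: (90, 113)}
for if_index, (d_in, d_out) in discard_deltas(poll_1, poll_2).items():
    if d_in or d_out:
        print("ifIndex %d: +%d input discards, +%d output discards this interval" %
              (if_index, d_in, d_out))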
Another thing to check on the intermediate devices between the MTA and the other VoIP endpoint is the defined QoS policy, to make sure it is operating as designed. This information can be monitored using the CISCO-CLASS-BASED-QOS-MIB, as described above.
Additional checkpoints could include any other devices, such as firewalls and Layer 2 switches, to make sure they are not interfering with the quality of the VoIP traffic. One thing to verify on Layer 2 devices is that they are configured with the appropriate Class of Service (CoS) for the VoIP signaling and bearer traffic.
Lastly, we also need to look at the signaling protocol (MGCP/NCS) counters from the endpoints to track delay, jitter, and packet loss. This can also help explain issues contributing to degradation of the voice quality. These performance counters can also be used for trend analysis and capacity planning, as explained in chapter 8.
So far we have explained the high-level methodology to monitor QoS and correlate this information to network events to help isolate problems more effectively. The details of this approach are explained in chapter 7.
As mentioned earlier in this chapter, it is important to group the different network elements in the VoIP network (endpoints, aggregation and core devices, provisioning and management servers, CMS and voice mail servers, and so on) to isolate problems caused by certain device types. This also helps in establishing a baseline for each device type and in collecting periodic information from these devices, which can be used for analysis and trending. This is discussed in more detail in chapter 8.
Once issues are categorized by device type or grouping, they can be correlated to the trouble ticketing system so that problems can be tied back to specific vendors. This is discussed in more detail in the next section.