VoIP Performance Management and Optimization: Managing VoIP Networks

Chapter Description

To ensure voice quality and to optimize media delivery over IP, it is crucial to properly plan, design, implement, and manage the underlying network. This chapter discusses best practices for planning media deployment over IP networks.

Effectively Monitoring the Network

Effective monitoring of the network is based on the philosophy of proactive, or preventative, maintenance.

Preventative maintenance, as described in this chapter, consists of performance monitoring of voice quality metrics.

Every network is prone to faults and problems. Faults may be detected by operations personnel, by users of the service equipment, by performance monitoring and testing within elements, by trend analysis, and so on. Paper or electronic trouble reports may be generated and sent; packet-based or internal messages may be passed between administrative layers.

  • Bottom-up troubleshooting process: Fault indication and performance information generated at the endpoint flows upward through a hierarchy of levels, beginning at the bearer element level, through the signaling element level, to the network management level, and finally reaching the system operations level.

    Performance and fault information is stored in local log files, remote databases, Call Detail Records, Call Maintenance Records, performance servers, and so on. This stored information may later be analyzed by a top-down troubleshooting process originating from the system operations or network management level.

    The endpoint may be capable of preliminary troubleshooting of the fault autonomously, or it may always expect top-down assistance from one or more of its controlling entities.

  • Top-down troubleshooting process: Faults may be presented to system operations personnel or to automatic systems, arising either from lower-level indications or from trouble reports filed by users or maintenance personnel, trend analysis reports, and so on. At the system operations level, service maintenance personnel, and in some cases automatic systems, once presented with an indication of a system fault, attempt to further categorize, correlate, isolate, and otherwise troubleshoot the fault symptom from the top down.

Discovery: A Complete Picture

Network management starts with network discovery. Completeness of coverage and accuracy of device identification are key to effective network management.

Network discovery should be performed only on stable networks.

The length of time required to discover a network depends on various factors that are not necessarily tied to the size of the device population. For example, one cannot assume that 200 devices can be discovered in 2 hours simply because the first 100 devices were discovered in 1 hour. The underlying thread processes of a discovery session may require 1 hour for the initial 100 devices and only 10 minutes for the remaining 100.

Typical factors that impact the amount of time to run a discovery process on a network include the following:

  • Link speed: When deploying a discovery session across WAN links, link speed and latency result in slower discovery.

  • Bandwidth controls: Where discovery is set for low speeds, discovery throttles how much traffic it creates on the network.

  • Device focus: Looking only for specific network elements results in a slower discovery time.

  • Firewalls and ACLs: These may block SNMP traffic. This can be mitigated by ensuring that the security control points allow SNMP and ICMP traffic through, and by including a seed device from the other side of the security control point.

Seed Devices for Network Discovery

When selecting seed devices, it is helpful to remember that the start of the discovery process relies on complete Cisco Discovery Protocol (CDP) and/or routing table information from one or several key devices. These devices need not be core devices but should be devices that contain complete routing tables and/or CDP neighbor tables. Aggregate devices, rather than core devices, may be the wiser choice for use as seed devices.

Concerns about the effect of SNMP queries on the seed device can be settled by observing the effect on the SNMP process of a test device that runs the same version of software and whose SNMP tables have been loaded using a test tool that generates IP route or ARP entries.

A good option for a seed device may be a redundant core device: one that is in hot standby with full knowledge of the network routing tables but not actually handling the majority of the network traffic. This reduces the impact of the additional SNMP traffic.

CDP (Cisco Discovery Protocol) Discovery

A Cisco device that supports this protocol both transmits and listens for CDP messages. As a result, each Cisco device is aware of its immediately connected neighbors. The CNC discovery engine collects the CDP information from devices by using SNMP queries to form a list of all neighbors of the queried device. This list contains all devices that have advertised their presence on the network and provides clues about other devices to be discovered.

Limitations of this method include the following: not all Cisco devices support CDP (for example, Content Networking devices); some transmission media, such as ATM, do not support CDP; and the network administrator may have disabled CDP on some parts of the network, or even the entire network.

Otherwise, CDP is the best single method of discovery; it does not have to be enabled on every device in order for it to work.
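As a sketch of how a CDP-based engine walks the network, the breadth-first loop below starts from a seed device and follows neighbor lists outward. The `get_cdp_neighbors` function is a placeholder for what would really be an SNMP query against the device's CDP neighbor table, and the device names and topology structure are hypothetical:

```python
from collections import deque

def get_cdp_neighbors(device, topology):
    # Placeholder: a real discovery engine would issue SNMP queries
    # against the queried device's CDP neighbor table here.
    return topology.get(device, [])

def cdp_discover(seed, topology):
    """Breadth-first discovery starting from a single seed device."""
    discovered = set()
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        if device in discovered:
            continue
        discovered.add(device)
        for neighbor in get_cdp_neighbors(device, topology):
            if neighbor not in discovered:
                queue.append(neighbor)
    return discovered
```

Each queried device contributes clues (its neighbor list) to the frontier, so discovery still covers the whole portion of the network where CDP is enabled even if it is disabled elsewhere.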

Routing Table Discovery

The routing table method uses the routing table from seed devices, retrieving the subnet address and subnet mask from the Routing Table MIB. It then compares each subnet against the list of subnets already discovered. If the connection point for the subnet is not found, it uses SNMP to retrieve the next-hop address and compares that address with the IP addresses already discovered. If the next-hop address has not been discovered, it is added to the list of devices to be discovered. Routing protocol neighbor lookup is also very effective but will not find Layer 2 devices and currently supports only the OSPF and BGP protocols.
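The comparison logic of that method can be sketched as follows; `route_entries` stands in for rows pulled from the Routing Table MIB, and the sets of known subnets and addresses represent what earlier discovery passes have already found (all names here are illustrative):

```python
import ipaddress

def next_hop_candidates(route_entries, known_subnets, known_addresses):
    """Return next-hop addresses that merit a new discovery attempt.

    route_entries: iterable of (subnet, mask, next_hop) strings, as would
    be retrieved from the Routing Table MIB of a seed device.
    """
    candidates = []
    for subnet, mask, next_hop in route_entries:
        network = ipaddress.ip_network(f"{subnet}/{mask}")
        if network in known_subnets:
            continue  # connection point for this subnet already found
        if next_hop not in known_addresses and next_hop not in candidates:
            candidates.append(next_hop)
    return candidates
```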

ARP Discovery

The ARP method examines discovered devices and retrieves the ARP table using the “at” address table MIB. This method retrieves the list of all IP addresses the device has in its cache and compares each MAC address to a list of known Cisco MAC address prefixes. If the MAC address prefix matches, the IP address is added to the list of devices to be discovered.

ARP discovery is not very efficient as a primary discovery tool, and devices whose ARP entries have timed out will not be discovered. It is useful, however, for finding devices that do not route, do not support CDP, or have CDP disabled.
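The MAC-prefix comparison can be sketched as below. The OUI set is a small illustrative sample standing in for the full list of Cisco-registered prefixes a real tool would carry:

```python
# Illustrative sample of Cisco OUI (MAC prefix) values; a real tool
# would load the complete vendor registry.
CISCO_OUIS = {"00:1B:54", "00:40:96", "58:97:BD"}

def cisco_candidates(arp_table):
    """Return IPs from an ARP cache whose MAC OUI matches a Cisco prefix."""
    hits = []
    for ip, mac in arp_table.items():
        if mac.upper()[:8] in CISCO_OUIS:
            hits.append(ip)
    return hits
```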

Routing Protocol - OSPF Discovery

OSPF (Open Shortest Path First) is an Internal Gateway Protocol. If OSPF is active in a network, OSPF Discovery is the preferred method to determine neighbor information. Discovery uses the OSPF MIB information, which maintains a list of all neighbors. This list contains clues for further device discovery.

Routing protocol neighbor lookup is also very effective, but it will not find Layer 2 devices and currently supports only the OSPF and BGP protocols (the EIGRP protocol does not have a neighbor MIB). Of course, this discovery will be ineffective if OSPF is not running on the network. It is therefore recommended to use the generic routing table method rather than a protocol-specific method.

Ping Sweep Discovery

Ping sweep discovery generally provides two modes, starting either from a specified IP address range or from a specified starting IP address. Both methods issue sequential pings to the IP addresses. If an address responds, discovery then attempts to communicate with the device at that IP address by SNMP. If SNMP communication is successful, the device is considered manageable.

The Cisco Network management tools provide the following two Ping Sweep methods:

  • Pingsweep With Hop: This method starts from an IP address and continues pinging new addresses up to the given hop count.

  • Pingsweep Range: This method uses a range of IP addresses, from the starting IP address to the ending IP address, for the given IP address and netmask.

This is not a very efficient method, but given enough time, and provided that ICMP messaging is not blocked, it will find everything on the network. Depending on how network devices have been configured, and on the addressing scheme in the network, the network may start responding with ICMP unreachable and redirect messages. Also, proxy ARP may cause problems by falsely representing certain IP addresses.
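The sweep logic itself is simple; the sketch below enumerates a range and probes each host, with the ICMP and SNMP probes injected as callables so the control flow (ping first, then SNMP to confirm manageability) is visible without any real network access:

```python
import ipaddress

def ping_sweep(network, ping, snmp_reachable):
    """Sequentially probe every host in `network` (e.g. "192.0.2.0/29").

    `ping` and `snmp_reachable` are caller-supplied probe functions; a
    device is considered manageable only if it answers both.
    """
    manageable = []
    for host in ipaddress.ip_network(network).hosts():
        addr = str(host)
        if ping(addr) and snmp_reachable(addr):
            manageable.append(addr)
    return manageable
```

In a real sweep the `ping` callable would wrap an ICMP echo and `snmp_reachable` an SNMP GET against a well-known OID such as sysObjectID.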

Seed Files

Seed files contain explicit device credentials, including IP addresses, SNMP community strings, passwords to log in to the command-line interface (CLI), or credentials for other APIs such as XML. Seed files guarantee complete device access, leading to full discovery and manageability of the network. Even though loading a seed file results in quick network discovery, creation of the seed file (generally by a network administrator) can itself be a time-consuming process and is prone to human error.

It is recommended that multiple methods for discovery are employed for greater coverage. The results of the discovery process should be verified with the network administrator to ensure accuracy of the network topology as discovered by the network management system (NMS).

Voice Quality Metrics

Media has always been predisposed to quality degradation as it traverses any network, whether circuit switched or packet switched. IP networks offer greater flexibility to manage media streams and supplementary applications, in addition to economic advantages over dedicated circuit-switched networks. However, packet-switched (IP) networks may exacerbate some existing problems or introduce new issues that need to be managed. Multiple factors work independently and in concert to degrade the perceived quality of the voice signal as RTP (Real-time Transport Protocol) packets traverse from point A to point B on an IP network.

  • Environmental issues: Environmental issues include acoustic problems caused by the handset, headset, analog-to-digital converter, and impedance mismatch. Other issues may result from poor cabling or network clock synchronization, resulting in crackling voice, clicking sounds, or the presence of crosstalk.

  • Signal processing: Signal processing issues include speech compression, Voice Activity Detection (VAD), silence suppression, and signal gain variations. Issues related to VAD include front-end clipping, incorrect comfort noise levels, and the presence of static, hissing, and often an “underwater” sound.
  • VoIP network issues: IP networks introduce significant propagation and serialization delays compared to circuit-switched networks. This creates network jitter and packet loss, often requiring error concealment processing. Delay also makes inherently present echo more perceivable to human ears. Delay, jitter, and packet loss are manageable through proper end-to-end implementation of QoS in the IP network. If they are not managed properly, the result may be robotic or synthetic voice due to periods of silence (caused by packet drops), or choppy voice.

Metrics are needed for sustaining a toll-quality voice network. There are three comprehensive groupings of quality metrics, namely the Mean Opinion Score (MOS), the Perceptual Speech Quality Measurement (PSQM), and the Perceptual Evaluation of Speech Quality (PESQ).

MOS or K-factor

MOS is a subjective measure of voice quality. An MOS score is generated when listeners evaluate prerecorded sentences that are subject to varying conditions, such as compression algorithms. Listeners assign scores to the received voice signal on a scale from 1 to 5, where 1 is the worst and 5 is the best, and the test scores are then averaged to a composite score. The tests are also relative: a score of 3.8 from one test cannot be directly compared to a score of 3.8 from another test. Therefore, a baseline needs to be established for all tests, such as G.711, so that the scores can be normalized and compared directly.

For a call agent or IP PBX such as the BTS10200 or Cisco Unified Communications Manager to calculate an equivalent of the MOS score, Cisco Engineering adopted a computerized method called K-factor. K-factor (from the German Klirrfaktor, “distortion factor”) is a clarity, or MOS-LQ (listening quality), estimator. It is a predicted MOS score based entirely on impairments due to frame loss and the codec in use. K-factor does not include impairment due to delay or channel factors (echo, levels). K-factor MOS scores are produced on a running basis, with each new MOS estimate based on the previous 8-10 seconds of frame-loss data; that is, each K-factor MOS score is valid over the past 8 seconds. The computation of new scores can be performed at any rate (every second, for example, with each score based on the past 8 seconds), but the computation window of the MOS is a constant. This way the call agent or Unified Communications Manager is able to provide a meaningful value for voice quality in a rather objective manner in call detail records.
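The sliding-window behavior described above can be illustrated with the sketch below. The loss-to-MOS mapping used here is a deliberately simple placeholder, not the actual Cisco K-factor model (which is codec-dependent); only the windowing mechanics mirror the description:

```python
from collections import deque

class KFactorWindow:
    """Running MOS-LQ estimate over a sliding window of frame-loss data."""

    def __init__(self, window_seconds=8, frames_per_second=50):
        # Keep roughly the last 8 seconds of per-frame loss indicators.
        self.frames = deque(maxlen=window_seconds * frames_per_second)

    def record_frame(self, lost):
        self.frames.append(1 if lost else 0)

    def mos_estimate(self):
        if not self.frames:
            return None
        loss_ratio = sum(self.frames) / len(self.frames)
        # Placeholder mapping: ~4.4 at zero loss, degrading with loss,
        # floored at the MOS minimum of 1.0.
        return max(1.0, 4.4 - 14.0 * loss_ratio)
```

Because the deque is bounded, each new frame pushes the oldest one out, so every estimate reflects only the most recent window, matching the "valid over the past 8 seconds" behavior while allowing scores to be emitted at any rate.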

PSQM

PSQM is an automated method of measuring speech quality “in service,” or as the speech happens. The PSQM measurement is made by comparing the original transmitted speech to the resulting speech at the far end of the transmission channel. PSQM systems are deployed as in-service components. The PSQM measurements are made during real conversation on the network. This automated testing algorithm has over 90 percent accuracy compared to subjective listening tests, such as MOS. Scoring is based on a scale from 0 to 6.5, where 0 is the best and 6.5 is the worst. Because it was originally designed for circuit-switched voice, PSQM does not take into account the jitter or delay problems that are experienced in packet-switched voice systems.

PSQM software usually resides with IP call-management systems, which are sometimes integrated into Simple Network Management Protocol (SNMP) systems.

PESQ

PESQ is the current standard for voice quality measurement and is documented in ITU-T Recommendation P.862. PESQ is the most comprehensive voice quality metric because it can take into account CODEC errors, filtering errors, jitter problems, and delay problems that are typical in a VoIP network. PESQ combines the best of the PSQM method with a method called Perceptual Analysis Measurement System (PAMS). PESQ scores range from 1 (worst) to 4.5 (best), with 3.8 considered “toll quality” (that is, acceptable quality in a traditional telephony network). PESQ is meant to measure only one aspect of voice quality: the effects of two-way communication, such as loudness loss, delay, echo, and sidetone, are not reflected in PESQ scores.

Many equipment vendors offer PESQ measurement systems. Such systems are either stand-alone or they plug into existing network management systems. PESQ was designed to mirror the MOS measurement system. So, if a score of 3.2 is measured by PESQ, a score of 3.2 should be achieved using MOS methods.

PESQ measures the effect of end-to-end network conditions, including CODEC processing, jitter, and packet loss. Therefore, PESQ is the preferred method of testing voice quality in an IP network. When this metric is available on a call processing system via XML, SNMP, or CDRs, it should be used for monitoring voice quality.

Approaches to Measuring Jitter, Latency, and Packet Loss in the Network

The voice quality metrics described above provide an overall picture of the perceived voice quality. It is still important to look at the individual factors that impact voice quality, including jitter, latency, packet loss, and various aspects of the voice signal such as signal strength and bandwidth. This section explains approaches to measuring these parameters in a VoIP network interfacing with a TDM network (PLMN).

Round-trip Delay Measurement

The MOS readings indicate the speech transmission quality, or listening clarity. Delay has no impact on the MOS readings, although it affects a real phone conversation in the following ways:

  • Long delay affects natural conversational interactivity and causes hesitation and over-talk. A caller starts noticing delay when the round-trip delay exceeds 150 ms. ITU-T G.114 [9] specifies the maximum desired round-trip delay as 300 ms. A delay over 500 ms makes phone conversation impractical.
  • Long delay exacerbates echo problems. An echo with a level of -30 dB would not be “audible” if the delay is less than 30 ms, but if the delay is over 300 ms, even a -50 dB echo is audible. The echo delay and level requirements are specified in ITU-T G.131 [10].
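These thresholds lend themselves to a simple classification helper; the category labels below are shorthand of my own for the G.114 guidance quoted above:

```python
def classify_round_trip_delay(rtt_ms):
    """Bucket a round-trip delay (in ms) using the thresholds above."""
    if rtt_ms <= 150:
        return "not noticeable"
    if rtt_ms <= 300:
        return "noticeable but within G.114 target"
    if rtt_ms <= 500:
        return "degraded interactivity"
    return "impractical"
```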

Voice Jitter/Frame Slip Measurements

A frame slip, or voice jitter, is defined as a sudden delay variation at the audio signal side. The audio signal requires continuous and synchronous play-out, whereas the packet-switched (IP) network is inherently jittery: each packet arrives asynchronously and may be out of order. To compensate, jitter buffers are used on voice gateways and MTAs. A large jitter buffer can minimize packet loss but induces longer delay. To balance the conflicting needs for shorter delay and less packet loss, the jitter buffer may be dynamically resized depending on the network traffic situation. Whenever the jitter buffer resizes, the audio signal experiences a sudden delay variation (jitter or frame slip) in an amount (in ms) that matches the voice frame size (6, 10, or 30 ms). This test should measure two types of frame slips:

  • Positive (+) frame slip: The total amount of compressive jitter (shortening of delays), corresponding to the down-sizing of the jitter buffer or the deletion of packets.
  • Negative (−) frame slip: The total amount of expansive jitter (lengthening of delays), corresponding to the up-sizing of the jitter buffer or the insertion of packets.

A good system should maintain a total amount of jitter of less than 3% of the test duration. For a 10-second test, the total amounts of positive and negative slips measured by the SMOS test should be within [-300, 300] milliseconds. If the SMOS test measures a higher amount of jitter, the network should be re-configured for better traffic engineering and prioritization.
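The 3% budget check can be expressed directly; positive and negative slips are accumulated separately, matching the two slip types defined above:

```python
def slips_within_budget(slips_ms, test_duration_s):
    """Check that total positive and total negative frame slips each stay
    within 3% of the test duration (e.g. +/-300 ms for a 10 s test)."""
    budget_ms = 0.03 * test_duration_s * 1000
    positive = sum(s for s in slips_ms if s > 0)
    negative = -sum(s for s in slips_ms if s < 0)
    return positive <= budget_ms and negative <= budget_ms
```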

Effective Bandwidth Measurement

A test should measure attenuation distortion by analyzing the frequency response of the system under test (analyzing the 300-3400 Hz band). For PCM (G.711) and ADPCM (G.726) waveform coders, the effective bandwidth largely reflects the attenuation distortion caused by analog or digital filtering. If a system under test uses PCM or ADPCM waveform coders, its measured effective bandwidth should be higher than 0.9. Anything below 0.85 signifies either excessive loop attenuation distortion (for analog circuits) or excessive band-limiting digital filtering. This test may be run during quarterly audits and need not be enabled on a regular basis.

Voice Band Gain Measurement

A test should measure the overall voice band (300 to 3400 Hz) signal level change (attenuation or gain). A flat gain change is not reflected in the MOS reading, but excessive level change (too loud or too faint) does affect human perception. A VoIP network with a balanced network loss plan should maintain the change in voice level (gain) in the range of [−10, −3] dB.

Silence Noise level Measurement

The silence noise level in a VoIP network measures the comfort noise level generated by the CNG (Comfort Noise Generator). The level should be neither too high (sounds too noisy) nor too low (sounds like a dead line). The noise level is expressed in dBrnC. An ideal system should maintain a silence noise level between [10, 30] dBrnC: above 30 dBrnC sounds too “noisy,” and below 10 dBrnC may sound too “quiet.”
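The gain and silence-noise guidelines above reduce to simple range checks, sketched here as they might appear in a monitoring script:

```python
def level_checks(gain_db, noise_dbrnc):
    """Validate measured levels against the guideline ranges above:
    voice-band gain within [-10, -3] dB, comfort noise within
    [10, 30] dBrnC."""
    return {
        "gain_ok": -10 <= gain_db <= -3,
        "noise_ok": 10 <= noise_dbrnc <= 30,
    }
```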

Voice Clipping

The intention of the voice clipping measurement is to quantify the voice quality degradation caused by VADs (Voice Activity Detectors). VADs help reduce bandwidth requirements through the silence suppression scheme. An overly aggressive VAD, however, can cause the leading or trailing edges of an active signal burst to be clipped. Voice clipping also affects modem and fax tone transmission over a VoIP network.

Echo Measurements

In a VoIP network, echo is an inherent issue because of the presence of the analog 2-wire loop, which causes an impedance mismatch at the hybrid junction (linking the 2-wire analog loop with the 4-wire trunk). The echo becomes perceivable because of network delay: the higher the level of the echo signal and the more significant the network delay, the more perceivable the echo is to the human ear. Echo cancellers are employed to sample the original signal over the configured tail length and subtract it from the reflected signal, thereby suppressing the echo.

Figure 6-3 shows the minimum requirements for TELR as a function of the mean one-way transmission time T (half the value of the total round-trip delay from the talker’s mouth to the talker’s ear). In general, the “acceptable” curve is the one to follow. Only in exceptional circumstances should values for the “limiting case” be allowed; otherwise, all such cases should be compensated for by enabling echo cancellers and properly adjusting tail coverage.

Test equipment should be able to measure the echo (level against delay) to characterize the echo present in the network and to evaluate the effectiveness of the echo cancellers (ECAN) after they are enabled with the appropriate tail coverage.

Voice Signalling Protocols Impairments in IP Networks

Signalling connections are implemented with protocols that allow for the detection of packet losses and the retransmission of lost packets. As such, they are better equipped than voice media connections to survive packet losses. For example, SCCP, used by IP telephones, uses TCP as a transport protocol. MGCP, by contrast, implements its own retransmission scheme, as its underlying protocol (UDP) does not provide retransmission services for lost packets. SIP can use either TCP or UDP; in either case, similar to MGCP, it has its own retransmission mechanism.
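As a concrete illustration of such a scheme, SIP over UDP retransmits requests on an exponential backoff governed by timers T1 and T2 (RFC 3261); the sketch below generates the wait intervals, using the commonly cited default timer values:

```python
def retransmit_intervals(t1_ms=500, t2_ms=4000, attempts=6):
    """Exponential backoff in the style of SIP timers T1/T2: each
    retransmit interval doubles until capped at T2."""
    intervals = []
    interval = t1_ms
    for _ in range(attempts):
        intervals.append(interval)
        interval = min(interval * 2, t2_ms)
    return intervals
```

Under these defaults the sender waits 500 ms, then 1 s, 2 s, and finally 4 s between attempts; the total elapsed time before the transaction succeeds or times out is exactly the user-perceivable stall discussed here.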

Even though lost packets are retransmitted, it is the network conditions that determine the success or failure of the retransmission attempts. Even successful retransmissions can have negative effects if the period needed to complete the transaction (from the initial attempt, through the retransmission attempts, to final success) delays system response by a user-perceivable amount of time. We can classify the IP Communications system behavior according to the relative severity of the interruption to the signalling link connectivity, as follows:

  • Light packet drops, with short duration and low frequency of drops: In this case, the system appears to be generally unresponsive to user input. The user may experience effects such as delayed dial tone, delayed ringer silence upon answer, and double dialing of digits due to the user’s belief that the first attempt was not effective (thus requiring hang-up and redial).

  • More frequent, longer-duration packet drops: In this case, the system alternates between seemingly normal and deteriorated operation. Packet drops cause endpoints to activate link failure measures, including re-initialization. Link interruptions, emulated by continuous packet drops of long duration, can reach the point of causing a phone or gateway reset, resulting in media tear-down as well. Users might experience SRST activation, whereby all active calls are dropped when the link is interrupted and again when the link is reestablished. Phones may also appear unresponsive for several minutes.

  • Complete link interruption: Although most likely caused by an actual network failure, link blackouts can also result from a congested network where end-to-end QoS is not configured. For instance, a very high degree of packet loss can occur if a signaling link traverses a network path experiencing large, sustained traffic flows such as network-based storage/disk access, file download, file sharing, or software backup operations. In such cases, the IP Communications system will interrupt calls, and the initiation of a backup mechanism, for example Survivable Remote Site Telephony (SRST) in enterprise networks, will provide continued telephony service for the duration of the link failure. However, the switchover to the backup system may involve delay, as the endpoints may have to re-register to the alternate system, and advanced telephony features may become unavailable.

These effects apply to all deployment models. However, single-site (campus) deployments tend to be less likely to experience the conditions caused by sustained link interruptions because the larger quantity of bandwidth typically deployed in LAN environments (minimum links of 100 Mbps) allows for some residual bandwidth to be available for the IP Communications system.

In any WAN-based deployment model and any service provider managed residential services model (see Figure 2), traffic congestion is more likely to produce sustained and/or more frequent link interruptions because the available bandwidth is much less than in a LAN (typically less than 2 Mbps), so the link is more easily saturated. The effects of link interruptions impact the users, whether or not the voice media traverses the packet network.
