Designing End-to-End QoS Policies
Cisco has developed many different QoS mechanisms, such as queuing, policing, and traffic shaping, to enable network operators to manage and prioritize the traffic flowing on a network. Applications that are delay sensitive, such as VoIP, require special treatment to ensure proper application functionality.
Classification and Marking
For a flow to have priority, it must be classified and marked. Classification is the process of identifying the type of traffic. Marking is the process of setting a value in the IP header based on the classification. The following are examples of technologies that support classification:
Network-based application recognition (NBAR): This technology uses deep packet content inspection to identify network applications. An advantage of NBAR is that it can recognize applications even when they do not use standard network ports. Furthermore, it matches fields at the application layer. Before NBAR, classification was limited to Layer 4 TCP and User Datagram Protocol (UDP) port numbers.
Committed access rate (CAR): CAR uses a rate limit to set precedence and allows customization of the precedence assignment by user, source or destination IP address, and application type.
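The two steps can be sketched in a few lines of Python. This is a minimal illustration, not how NBAR or CAR is implemented: it classifies on Layer 4 fields only, and the class names, the Packet type, and the port numbers are assumptions chosen for the example. The DSCP values (EF = 46, AF31 = 26, default = 0) are the standard code points.

```python
from dataclasses import dataclass

# Standard DSCP code points.
DSCP_EF = 46       # Expedited Forwarding, commonly used for voice
DSCP_AF31 = 26     # Assured Forwarding class 3, low drop precedence
DSCP_DEFAULT = 0   # best effort


@dataclass
class Packet:              # hypothetical packet summary used only in this sketch
    protocol: str          # "udp" or "tcp"
    dst_port: int
    dscp: int = DSCP_DEFAULT


def classify(pkt: Packet) -> str:
    """Classification: identify the type of traffic (Layer 4 fields only)."""
    if pkt.protocol == "udp" and 16384 <= pkt.dst_port <= 32767:
        return "voice"             # assumed RTP port range
    if pkt.protocol == "tcp" and pkt.dst_port == 1433:
        return "critical-data"     # assumed business application port
    return "best-effort"


# Assumed marking policy: one DSCP value per class.
MARKING = {"voice": DSCP_EF, "critical-data": DSCP_AF31, "best-effort": DSCP_DEFAULT}


def mark(pkt: Packet) -> Packet:
    """Marking: set the DSCP value based on the classification."""
    pkt.dscp = MARKING[classify(pkt)]
    return pkt


def tos_byte(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IP ToS byte."""
    return dscp << 2


voice = mark(Packet("udp", 16500))
print(voice.dscp, tos_byte(voice.dscp))   # 46 184
```

Devices later in the path can then match on the DSCP value alone instead of repeating the classification.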
Traffic shaping and policing are mechanisms that inspect traffic and take action based on the traffic’s characteristics, such as DSCP or IP precedence bits set in the IP header.
Traffic shaping slows down the rate at which packets that match certain criteria are sent out an interface (egress). Traffic shaping uses a token bucket technique to release the packets into the output queue at a preconfigured rate. Traffic shaping helps eliminate potential bottlenecks by throttling back the traffic rate at the source. In enterprise environments, traffic shaping is used to smooth the flow of traffic going out to the provider. This is desirable for several reasons. For example, it keeps the enterprise traffic within the contracted rate so that the provider does not drop the excess.
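The token bucket behavior can be sketched as follows. This is a simplified model under stated assumptions: a single bucket that starts full, a FIFO buffer of unlimited depth, and hypothetical parameter names (rate_bps, burst_bytes). It returns the time at which each packet is released.

```python
def shape(arrivals, rate_bps, burst_bytes):
    """Return the release time (seconds) of each packet through a token bucket.

    arrivals: list of (arrival_time_seconds, size_bytes), sorted by time.
    Packets that find too few tokens are buffered, not dropped.
    """
    rate_bytes_per_s = rate_bps / 8
    tokens = float(burst_bytes)   # the bucket starts full
    clock = 0.0                   # time at which `tokens` was last updated
    departures = []
    for arrival, size in arrivals:
        if size > burst_bytes:
            raise ValueError("a packet larger than the bucket can never be sent")
        start = max(arrival, clock)   # FIFO: never overtake the previous packet
        tokens = min(burst_bytes, tokens + (start - clock) * rate_bytes_per_s)
        if tokens < size:
            # Not enough tokens: hold the packet until the bucket refills.
            start += (size - tokens) / rate_bytes_per_s
            tokens = size
        tokens -= size
        clock = start
        departures.append(start)
    return departures


# Three 1000-byte packets arrive together; the shaper runs at 8 kbps (1000 bytes/s).
print(shape([(0.0, 1000), (0.0, 1000), (0.0, 1000)], rate_bps=8000, burst_bytes=1000))
# [0.0, 1.0, 2.0]
```

The burst is spread over two seconds instead of being sent at once, which is the smoothing effect described above.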
Policing involves tagging or dropping traffic, depending on the match criteria. Generally, policing is used to set the limit of traffic coming into an interface (ingress) and uses a “leaky bucket mechanism.” Policing can forward traffic that conforms to the policy and drop traffic that violates it. Policing is also referred to as committed access rate (CAR). One example of using policing is giving preferential treatment to critical application traffic by elevating it to a higher class and reducing best-effort traffic to a lower-priority class.
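A policer can be sketched as a bucket-based meter that makes an immediate decision for each packet. The version below uses the token-bucket formulation of the meter, which is an assumption made for this example, along with the hypothetical parameter names rate_bps and burst_bytes. It only labels packets; whether exceeding packets are dropped or re-marked is a separate policy choice.

```python
def police(arrivals, rate_bps, burst_bytes):
    """Label each packet 'conform' or 'exceed' with a single-rate meter.

    arrivals: list of (arrival_time_seconds, size_bytes), sorted by time.
    Nothing is buffered: the decision is made as each packet arrives.
    """
    rate_bytes_per_s = rate_bps / 8
    tokens = float(burst_bytes)   # the bucket starts full
    last = 0.0                    # arrival time of the previous packet
    verdicts = []
    for arrival, size in arrivals:
        # Refill for the time elapsed, but never beyond the bucket depth.
        tokens = min(burst_bytes, tokens + (arrival - last) * rate_bytes_per_s)
        last = arrival
        if size <= tokens:
            tokens -= size
            verdicts.append("conform")
        else:
            verdicts.append("exceed")   # candidate for drop or re-marking
    return verdicts


# Two back-to-back packets, then a third one second later (8 kbps, 1000-byte burst).
print(police([(0.0, 1000), (0.0, 1000), (1.0, 1000)], rate_bps=8000, burst_bytes=1000))
# ['conform', 'exceed', 'conform']
```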
When you contrast traffic shaping with policing, remember that traffic shaping buffers packets, while policing can be configured to drop packets. In addition, policing propagates bursts, but traffic shaping does not.
Queuing refers to the buffering process used by routers and switches when they receive traffic faster than it can be transmitted. Different queuing mechanisms can be implemented to influence the order in which the different queues are serviced (that is, how different types of traffic are emptied from the queues).
QoS is an effective tool for managing a WAN’s available bandwidth. Keep in mind that QoS does not add bandwidth; it only helps you make better use of the existing bandwidth. For chronic congestion problems, QoS is not the answer; in such situations, you need to add more bandwidth. However, by prioritizing traffic, you can make sure that your most critical traffic gets the best treatment and available bandwidth in times of congestion. One popular QoS technique is to classify your traffic based on a protocol type or a matching access control list (ACL) and then give policy treatment to the class. You can define many classes to match or identify your most important traffic classes. The remaining unmatched traffic then uses a default class in which the traffic can be treated as best-effort.
Table 9-7 describes QoS options for optimizing bandwidth.
Table 9-7 QoS Options
QoS Category: Description
Classification: Identifies and marks flows
Congestion management: Handles traffic overflow using a queuing algorithm
Link-efficiency mechanisms: Reduce latency and jitter for network traffic on low-speed links
Traffic shaping and policing: Prevent congestion by policing ingress and egress flows
Two types of output queues are available on routers: the hardware queue and the software queue. The hardware queue uses the first-in, first-out (FIFO) strategy. The software queue schedules packets first and then places them in the hardware queue. Keep in mind that the software queue is used only during periods of congestion. The software queue uses QoS techniques such as priority queuing, custom queuing, weighted fair queuing, class-based weighted fair queuing, low-latency queuing, and traffic shaping and policing.
Priority queuing (PQ) is a queuing method that establishes four interface output queues that serve different priority levels: high, medium, normal (the default), and low. Unfortunately, PQ can starve the lower-priority queues if too much data is in a higher-priority queue, because higher-priority queues must be emptied before lower-priority queues are serviced.
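The starvation effect is easy to see in a small model. The following sketch assumes four FIFO queues and a scheduler that always drains the highest non-empty level first; the queue names and packet labels are illustrative only.

```python
from collections import deque

# Four priority levels, highest first ("normal" is the default queue).
LEVELS = ("high", "medium", "normal", "low")


def pq_dequeue(queues):
    """Return the next packet under strict priority, or None if all queues are empty."""
    for level in LEVELS:
        if queues[level]:
            return queues[level].popleft()
    return None


queues = {level: deque() for level in LEVELS}
queues["low"].append("l1")                    # queued first
queues["high"].extend(["h1", "h2", "h3"])     # queued later, still sent first

order = [pq_dequeue(queues) for _ in range(4)]
print(order)   # ['h1', 'h2', 'h3', 'l1']
```

As long as the high queue keeps receiving packets, the packet in the low queue is never transmitted.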
Custom queuing (CQ) uses up to 16 individual output queues. Byte size limits are assigned to each queue so that when the limit is reached, CQ proceeds to the next queue. The network operator can customize these byte size limits. CQ is fairer than PQ because it allows some level of service to all traffic. This queuing method is considered legacy due to improvements in the other queuing methods.
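One round of this byte-count scheduling can be sketched as follows. The model is simplified: queues hold packet sizes only, and a queue stops sending once its byte count reaches or passes the limit, so a whole packet is always sent even if it crosses the threshold. The limits in the example are arbitrary values chosen for illustration.

```python
from collections import deque


def cq_round(queues, byte_limits):
    """Service every queue once; return the (queue_index, size) pairs sent."""
    sent = []
    for index, (queue, limit) in enumerate(zip(queues, byte_limits)):
        sent_bytes = 0
        # Keep sending from this queue until its byte limit is reached.
        while queue and sent_bytes < limit:
            size = queue.popleft()
            sent_bytes += size
            sent.append((index, size))
    return sent


queues = [deque([1500, 1500, 1500]), deque([500, 500, 500])]
print(cq_round(queues, byte_limits=[3000, 500]))
# [(0, 1500), (0, 1500), (1, 500)]
```

Queue 0 receives about six times the bandwidth of queue 1 in each round, yet queue 1 is still serviced, which is why CQ is fairer than PQ.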
Weighted Fair Queuing
Weighted fair queuing (WFQ) ensures that traffic is separated into individual flows or sessions without requiring that you define ACLs. WFQ uses two categories to group sessions: high bandwidth and low bandwidth. Low-bandwidth traffic has priority over high-bandwidth traffic. High-bandwidth traffic shares the service according to assigned weight values. WFQ is the default queuing mechanism on serial interfaces running at E1 speed (2.048 Mbps) or below.
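The preference for low-bandwidth flows can be illustrated with a simplified finish-number calculation. This sketch is not the Cisco IOS algorithm: it assumes all packets are already queued at the same instant, and it uses hypothetical per-flow weights in which a lower weight means more service. Each packet is tagged with a running total of size multiplied by weight for its flow, and packets are sent in order of that tag.

```python
def wfq_order(flows, weights):
    """Return packets as (flow, size) in transmit order.

    flows:   dict of flow name -> list of packet sizes in bytes
    weights: dict of flow name -> weight (lower weight = more service)
    """
    tagged = []
    for name, sizes in flows.items():
        finish = 0
        for seq, size in enumerate(sizes):
            finish += size * weights[name]   # finish number within this flow
            tagged.append((finish, name, seq, size))
    tagged.sort()                            # lowest finish number goes first
    return [(name, size) for _, name, _, size in tagged]


flows = {"bulk": [1500, 1500, 1500], "telnet": [64, 64]}
print(wfq_order(flows, weights={"bulk": 1, "telnet": 1}))
# [('telnet', 64), ('telnet', 64), ('bulk', 1500), ('bulk', 1500), ('bulk', 1500)]
```

Even with equal weights, the small interactive packets are scheduled ahead of the bulk transfer because their finish numbers are lower.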
Class-Based Weighted Fair Queuing
Class-based weighted fair queuing (CBWFQ) extends WFQ capabilities by providing support for modular user-defined traffic classes. CBWFQ lets you define traffic classes that correspond to match criteria, including ACLs, protocols, and input interfaces. Traffic that matches the class criteria belongs to that specific class. Each class has a defined queue that corresponds to an output interface.
After traffic has been matched and belongs to a specific class, you can modify its characteristics, such as by assigning bandwidth and specifying the maximum queue limit and weight. During periods of congestion, the bandwidth assigned to the class is the guaranteed bandwidth that is delivered to the class.
One of the key advantages of CBWFQ is its modular nature, which makes it extremely flexible for most situations. CBWFQ is configured with the Modular QoS CLI (MQC), which is the framework for building QoS policies, and the two names are often used interchangeably. Many classes can be defined to separate network traffic as needed in the MQC.
Low-latency queuing (LLQ) adds a strict priority queue to CBWFQ. The strict priority queue allows delay-sensitive traffic such as voice to be sent first, before other queues are serviced. That gives voice preferential treatment over the other traffic types. Unlike PQ, LLQ provides for a maximum threshold on the priority queue to prevent lower-priority traffic from being starved by the priority queue.
Without LLQ, CBWFQ would not have a priority queue for real-time traffic. The additional classification of other traffic classes is done using the same CBWFQ techniques. LLQ is the standard QoS method for many VoIP networks.
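The interaction between the strict priority queue and the CBWFQ classes can be approximated with a steady-state calculation. The sketch below is a rough model under stated assumptions, not the actual scheduler: the priority class is served first but capped at its configured rate, and the remaining classes share what is left in proportion to their configured bandwidth, with any unused share redistributed. The function name, class names, and rates are hypothetical.

```python
def llq_share(link_kbps, priority_kbps, class_kbps, offered_kbps):
    """Approximate per-class throughput (kbps) on a congested link.

    priority_kbps: configured ceiling of the strict priority queue
    class_kbps:    dict of class name -> configured (guaranteed) bandwidth
    offered_kbps:  dict of class name -> offered load, including "priority"
    """
    # The priority queue is serviced first, up to its configured maximum.
    served = {"priority": min(offered_kbps["priority"], priority_kbps)}
    remaining = link_kbps - served["priority"]
    active = set(class_kbps)
    while active and remaining > 0:
        total = sum(class_kbps[c] for c in active)
        # Classes that need no more than their proportional share are satisfied.
        satisfied = [c for c in active
                     if offered_kbps[c] <= remaining * class_kbps[c] / total]
        if not satisfied:
            for c in active:
                served[c] = remaining * class_kbps[c] / total
            break
        for c in satisfied:
            served[c] = offered_kbps[c]
            remaining -= offered_kbps[c]
            active.remove(c)
    for c in class_kbps:
        served.setdefault(c, 0)   # classes left with no bandwidth at all
    return served


print(llq_share(
    link_kbps=1000,
    priority_kbps=300,
    class_kbps={"critical": 400, "default": 100},
    offered_kbps={"priority": 500, "critical": 300, "default": 900},
))
# {'priority': 300, 'critical': 300, 'default': 400.0}
```

The cap on the priority class is what keeps the other classes from being starved, in contrast to PQ.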
With Cisco IOS, several link-efficiency mechanisms are available. Link fragmentation and interleaving (LFI), Multilink PPP (MLP), and Real-Time Transport Protocol (RTP) header compression can provide for more efficient use of bandwidth.
Table 9-8 describes Cisco IOS link-efficiency mechanisms.
Table 9-8 Link-Efficiency Mechanisms
Link fragmentation and interleaving (LFI): Reduces delay and jitter on slower-speed links by breaking up large packets into fragments and interleaving smaller packets (Telnet, VoIP) between them.
Multilink PPP (MLP): Bonds multiple links between two nodes, which increases the available bandwidth. MLP can be used on analog or digital links and is based on RFC 1990.
Real-Time Transport Protocol (RTP) header compression: Provides increased efficiency for applications that take advantage of RTP on slow links. Compresses RTP/UDP/IP headers from 40 bytes down to 2–5 bytes.
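Some quick arithmetic shows why these mechanisms matter on slow links. The sketch below rests on assumptions that are not from the text: a 64-kbps link, a 10-ms serialization target for LFI, and a voice stream with 20-byte payloads sent 50 times per second. Layer 2 overhead is ignored.

```python
def serialization_delay_ms(size_bytes, link_kbps):
    """Time to clock a packet onto the wire: bits / (kbit/s) = milliseconds."""
    return size_bytes * 8 / link_kbps


def fragment_size_bytes(link_kbps, target_delay_ms):
    """Largest fragment that can be serialized within the target delay."""
    return link_kbps * target_delay_ms / 8


def voice_bandwidth_kbps(payload_bytes, header_bytes, packets_per_second):
    """Layer 3 bandwidth of one voice stream (Layer 2 overhead ignored)."""
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


# LFI: a 1500-byte packet blocks a 64-kbps link for a long time.
print(serialization_delay_ms(1500, 64))      # 187.5
print(fragment_size_bytes(64, 10))           # 80.0

# RTP header compression: 40-byte headers versus 2-byte compressed headers.
print(voice_bandwidth_kbps(20, 40, 50))      # 24.0
print(voice_bandwidth_kbps(20, 2, 50))       # 8.8
```

Under these assumptions, a voice packet could wait up to 187.5 ms behind a single large data packet without LFI, and header compression cuts the per-call bandwidth by more than half.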
The window size defines the upper limit of data that can be transmitted without getting a return acknowledgment. Transport protocols such as TCP rely on acknowledgments to provide connection-oriented reliable transport of data segments. For example, if the TCP window size is set to 8192, the source stops sending data after 8192 bytes if no acknowledgment has been received from the destination host. On WAN links with high delay, the window size might need to be increased, because the sender sits idle while it waits for acknowledgments. If the window size is not adjusted to coincide with the delay factor, retransmissions can occur, which affects throughput significantly. It is recommended that you adjust the window size to match the bandwidth and round-trip delay of the link so that the link can be kept full.
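The relationship between window size, delay, and throughput can be worked out directly. A sender can have at most one window of data in flight per round trip, so throughput is bounded by the window size divided by the round-trip time. The 100-ms round-trip time and 10-Mbps link in this sketch are assumptions chosen for illustration.

```python
def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


def required_window_bytes(bandwidth_bps, rtt_ms):
    """Window needed to keep the link full (the bandwidth-delay product)."""
    return bandwidth_bps * rtt_ms / 8000


# An 8192-byte window over a 100-ms round trip caps throughput well below 1 Mbps.
print(max_throughput_bps(8192, 100))             # 655360.0

# Filling a 10-Mbps link at the same delay needs a much larger window.
print(required_window_bytes(10_000_000, 100))    # 125000.0
```

With these numbers, the 8192-byte window from the example above would use less than 7 percent of a 10-Mbps link.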