
Quality of Service Design Overview

Chapter Description

This chapter provides an overview of the QoS design and deployment process. This process requires that the business-level objectives of the QoS implementation be defined clearly and that the service-level requirements of the applications selected for preferential or deferential treatment be analyzed.

Principles of QoS Design

The richness of the Cisco QoS toolset allows for a myriad of QoS design and deployment options. However, a few succinct design principles can help simplify strategic QoS designs and lead to an expedited, cohesive, and holistic end-to-end deployment. Some of these design principles are summarized here; others, which are LAN-, WAN- or VPN-specific, are covered in detail in their respective design chapters.

General QoS Design Principles

A good place to begin is to decide which comes first: the cart or the horse. The horse, in this context, serves to pull the cart and is the enabler for this objective. Similarly, QoS technologies are simply the enablers to organizational objectives. Therefore, the way to begin a QoS deployment is not by glossing over the QoS toolset and picking à la carte tools to deploy. In other words, do not enable QoS features simply because they exist. Instead, start from a high level and clearly define the organizational objectives.

Some questions for high-level consideration include the following:

  • Is the objective to enable VoIP only?

  • Is video also required? If so, what type(s) of video: interactive or streaming?

  • Are some applications considered mission critical? If so, what are they?

  • Does the organization want to squelch certain types of traffic? If so, what are they?

All traffic classes specified in the QoS Baseline model except one—the Locally-Defined, Mission-Critical Data application class—are determined by objective networking characteristics. These applications, a subset of the Transactional Data class, are selected for a dedicated, preferential class of service because of their significant impact on the organization's main business objectives.

This is usually a highly subjective evaluation that can excite considerable controversy and dispute. An important principle to remember when assigning applications to the Mission-Critical Data class is that as few applications as possible should be assigned to the Locally-Defined Mission-Critical class.

If too many applications are assigned to it, the Mission-Critical Data class will dampen, and possibly even negate, the value of having a separate class (from Transactional Data). For example, if 10 applications are assigned as Transactional Data (because of their interactive, foreground networking characteristics) and all 10 are determined to be classified as Mission-Critical Data, the whole point of a separate class for these applications becomes moot. However, if only one or two of the Transactional Data applications are assigned to the Mission-Critical Data class, the class will prove highly effective.

Related to this point, it is recommended always to seek executive endorsement of the QoS objectives before design and deployment. By its very nature, QoS is a system of managed unfairness and, as such, almost always creates political and organizational repercussions when implemented. To minimize the effects of such nontechnical obstacles to deployment, which could prevent the QoS implementation altogether, it is recommended to address these political and organizational issues as early as possible and to solicit executive endorsement whenever possible.

As stated previously, it is not mandated that enterprises deploy all 11 classes of the QoS Baseline model; this model is designed to be a forward-looking guide for consideration of the many classes of traffic that have unique QoS requirements. Being aware of this model can help bring about a smooth expansion of QoS policies to support additional applications as future requirements arise. However, at the time of QoS deployment, the organization needs to clearly define how many classes of traffic are required to meet the organizational objectives.

This consideration should be tempered with the consideration of how many classes of applications the networking administration team feels comfortable with deploying and supporting. Platform-specific constraints or service-provider constraints also might come into play when arriving at the number of classes of service. At this point, it also would be good to consider a migration strategy to allow the number of classes to be expanded smoothly as future needs arise, as illustrated in Figure 2-9.

Figure 2-9 Example Strategy for Expanding the Number of Classes of Service over Time

When the number of classes of service has been determined, the details of the required marking, policing, and queuing policies can be addressed. When deciding where to enable such policies, keep in mind that QoS policies always should be performed in hardware instead of software whenever a choice exists.

Cisco IOS routers perform QoS in software, which places incremental loads on the CPU (depending on the complexity and functionality of the policy). Cisco Catalyst switches, on the other hand, perform QoS in dedicated hardware ASICs and, as such, do not tax their main CPUs to administer QoS policies. This allows complex policies to be applied at line rate, even at Gigabit and 10-Gigabit speeds.

Classification and Marking Principles

When it comes to classifying and marking traffic, an unofficial Differentiated Services design principle is to classify and mark applications as close to their sources as technically and administratively feasible. This principle promotes end-to-end Differentiated Services and per-hop behaviors (PHBs). Sometimes endpoints can be trusted to set CoS and DSCP markings correctly, but, in most cases, it is not a good idea to trust markings that users can set on their PCs (or other similar devices). This is because users easily could abuse provisioned QoS policies if permitted to mark their own traffic. For example, if DSCP EF receives priority services throughout the enterprise, a user easily could configure the PC to mark all traffic to DSCP EF right on the NIC, thus hijacking network-priority queues to service that user's non-real-time traffic. Such abuse easily could ruin the service quality of real-time applications (such as VoIP) throughout the enterprise. For this reason, the clause "as close as . . . administratively feasible" is included in the design principle.

Following this rule, it further is recommended to use DSCP markings whenever possible because these are end to end, more granular, and more extensible than Layer 2 markings. Layer 2 markings are lost when the media changes (such as at a LAN-to-WAN or VPN edge). An additional constraint to Layer 2 marking is that there is less marking granularity; for example, 802.1Q/p CoS supports only 3 bits (values 0 through 7), as does MPLS EXP. Therefore, only (up to) eight classes of traffic can be supported at Layer 2, and intraclass relative priority (such as RFC 2597 assured-forwarding drop-precedence markdown) is not supported. On the other hand, Layer 3 DSCP markings allow for up to 64 classes of traffic, which is more than enough for most enterprise requirements for the foreseeable future.
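The bit arithmetic behind these markings can be sketched in a few lines of Python. The helper names (`af_dscp`, `cs_dscp`, `dscp_to_cos`) are hypothetical, and mapping DSCP to CoS by keeping the three most significant bits is a commonly used default that is assumed here rather than taken from this chapter. The AF codepoint formula follows RFC 2597.

```python
# Illustrative sketch only: helper names are hypothetical, not a Cisco API.

EF = 46  # Expedited Forwarding codepoint (RFC 3246), binary 101110


def af_dscp(af_class: int, drop_precedence: int) -> int:
    """Return the DSCP value for AFxy: x = class (1-4), y = drop precedence (1-3)."""
    if not 1 <= af_class <= 4 or not 1 <= drop_precedence <= 3:
        raise ValueError("AF classes are 1-4 and drop precedences are 1-3")
    return 8 * af_class + 2 * drop_precedence


def cs_dscp(selector: int) -> int:
    """Return the DSCP value for Class Selector CSn (n = 0-7)."""
    if not 0 <= selector <= 7:
        raise ValueError("Class Selector values are 0-7")
    return 8 * selector


def dscp_to_cos(dscp: int) -> int:
    """Assumed default mapping: keep the three most significant of the six DSCP bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0-63)")
    return dscp >> 3


if __name__ == "__main__":
    print(2 ** 3, 2 ** 6)  # number of distinct Layer 2 values versus DSCP values
    for name, dscp in [("AF11", af_dscp(1, 1)), ("AF12", af_dscp(1, 2)),
                       ("AF13", af_dscp(1, 3)), ("CS1", cs_dscp(1)), ("EF", EF)]:
        print(name, dscp, dscp_to_cos(dscp))
```

Under this assumed mapping, AF11, AF12, AF13, and CS1 all collapse to CoS 1, which shows how the drop-precedence distinction disappears once only a 3-bit Layer 2 marking remains.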

Because the line between enterprises and service providers is blurring and the need for interoperability and complementary QoS markings is critical, it is recommended to follow standards-based DSCP PHB markings to ensure interoperability and future expansion. The QoS Baseline marking recommendations are standards based, making it easier for enterprises adopting these markings to interface with service provider classes of service. Network mergers are also easier to manage when standards-based DSCP markings are used, whether these mergers are the result of acquisitions, partnerships, or strategic alliances.

Policing and Markdown Principles

There is little sense in forwarding unwanted traffic only to police and drop it at a subsequent node. This is especially the case when the unwanted traffic is the result of DoS or worm attacks. The overwhelming volumes of traffic that such attacks can create readily can drive network device processors to their maximum levels, causing network outages. Therefore, it is recommended to police traffic flows as close to their sources as possible. This principle applies to legitimate flows also because DoS and worm-generated traffic might be masquerading under legitimate, well-known TCP and UDP ports, causing extreme amounts of traffic to be poured onto the network infrastructure. Such excesses should be monitored at the source and marked down appropriately.

Whenever supported, markdown should be done according to standards-based rules, such as RFC 2597 ("Assured Forwarding PHB Group"). In other words, whenever supported, traffic marked to AFx1 should be marked down to AFx2 or AFx3. For example, in the case of a single-rate policer, excess traffic originally marked AF11 should be marked down to AF12. In the case of a dual-rate policer (as defined in RFC 2698), excess traffic originally marked AF11 should be marked down to AF12, and violating traffic should be marked down further to AF13. Following such markdowns, congestion-management policies, such as DSCP-based WRED, should be configured to drop AFx3 more aggressively than AFx2, which, in turn, is dropped more aggressively than AFx1.
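The markdown rule above can be expressed as a small function. This is a minimal sketch, assuming a policer that reports one of three results ("conform", "exceed", "violate"); the function names are hypothetical, and a single-rate policer as described here would report only the first two results.

```python
# Illustrative sketch only: names are hypothetical, not a Cisco API.

def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP value for AFxy per RFC 2597 (x = class 1-4, y = drop precedence 1-3)."""
    return 8 * af_class + 2 * drop_precedence


def mark_down(dscp: int, result: str) -> int:
    """Re-mark an AFx1 packet according to the policer result.

    result is "conform", "exceed", or "violate".
    Packets that are not marked AFx1 are returned unchanged in this sketch.
    """
    af_class, remainder = divmod(dscp, 8)
    if not (1 <= af_class <= 4 and remainder == 2):
        return dscp  # not AFx1: leave the marking untouched
    precedence = {"conform": 1, "exceed": 2, "violate": 3}[result]
    return af_dscp(af_class, precedence)


if __name__ == "__main__":
    af11 = af_dscp(1, 1)
    for result in ("conform", "exceed", "violate"):
        print(result, mark_down(af11, result))
```

Because the AF class number is preserved and only the drop precedence changes, a downstream DSCP-based WRED policy can still identify the class and drop the marked-down packets first.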

However, at the time of writing, Cisco Catalyst switches do not perform DSCP-based WRED, so this standards-based strategy cannot be implemented fully. As a workaround, single-rate policers can be configured to mark down excess traffic to DSCP CS1 (Scavenger); dual-rate policers can be configured to mark down excess traffic to AFx2, while marking down violating traffic to DSCP CS1. Such workarounds yield an overall effect similar to that of the standards-based policing model. However, when DSCP-based WRED is supported on all routing and switching platforms, it would be more standards compliant to mark down assured-forwarding classes by RFC 2597 rules.

Queuing and Dropping Principles

Critical applications, such as VoIP, require service guarantees regardless of network conditions. The only way to provide service guarantees is to enable queuing at any node that has the potential for congestion—regardless of how rarely, in fact, this might occur. This principle applies not only to campus-to-WAN or VPN edges, where speed mismatches are most pronounced, but also to campus interlayer links (where oversubscription ratios create the potential for congestion). There is simply no other way to guarantee service levels than to enable queuing wherever a speed mismatch exists.

When provisioning queuing, some best-practice rules of thumb also apply. For example, as discussed previously, the Best-Effort class is the default class for all data traffic. Only if an application has been selected for preferential or deferential treatment is it removed from the default class. Because many enterprises have several hundred, if not thousands of, data applications running over their networks, adequate bandwidth must be provisioned for this class as a whole to handle the sheer volume of applications that default to it. Therefore, it is recommended that at least 25 percent of a link's bandwidth be reserved for the default Best-Effort class.

Another class of traffic that requires special consideration when provisioning queuing is the Real-Time or Strict-Priority class (which corresponds to RFC 3246, "An Expedited Forwarding Per-Hop Behavior"). The amount of bandwidth assigned to the Real-Time queuing class is variable. However, if too much traffic is assigned for strict-priority queuing, the overall effect is a dampening of QoS functionality for non-real-time applications.

The goal of convergence cannot be overemphasized: to enable voice, video, and data to coexist transparently on a single network. When real-time applications (such as Voice or Interactive-Video) dominate a link (especially a WAN/VPN link), data applications will fluctuate significantly in their response times, destroying the transparency of the "converged" network.

Cisco Technical Marketing testing has shown a significant degradation in data application response times when real-time traffic exceeds one-third of a link's bandwidth capacity. Extensive testing and customer deployments have shown that a general best queuing practice is to limit the amount of strict-priority queuing to 33 percent of a link's capacity. This strict-priority queuing rule is a conservative and safe design ratio for merging real-time applications with data applications.

Cisco IOS Software allows the abstraction (and, thus, configuration) of multiple (strict-priority) low-latency queues. In such a multiple-LLQ context, this design principle applies to the sum of all LLQs: They should be within one-third of a link's capacity.


This strict-priority queuing rule (limit to 33 percent) is simply a best-practice design recommendation; it is not a mandate. In some cases, specific business objectives cannot be met while holding to this recommendation. In such cases, enterprises must provision according to their detailed requirements and constraints. However, it is important to recognize the trade-offs involved with overprovisioning strict-priority traffic with respect to the negative performance impact on response times in non-real-time applications.

Whenever a Scavenger queuing class is enabled, it should be assigned a minimal amount of bandwidth. On some platforms, queuing distinctions between Bulk Data and Scavenger class traffic flows cannot be made because queuing assignments are determined by CoS values, and these applications share the same CoS value of 1. In such cases, the Scavenger/Bulk Data queuing class can be assigned a bandwidth percentage of 5. If Scavenger and Bulk traffic can be assigned uniquely to different queues, the Scavenger queue should be assigned a bandwidth percentage of 1.

The queuing best-practice principles for the Real-Time, Best-Effort, and Scavenger classes are illustrated in Figure 2-10.

Figure 2-10 Real-Time, Best-Effort, and Scavenger Queuing Rules

Some platforms support different queuing structures than others. To ensure consistent PHBs, configure consistent queuing policies according to platform capabilities.

For example, on a platform that supports only four queues with CoS-based admission (such as a Catalyst switch), a basic queuing policy could be as follows:

  • Real-Time (≤ 33 percent)

  • Critical Data

  • Best-Effort (≥ 25 percent)

  • Scavenger/Bulk (≤ 5 percent)

However, on a platform that supports a full QoS Baseline queuing model, the queuing policies can be expanded, yet in such a way that they provide consistent servicing to Real-Time, Best-Effort, and Scavenger class traffic. For example, on a platform that supports 11 queues with DSCP-based admission (such as a Cisco IOS router), an advanced queuing policy could be as follows:

  • Voice (≤ 18 percent)

  • Interactive-Video (≤ 15 percent)

  • Internetwork Control

  • Call-Signaling

  • Mission-Critical Data

  • Transactional Data

  • Network-Management

  • Streaming-Video Control

  • Best-Effort (≥ 25 percent)

  • Bulk Data (4 percent)

  • Scavenger (1 percent)

Figure 2-11 illustrates the interrelationship between these compatible queuing models.

Figure 2-11 Compatible 4-Class and 11-Class Queuing Models Following Real-Time, Best-Effort, and Scavenger Class Queuing Rules

In this manner, traffic will receive compatible queuing at each node, regardless of platform capabilities—which is the overall objective of DiffServ per-hop behavior definitions.
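The three queuing rules (strict-priority at most 33 percent, Best-Effort at least 25 percent, minimal Scavenger bandwidth) can be checked mechanically against a proposed policy. The following is a rough sketch; the function, parameter, and key names are hypothetical, and the 37 percent Critical Data share in the four-queue example is an assumed remainder that the text does not specify.

```python
# Illustrative sketch only: function, parameter, and key names are hypothetical.

def check_queuing_policy(percent_by_class, realtime_classes, best_effort_class,
                         scavenger_class, scavenger_limit):
    """Return a list of rule violations; an empty list means the rules are met."""
    problems = []
    realtime_total = sum(percent_by_class[name] for name in realtime_classes)
    if realtime_total > 33:
        problems.append(f"strict-priority total is {realtime_total} percent (limit 33)")
    if percent_by_class[best_effort_class] < 25:
        problems.append("Best-Effort is below 25 percent")
    if percent_by_class[scavenger_class] > scavenger_limit:
        problems.append(f"{scavenger_class} exceeds {scavenger_limit} percent")
    if sum(percent_by_class.values()) > 100:
        problems.append("allocations exceed 100 percent of the link")
    return problems


# Four-queue model; the Critical Data share (37) is an assumed remainder.
four_class = {"Real-Time": 33, "Critical Data": 37,
              "Best-Effort": 25, "Scavenger/Bulk": 5}

# 11-class model, listing only the classes whose percentages the text states.
eleven_class = {"Voice": 18, "Interactive-Video": 15, "Best-Effort": 25,
                "Bulk Data": 4, "Scavenger": 1}

print(check_queuing_policy(four_class, ["Real-Time"],
                           "Best-Effort", "Scavenger/Bulk", 5))
print(check_queuing_policy(eleven_class, ["Voice", "Interactive-Video"],
                           "Best-Effort", "Scavenger", 1))
```

Both example policies pass because Voice and Interactive-Video together stay within the same 33 percent ceiling as the single Real-Time queue, which is what keeps the two models compatible.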

Whenever supported, it is recommended to enable WRED (preferably DSCP-based WRED) on all TCP flows. In this manner, WRED congestion avoidance will prevent TCP global synchronization and will increase overall throughput and link efficiency. Enabling WRED on UDP flows is optional.
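To illustrate how DSCP-based WRED can treat drop precedences differently, the sketch below uses the classic RED drop curve: no drops below a minimum threshold, a linear ramp up to a maximum probability, and tail drop beyond the maximum threshold. The function name and the per-DSCP threshold values are assumptions chosen for illustration, not platform defaults quoted from this chapter, and the averaging of queue depth that a real implementation performs is omitted.

```python
# Illustrative sketch only: names and threshold values are hypothetical.

def wred_drop_probability(avg_depth, min_threshold, max_threshold, max_probability):
    """Drop probability for a given average queue depth (in packets)."""
    if avg_depth < min_threshold:
        return 0.0  # queue is shallow: never drop
    if avg_depth >= max_threshold:
        return 1.0  # queue is at or past the maximum threshold: drop everything
    ramp = (avg_depth - min_threshold) / (max_threshold - min_threshold)
    return max_probability * ramp


# Assumed profiles: a higher drop precedence starts dropping at a shallower depth.
profiles = {
    "AF11": {"min_threshold": 32, "max_threshold": 40, "max_probability": 0.1},
    "AF12": {"min_threshold": 28, "max_threshold": 40, "max_probability": 0.1},
    "AF13": {"min_threshold": 24, "max_threshold": 40, "max_probability": 0.1},
}

if __name__ == "__main__":
    for name, profile in profiles.items():
        print(name, wred_drop_probability(30, **profile))
```

At an average depth of 30 packets with these assumed thresholds, AF13 is dropped with a higher probability than AF12, while AF11 is not dropped at all. Early random drops of this kind cause individual TCP senders to back off at different times, which is how WRED avoids global synchronization.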

DoS and Worm Mitigation Principles

Whenever part of the organization's objectives is to mitigate DoS and worm attacks through Scavenger-class QoS, the following best practices apply.

First, the network administrators need to profile applications to determine what constitutes normal versus abnormal flows, within a 95 percent confidence interval. Thresholds differentiating normal and abnormal flows vary from enterprise to enterprise and from application to application. Care must be taken not to overscrutinize traffic behavior because doing so can exhaust time and resources, and traffic behavior easily could change from one day to the next. Remember, the presented Scavenger-class strategy will not apply a penalty to legitimate traffic flows that exceed thresholds (aside from re-marking); only sustained, abnormal streams generated simultaneously by multiple hosts (highly indicative of DoS and worm attacks) are subject to aggressive dropping, and only after legitimate traffic has been serviced.
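One possible way to turn such profiling into a number is to take the 95th percentile of the rates observed on an access port and treat anything above it as abnormal. This interpretation of "95 percent" is an assumption made for illustration, not a method stated in this chapter, and the function name and sample data are hypothetical.

```python
# Illustrative sketch only: the percentile interpretation, the function name,
# and the sample data are assumptions, not the chapter's stated method.
import statistics


def abnormal_threshold(samples_kbps):
    """Return the 95th percentile of the observed per-port rates."""
    # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples_kbps, n=20)[-1]


if __name__ == "__main__":
    observed = list(range(1, 101))  # stand-in for measured rates, in kbps
    threshold = abnormal_threshold(observed)
    above = [rate for rate in observed if rate > threshold]
    print(threshold, len(above))
```

A threshold derived this way would need to be recomputed periodically, because, as noted above, traffic behavior can change from one day to the next.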

To contain such abnormal flows, it is recommended to deploy campus Access-Edge policers to re-mark abnormal traffic to Scavenger (DSCP CS1). Additionally, whenever Catalyst 6500s with Supervisor 720s are deployed in the distribution layer, it is recommended to deploy a second line of policing defense at the distribution layer via per-user microflow policing.
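The behavior of such a re-marking policer can be sketched with a single-rate token bucket. This is a simplified model under stated assumptions: the class name, parameters, and the rate and burst values in the example are hypothetical, and timestamps are supplied by the caller in non-decreasing order. The key point it illustrates is that excess traffic is re-marked and still forwarded, not dropped at the edge.

```python
# Illustrative sketch only: class and parameter names are hypothetical.

CS1 = 8  # Scavenger (DSCP CS1)


class RemarkingPolicer:
    """Single-rate token-bucket policer that re-marks excess traffic instead of dropping it."""

    def __init__(self, rate_bps: float, burst_bytes: int) -> None:
        self.rate_bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # the bucket starts full
        self.last_time = 0.0

    def police(self, now: float, size_bytes: int, dscp: int) -> int:
        """Return the DSCP value the packet leaves with."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill the bucket for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return dscp  # conforming: original marking preserved
        return CS1       # exceeding: re-marked to Scavenger, still forwarded


if __name__ == "__main__":
    policer = RemarkingPolicer(rate_bps=8000, burst_bytes=1500)
    print(policer.police(0.0, 1000, 0))  # within the burst
    print(policer.police(0.0, 1000, 0))  # bucket too low at this instant
    print(policer.police(1.0, 1000, 0))  # bucket refilled after one second
```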

To complement these re-marking policies, it is necessary to enforce end-to-end Scavenger-class queuing policies, where flows marked as Scavenger will receive a less-than best-effort service whenever congestion occurs.

It is important to note that even when Scavenger-class QoS has been deployed end to end, this strategy only mitigates DoS and worm attacks and does not prevent them or remove them entirely. Therefore, it is critical to overlay security, firewall, intrusion detection, and identity systems, along with Cisco Guard and Cisco Security Agent solutions, on top of the QoS-enabled network infrastructure.

Deployment Principles

After the QoS designs have been finalized, it is vital that the networking team thoroughly understand the QoS features and syntax before enabling features on production networks. Such knowledge is critical for both deployment and troubleshooting QoS-related issues.

Furthermore, it is a general best practice to schedule proof-of-concept (PoC) tests to verify that the hardware and software platforms in production support the required QoS features in combination with all the other features that they currently are running. Remember, in theory, theory and practice are the same; in practice, they often are not. In other words, there is no substitute for testing.

When testing has validated the designs, it is recommended to schedule network downtime to deploy QoS features. Although QoS is required end to end, it does not have to be deployed end to end at a single instance. A pilot network segment can be selected for an initial deployment, and, pending observation, the deployment can be expanded in stages to encompass the entire enterprise. A rollback strategy always is recommended, to address unexpected issues that arise from the QoS deployment.