What is QoS?

QoS is a mechanism that lets us treat packets differently as they transit a network device, based on the packet contents. In practice, this means the traffic for a voice call or a critical application can be given priority over a movie download or a large file transfer as it passes through the network.

Please consider the following points when planning to implement QoS in your network.

Strategic QoS Design

QoS Mechanisms/Tools:

  • Classification & Marking
  • Congestion Management
  • Congestion Avoidance
  • Policing
  • Shaping

Classification and Marking:

Classification is a way to identify traffic and split it into different classes. Depending on the traffic type or class, we can then apply different policies. Keep in mind that classification can be done without marking.

Marking, also known as coloring, writes a value into the packet header. A “class” of traffic receives the same type of QoS treatment. In other words, if we want to treat packets differently, we first have to identify and mark them.

Classification can be done at different layers, such as Layer 2, Layer 3, and Layer 4.

L2 and L3 Classification: Ethernet frames contain no distinctive “priority” field unless they are carried on 802.1Q or ISL trunks, which means Layer 2 marking is used on trunk ports only.

CoS (Class of Service) is the Layer 2 marking used on Ethernet trunks (the 802.1p priority bits in the 802.1Q tag).

ToS (Type of Service) is the Layer 3 marking field in the IP header. There are two ways to mark it:

  1. IP Precedence: The three most significant bits of the ToS field are called IP Precedence; the remaining bits are generally unused.
  2. DSCP (Differentiated Services Code Point): The six most significant bits of the ToS field are called DSCP, and the remaining two bits are used for Explicit Congestion Notification (ECN).



Commands for Configuration (Classification and Marking)
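As a minimal sketch of MQC-style classification and marking on Cisco IOS (the class names, ACL number, DSCP choice, and interface are illustrative placeholders, not a definitive design):

```
! Identify voice traffic by its RTP UDP port range (example ACL)
access-list 101 permit udp any any range 16384 32767
!
class-map match-any VOICE
 match access-group 101
!
! Mark matched traffic with DSCP EF; everything else gets default
policy-map MARK-TRAFFIC
 class VOICE
  set ip dscp ef
 class class-default
  set ip dscp default
!
! Apply the marking policy inbound on the LAN-facing interface
interface GigabitEthernet0/1
 service-policy input MARK-TRAFFIC
```

Marking is typically applied inbound as close to the traffic source as possible, so that devices further along the path can classify on the DSCP value alone.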


Congestion management:

The congestion management feature allows us to control congestion; essentially, it gives us some control over the order of transmission.

Congestion management involves creating queues, assigning packets to those queues based on the classification of the packet, and scheduling packets out of the various queues for transmission.

The two most important parameters associated with queuing and scheduling are buffers and bandwidth. Buffering determines the length of the queue, that is, how much memory is available to store packets, while a total amount of bandwidth is made available to the queuing and scheduling mechanism.

Congestion management functions:

  • Create queues
  • Assign packets to those queues based on packet classification
  • Schedule packets out of the queues for transmission

What is a queue?

A queue is a memory structure that holds incoming packets (prior to the forwarding lookup) and outgoing packets (after the lookup). Each physical port has an input queue and an output queue.

Input queue: There is one input queue per interface, and it is always FIFO.

Output queue: There is a software queue and a hardware queue per interface.

  1. Hardware queue: Always FIFO. The hardware queue is also known as the tx-ring. Its depth can be adjusted with the “tx-ring-limit X” command, and it is not affected by any IOS queuing mechanism.

  2. Software queue: Can be FIFO or any fancy queuing mechanism. Its depth can be adjusted with the “hold-queue XX in|out” command. Queuing tools create and manage the software queue.
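A hedged sketch of the two tuning commands just mentioned (the interface and values are illustrative; defaults vary by platform):

```
interface Serial0/0
 ! Shrink the hardware queue (tx-ring) so congestion builds in the
 ! software queue, where IOS queuing tools can act on it
 tx-ring-limit 3
 ! Set the software output queue depth to 100 packets
 hold-queue 100 out
```

A smaller tx-ring gives the software queuing mechanism more influence, at the cost of slightly higher CPU involvement.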

How to check the queue size?

show interface <Id>

If the hardware queue is not congested, packets arriving at the interface bypass the software queuing process and are sent directly to the hardware queue to be transmitted out the physical interface. Tuning these queues is not recommended by Cisco.

Why do we need congestion management?

As noted, most interfaces use FIFO queuing by default, which means no control over the order of transmission. Congestion management gives us some control over that order: it uses the marking on each packet to determine which queue to place the packet in.

What causes congestion?

Congestion happens when input traffic demand exceeds the capacity of the network.


  1. Speed mismatch (traffic moving from LAN to WAN)
  2. Traffic aggregation (traffic coming to one location from multiple locations)
  3. Insufficient packet buffers to handle the traffic

Egress congestion: Packets are forwarded to an egress interface faster than the Tx-Ring can handle them.

Ingress congestion: Packets arrive on multiple ingress interfaces faster than the forwarding engine can process them. This is a rare case.


First In First Out (FIFO):

  • FIFO treats all packets equally: packets are forwarded in the same order in which they arrive at the interface.
  • No delay or bandwidth guarantee.
  • FIFO is the default queuing mechanism in the hardware queue.
  • It is also used within the individual queues of software queuing mechanisms.
  • There is only one queue for all packets.


Priority Queuing (PQ):

  • Priority Queuing is a technique with four queues (High, Medium, Normal, Low).
  • Priority Queuing schedules traffic so that higher-priority queues are always serviced first. If there is no packet in the High queue, the scheduler services the Medium queue: it takes one packet from the Medium queue, then checks the High queue again for waiting packets. The Low queue is serviced only when there are no packets waiting in the High, Medium, and Normal queues.
  • No bandwidth guarantee, but it does give a delay guarantee for one queue (High).
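A minimal sketch of legacy PQ configuration (the list number, port, and interface are illustrative placeholders):

```
! Classify Telnet (TCP port 23) into the High queue;
! unmatched traffic falls into the Normal queue
priority-list 1 protocol ip high tcp 23
priority-list 1 default normal
!
interface Serial0/0
 priority-group 1
```

Because the High queue is always serviced first and is not policed, a sustained burst of high-priority traffic can starve the lower queues — the weakness LLQ later addresses.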


Weighted Fair Queuing (WFQ):

  • With WFQ, traffic is sorted into flows, and once the flows are identified, WFQ dynamically creates a queue for each flow inside the software queue. In other words, WFQ provides a dedicated queue for each flow and allocates bandwidth fairly.
  • The number of queues depends on the number of flows. By default, up to 256 queues can be created dynamically, and this number can be configured between 16 and 4096.
  • If the number of flows exceeds the number of queues, multiple flows share the same queue.
  • WFQ is enabled by default on physical interfaces whose bandwidth is less than or equal to 2.048 Mbps and on interfaces configured for Multilink PPP.
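A sketch of enabling WFQ explicitly on an interface (interface name is a placeholder):

```
interface Serial0/0
 ! Enable WFQ (already on by default for interfaces
 ! at or below 2.048 Mbps); optional arguments can tune the
 ! congestive discard threshold and number of dynamic queues
 fair-queue
```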


Class-Based Weighted Fair Queuing (CBWFQ):

  • With WFQ we had no control over the classification mechanism. CBWFQ allows the creation of user-defined classes, and each queue receives a user-defined minimum bandwidth guarantee, but a class can use more bandwidth if it is available.
  • Provides flow-based WFQ support for traffic that does not match a user-defined class.
  • Can create up to 64 queues, one for each user-defined class.


Each queue can be given a configurable minimum bandwidth guarantee in one of three ways:

1. Fixed bandwidth using the “bandwidth xxx” command (in kbps).

2. A percentage of total interface bandwidth using “bandwidth percent”.

3. A percentage of remaining unallocated bandwidth using the “bandwidth remaining percent” command.

Note: only one of these methods can be used within a single policy map.
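A hedged CBWFQ sketch using the second method (class, policy, and interface names are made up for illustration):

```
class-map match-any CRITICAL-DATA
 match ip dscp af31
!
policy-map CBWFQ-OUT
 class CRITICAL-DATA
  ! Minimum guarantee of 25% of interface bandwidth;
  ! the class may use more when bandwidth is free
  bandwidth percent 25
 class class-default
  ! Flow-based WFQ for all unmatched traffic
  fair-queue
!
interface Serial0/0
 service-policy output CBWFQ-OUT
```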


Low Latency Queuing (LLQ):

  • The Low Latency Queuing feature brings strict-priority queuing to Class-Based Weighted Fair Queuing (CBWFQ); it combines the benefits of PQ and CBWFQ. In effect, LLQ allows us to convert one or more CBWFQ queues into priority queues using the “priority” command.
  • The benefit of LLQ over CBWFQ is the existence of one or more strict-priority queues with bandwidth guarantees for delay- and jitter-sensitive traffic.
  • The advantage of LLQ over traditional PQ is that the LLQ strict-priority queue is policed, which eliminates the starvation of other queues that can occur with PQ.
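A sketch of converting a CBWFQ class into an LLQ priority queue (names and the 512 kbps figure are illustrative):

```
class-map match-any VOICE
 match ip dscp ef
!
policy-map LLQ-OUT
 class VOICE
  ! Strict-priority queue; policed to 512 kbps during congestion,
  ! which prevents it from starving the other classes
  priority 512
 class class-default
  fair-queue
!
interface Serial0/0
 service-policy output LLQ-OUT
```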


Congestion Avoidance:

Congestion avoidance is a mechanism that drops packets early in order to avoid congestion later in the network; it can be thought of as an intelligent way of dropping packets.

When a queue is full and there is no room for any more packets, the device drops all newly arriving packets. This is called tail drop. Tail drop treats all traffic equally and does not differentiate between classes of service.

RED (Random Early Detection): RED is used to prevent TCP global synchronization. RED uses three configurable parameters to determine the drop probability of packets: minimum threshold, maximum threshold, and mark probability denominator.

  • Minimum threshold: The average queue depth at which RED begins dropping packets.
  • Maximum threshold: The average queue depth above which RED drops all packets (tail drop).
  • Mark probability denominator (MPD): Determines the fraction of packets dropped when the queue depth is between the minimum and maximum thresholds, expressed as 1/MPD. For example, if the MPD is set to 10, one out of every 10 packets is dropped when the average queue depth reaches the maximum threshold.
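Using the three parameters above, the drop probability between the two thresholds can be written as a linear ramp (a standard RED formulation, stated here as an assumption consistent with the definitions above):

```
p(q) = ((q - min_th) / (max_th - min_th)) * (1 / MPD),  for min_th <= q <= max_th
```

where q is the average queue depth; below min_th nothing is dropped (p = 0), and above max_th every arriving packet is dropped. With min_th = 20, max_th = 40, and MPD = 10, a queue depth of 40 gives p = 1/10, i.e. one packet in ten.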

WRED (Weighted Random Early Detection): WRED is another mechanism for controlling congestion of Layer 3 queues. It combines the capabilities of the random early detection (RED) mechanism with IP Precedence, DSCP, and discard-class to provide preferential handling of higher-priority packets, which are dropped less aggressively.

It is recommended to enable DSCP-based WRED on the AF queues, and not to enable DSCP-based WRED on the EF queue.
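A hedged sketch of DSCP-based WRED applied inside a CBWFQ class (names, percentages, and thresholds are illustrative):

```
policy-map CBWFQ-OUT
 class AF-DATA
  bandwidth percent 30
  ! Enable DSCP-based WRED on this (AF) queue
  random-detect dscp-based
  ! Optionally tune per-DSCP values: min threshold, max threshold, MPD
  random-detect dscp af31 26 40 10
```

The per-DSCP tuning line illustrates how higher-drop-precedence markings within an AF class can be given a lower minimum threshold, so they are dropped earlier under congestion.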


Policing:

  • It can drop or re-mark excess traffic.
  • It can be applied in the inbound or outbound direction.


Shaping:

  • It attempts to delay excess traffic rather than dropping it.
  • It can be applied only in the outbound direction.
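A sketch contrasting the two (rates, names, and interfaces are illustrative placeholders):

```
! Policing: drop traffic above 1 Mbps as it enters the device
policy-map POLICE-IN
 class class-default
  police cir 1000000 conform-action transmit exceed-action drop
!
! Shaping: buffer and delay traffic above 512 kbps on the way out
policy-map SHAPE-OUT
 class class-default
  shape average 512000
!
interface GigabitEthernet0/1
 service-policy input POLICE-IN
 service-policy output SHAPE-OUT
```

Policing enforces a hard rate by dropping or re-marking; shaping smooths bursts by queuing them, which adds delay but avoids drops.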


QoS Configuration
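As a hedged end-to-end sketch pulling the mechanisms above together (all class names, DSCP choices, and bandwidth values are illustrative assumptions, not a definitive design):

```
! Classification
class-map match-any VOICE
 match ip dscp ef
class-map match-any CRITICAL-DATA
 match ip dscp af31
!
! LLQ for voice, CBWFQ + WRED for critical data, WFQ for the rest
policy-map WAN-EDGE-OUT
 class VOICE
  priority 512
 class CRITICAL-DATA
  bandwidth percent 25
  random-detect dscp-based
 class class-default
  fair-queue
!
! Apply outbound on the congested WAN link
interface Serial0/0
 service-policy output WAN-EDGE-OUT
```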



