Lossless networking consists of three mechanisms: Traffic Marking, Traffic Shaping, and Flow Control.
Traffic Marking is a means to identify, categorize, and mark traffic types so that QoS policies can be applied to traffic flows.
This mechanism is not new to the industry. One of the first forays into traffic marking was IETF RFC 1349, which defined the Type of Service (ToS) Octet within the IP header. This RFC has since been replaced by IETF RFC 2474, which redefined the ToS Octet as the Differentiated Services (DiffServ) Octet. The redefinition modernized the ToS Octet and allowed for increased compatibility with protocols being standardized by the IEEE Data Center Bridging (DCB) task force.
The DiffServ Octet provides for two Lossless mechanisms. First, it carries a 6-bit Differentiated Services Code Point (DSCP), which provides the packet's Traffic Class (similar to the previous ToS) as well as the Drop Precedence. Second, its remaining 2 bits provide the basic functionality for Explicit Congestion Notification (ECN), which is defined later in this guide.
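The layout of the DiffServ Octet can be illustrated with a few bit operations. The sketch below is only an illustration: the function names are hypothetical, and the split of a DSCP into class and drop precedence assumes the Assured Forwarding code points (AFxy) of RFC 2597, which is one common interpretation rather than something this guide mandates.

```python
from typing import Tuple


def build_ds_octet(dscp: int, ecn: int = 0) -> int:
    """Pack a 6-bit DSCP and a 2-bit ECN value into the 8-bit DiffServ Octet."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must fit in 2 bits (0-3)")
    return (dscp << 2) | ecn


def split_ds_octet(octet: int) -> Tuple[int, int]:
    """Return (dscp, ecn) from an 8-bit DiffServ Octet."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("DiffServ Octet must fit in 8 bits")
    return octet >> 2, octet & 0b11


def af_class_and_drop(dscp: int) -> Tuple[int, int]:
    """For an Assured Forwarding code point, return (class, drop precedence).

    Assumption: the DSCP is an AFxy value, where the upper three bits carry
    the class x and the next two bits carry the drop precedence y.
    """
    return dscp >> 3, (dscp >> 1) & 0b11


# AF31 is DSCP 26: class 3, drop precedence 1.
octet = build_ds_octet(26)
print(hex(octet), split_ds_octet(octet), af_class_and_drop(26))
```

Because the DSCP occupies the upper six bits, the value seen in a packet capture as the full octet is the DSCP shifted left by two.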
DSCP can be set and read from anywhere in the network. Generally, we recommend configuring the endpoints to set the DSCP field on pertinent traffic before placing it on the network. DSCP is the only traffic marking mechanism supported by the OpenFlex devices.
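For an ordinary socket-based application, marking at the endpoint could look like the sketch below. It assumes a Linux or similar host where the standard `IP_TOS` socket option is honored; the helper names and the DSCP value 26 are illustrative only, since this guide does not state which DSCP value to use, and storage traffic is often marked by the NIC or driver configuration instead of by the application.

```python
import socket


def tos_for_dscp(dscp: int) -> int:
    """Return the DiffServ Octet value for a DSCP, leaving the ECN bits at 0."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits (0-63)")
    return dscp << 2


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS takes the whole octet, so the DSCP is shifted past the ECN bits.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_for_dscp(dscp))
    return sock


# Hypothetical usage: mark traffic with DSCP 26 before sending.
# sock = open_marked_udp_socket(26)
# sock.sendto(b"payload", ("192.0.2.10", 4420))
```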
But what about Priority Code Point (PCP)?
PCP is another traffic marking mechanism that was used in the early days of DCB. PCP, as defined in IEEE 802.1p (a component of 802.1Q), uses a 3-bit field in the VLAN tag to convey the Class of Service (CoS). This class of service is similar to the “Traffic Class” provided in DSCP or the “IP Precedence” provided by ToS. All provide 8 Network Priorities (0-7). As VLAN tags are a Layer 2 construct (Ethernet mechanism), PCP is difficult to use on larger or more complex (Layer 3) networks. Some switches strip VLAN tags from packets as they enter the switch and re-add them as they leave. To keep network implementation as simple as possible, PCP is not supported by OpenFlex devices.
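For comparison with DSCP, the position of PCP inside the VLAN tag can be shown with a short sketch. The 16-bit Tag Control Information (TCI) field of an 802.1Q tag holds the 3-bit PCP, a 1-bit Drop Eligible Indicator (DEI), and the 12-bit VLAN ID; the function names below are hypothetical.

```python
from typing import Tuple


def build_tci(pcp: int, dei: int, vid: int) -> int:
    """Pack PCP (3 bits), DEI (1 bit), and VLAN ID (12 bits) into a 16-bit TCI."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must be 0-7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    return (pcp << 13) | (dei << 12) | vid


def parse_tci(tci: int) -> Tuple[int, int, int]:
    """Return (pcp, dei, vid) from a 16-bit TCI value."""
    return tci >> 13, (tci >> 12) & 1, tci & 0x0FFF


# Priority 5 on VLAN 100.
tci = build_tci(5, 0, 100)
print(hex(tci), parse_tci(tci))
```

Since the priority lives in the VLAN tag itself, it is lost whenever the tag is stripped, which is the difficulty described above.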
Traffic Shaping is the ability to set and enforce QoS metrics, such as bandwidth limits, on specific traffic classes. When the network is under load, Traffic Shaping delays some packets in favor of others.
In IEEE 802.1Qaz, the DCB task force defined Enhanced Transmission Selection (ETS). ETS provides 8 Traffic Classes, each set to either Strict or Bandwidth Allocation.
With Strict, any Traffic Class set to Strict is guaranteed up to 100% of the bandwidth unless another Traffic Class of higher value is also set to Strict. For example, if TC 4 is using 60% of the bandwidth and TC 6 rises above 40%, TC 4 is reduced in favor of TC 6.
With Bandwidth Allocation, a Traffic Class set to a bandwidth percentage is guaranteed that percentage at a minimum. Minimum guarantees apply only among the classes set to Bandwidth Allocation: Strict always has priority over Bandwidth Allocation and can consume all of the bandwidth. If the network is not fully utilized, any given TC is allowed to exceed its guaranteed minimum. For example, TC 3 is guaranteed 50% of the bandwidth but can consume more if the network is not fully utilized.
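The interaction between Strict and Bandwidth Allocation classes can be sketched as a small allocation function. This is a simplified model and not how any particular switch implements ETS: it works on steady-state offered load in percent of the link, serves Strict classes from the highest TC down, and then shares what is left among the Bandwidth Allocation classes in proportion to their configured percentages, handing unused share to classes that still have demand. The function and parameter names are hypothetical.

```python
from typing import Dict, Set


def ets_allocate(demand: Dict[int, float], strict: Set[int],
                 ets_share: Dict[int, float], link: float = 100.0) -> Dict[int, float]:
    """Return the bandwidth granted to each TC, in percent of the link.

    demand:    offered load per TC, in percent of the link
    strict:    TCs configured as Strict
    ets_share: configured percentage per Bandwidth Allocation TC
    """
    grant = {tc: 0.0 for tc in demand}
    remaining = link

    # Strict classes are served first, highest TC number first.
    for tc in sorted(strict, reverse=True):
        grant[tc] = min(demand.get(tc, 0.0), remaining)
        remaining -= grant[tc]

    # Bandwidth Allocation classes share the remainder by weight. Classes
    # that need less than their share release the excess to the others.
    active = {tc for tc in ets_share if demand.get(tc, 0.0) > 0}
    while active and remaining > 1e-9:
        total_weight = sum(ets_share[tc] for tc in active)
        if total_weight <= 0:
            break
        budget = remaining
        satisfied = set()
        for tc in active:
            offer = budget * ets_share[tc] / total_weight
            need = demand[tc] - grant[tc]
            take = min(offer, need)
            grant[tc] += take
            remaining -= take
            if need <= offer + 1e-9:
                satisfied.add(tc)
        if not satisfied:
            break  # every active class consumed its full offer
        active -= satisfied
    return grant


# Both TC 4 and TC 6 are Strict: TC 6 takes what it needs, TC 4 gets the rest.
print(ets_allocate({4: 60, 6: 50}, strict={4, 6}, ets_share={}))
# TC 3 is guaranteed 50% but may use more while TC 2 leaves bandwidth idle.
print(ets_allocate({3: 80, 2: 10}, strict=set(), ets_share={3: 50, 2: 50}))
```

In the first call TC 4 is cut back to the bandwidth TC 6 leaves behind; in the second, TC 3 exceeds its 50% minimum because the link is not fully utilized.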
Flow Control
Flow Control is a mechanism to temporarily stop or slow down traffic to reduce congestion (packet drops). Flow Control is also not a new concept. IEEE 802.3x defined Ethernet Flow Control (Global Pause). Global Pause is a simple mechanism in which a port whose buffer is about to overflow transmits a Pause frame to the neighboring ports contributing to the overflow. As Global Pause is an Ethernet mechanism (Layer 2), it only understands MAC addresses and direct neighbors. Global Pause does not rely on Traffic Marking, as it simply pauses all traffic from the neighboring ports for a specified duration.
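To make the Pause frame concrete, the sketch below builds one by hand. It relies on the standard 802.3x encoding: the reserved multicast destination 01-80-C2-00-00-01, the MAC Control EtherType 0x8808, opcode 0x0001, and a 16-bit pause time measured in quanta of 512 bit times. The function names and the source MAC are placeholders, and the frame is built without its trailing FCS, which the NIC normally appends.

```python
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Build a Global Pause frame (without FCS) asking the neighbor to pause."""
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    frame = PAUSE_DST + src_mac + struct.pack(
        "!HHH", MAC_CONTROL_ETHERTYPE, PAUSE_OPCODE, quanta)
    # Pad to the 60-byte minimum Ethernet frame size (64 bytes with FCS).
    return frame.ljust(60, b"\x00")


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Convert a pause time in quanta (512 bit times each) to seconds."""
    return quanta * 512 / link_bps


frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
print(len(frame), frame[:18].hex())
print(pause_seconds(0xFFFF, 10e9))  # longest single pause on a 10 Gb/s link
```

Note that the frame carries no priority or class information, which is why every traffic type on the link is paused together.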
Priority Flow Control
The DCB Task Force introduced IEEE 802.1Qbb, which defines Priority Flow Control (PFC). PFC extends Global Pause by adding a Priority or Traffic Class marking to the Pause frame. This allows a PFC pause frame to pause an individual Traffic Class while other Traffic Classes continue to flow unimpeded.
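The per-priority extension is visible in the frame layout itself. The sketch below builds a PFC frame using the 802.1Qbb encoding: the same destination address and EtherType as Global Pause, opcode 0x0101, a class-enable vector with one bit per priority, and eight 16-bit pause times. The function name and the source MAC are placeholders, and the FCS is again left to the NIC.

```python
import struct
from typing import Dict

PFC_DST = bytes.fromhex("0180c2000001")  # same reserved address as Global Pause
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac: bytes, pause_quanta: Dict[int, int]) -> bytes:
    """Build a PFC frame (without FCS).

    pause_quanta maps a priority (0-7) to its pause time in quanta of
    512 bit times. A time of 0 for an enabled priority resumes that priority.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    enable_vector = 0
    times = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be 0-7")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError("pause time must fit in 16 bits")
        enable_vector |= 1 << priority  # bit n selects priority n
        times[priority] = quanta
    payload = struct.pack("!HHH8H", MAC_CONTROL_ETHERTYPE, PFC_OPCODE,
                          enable_vector, *times)
    return (PFC_DST + src_mac + payload).ljust(60, b"\x00")


# Pause only priority 3 for the maximum time; other priorities keep flowing.
frame = build_pfc_frame(bytes.fromhex("020000000001"), {3: 0xFFFF})
print(frame[12:34].hex())
```

Only the priorities whose bits are set in the class-enable vector are affected; the time fields of the others are ignored by the receiver.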
PFC uses priority-based pause frames for flow control, which is an essential component for lossless Ethernet. Although PFC operates on an Ethernet link between two adjacent Ethernet devices, a meaningful PFC setup for storage traffic should take a wider look at the network and at least configure the following:
As PFC is a Layer 2 mechanism, it has two drawbacks:
Deficient Neighbor – When two or more clients pull data from a target over the same traffic class and one of the clients has substantially lower performance than the others, PFC can reduce the higher-performance systems to match the lower-performance system.
Congestion Sprawl – This concept involves a more complex network where two or more switches are being used. As PFC is a Layer 2 mechanism and only pauses traffic on neighboring ports, a switch-to-switch link that provides an artery for multiple streams of traffic within the same traffic class could be paused, impacting the performance of third-party data streams unrelated to the congestion event. This can in turn cause more congestion, sending out more pause frames that branch through the network.