Explicit Congestion Notification (ECN) is an extension to the Internet Protocol (IP) defined in RFC 3168. RFC 7567 recommends the use of Active Queue Management (AQM), in which algorithms such as Random Early Detection (RED) or Weighted Random Early Detection (WRED) identify traffic better suited for loss, avoiding costly retransmission of important traffic. ECN extends AQM to “mark” traffic instead of dropping it, allowing end-to-end notification and back-off control to avoid congestion. NVMe over Fabrics with RoCE uses ECN so that congestion can be reported in a feedback loop, thereby significantly reducing packet loss.
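The mark-instead-of-drop idea can be illustrated with a minimal Python sketch of a RED-style decision. It is a simplification and not any particular switch's implementation: the function names, the threshold values, and the behavior above the maximum threshold are all assumptions chosen for illustration, and real platforms differ in detail.

```python
import random

def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Classic RED ramp: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def aqm_action(avg_queue, ecn_capable,
               min_th=150, max_th=1500, max_p=0.1, rng=random.random):
    """Decide what happens to one packet.

    Thresholds are hypothetical (arbitrary queue-depth units).
    With ECN, a packet selected by RED is marked CE if it is
    ECN-capable; otherwise it is dropped, as in plain RED.
    """
    p = red_mark_probability(avg_queue, min_th, max_th, max_p)
    if rng() >= p:
        return "forward"
    return "mark_ce" if ecn_capable else "drop"
```

For example, `aqm_action(2000, ecn_capable=True)` returns `"mark_ce"` in this sketch, while the same call with `ecn_capable=False` returns `"drop"`.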
When ECN is properly configured, the initiator sending traffic marks its packets as “ECN Capable Transport” (ECT) using the two least significant bits of the Differentiated Services (DiffServ, formerly Type of Service) byte in the IP header (see the traffic marking discussion above). When congestion is encountered, as identified by the AQM algorithm, the congestion point (CP) changes the two ECN bits to indicate Congestion Encountered (CE). The packet continues on to its end point, which detects the CE mark and generates a Congestion Notification Packet (CNP). This makes the end point the notification point (NP). A CNP is a specialized packet that traverses the network in the opposite direction, returning to the traffic stream’s origin to instruct it to slow down. This makes the originator of the traffic the reaction point (RP), as it reacts to CNPs by reducing its transmission rate for a period of time.
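The bit manipulation described above can be sketched in a few lines of Python. The codepoint values come from RFC 3168. The helper names and the DSCP value used in the usage note are hypothetical, and the sketch models only the marking logic, not real packet handling or CNP generation.

```python
# ECN codepoints (RFC 3168): the two least significant bits of the
# IPv4 ToS / IPv6 Traffic Class byte. The upper six bits carry the DSCP.
NOT_ECT = 0b00  # sender is not ECN-capable
ECT_1 = 0b01    # ECN Capable Transport, codepoint 1
ECT_0 = 0b10    # ECN Capable Transport, codepoint 0
CE = 0b11       # Congestion Encountered
ECN_MASK = 0b11

def get_ecn(tos):
    """Return the 2-bit ECN field of the ToS / Traffic Class byte."""
    return tos & ECN_MASK

def set_ecn(tos, codepoint):
    """Return the byte with its ECN field replaced and the DSCP bits untouched."""
    return (tos & 0xFF & ~ECN_MASK) | (codepoint & ECN_MASK)

def congestion_point(tos, congested):
    """CP (switch): mark CE only on packets that advertise ECN capability."""
    if congested and get_ecn(tos) in (ECT_0, ECT_1):
        return set_ecn(tos, CE)
    return tos

def notification_point(tos):
    """NP (receiver): True when a CNP should be sent back to the sender."""
    return get_ecn(tos) == CE
```

As a usage example, a sender using a hypothetical DSCP of 26 would emit `set_ecn(26 << 2, ECT_0)`. If `congestion_point` sees that byte while congested, it returns a byte for which `notification_point` is true, and the DSCP bits are unchanged. A Not-ECT packet passes through unmarked, which is why such traffic can still only be dropped under congestion.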
Because ECN is a Layer 3 mechanism, not all traffic in a traffic class may support ECN or be configured for it, so congestion can still occur. Therefore, for the best lossless configuration, ECN should be considered a complement to PFC. Because ECN provides targeted notification, it does not have the same issues as PFC related to “Deficient Neighbor” and “Congestion Sprawl”. PFC acts as a catchall flow control when ECN isn’t configured on all traffic flows of a given port. ECN operates end-to-end and requires configuration on: