Research Directions of the Ultra Ethernet Consortium
The Ultra Ethernet Consortium (UEC) is working to improve Ethernet at the physical, link, transport, and software layers. While staying compatible with the existing Ethernet ecosystem, it aims to raise Ethernet forwarding performance and to improve Ethernet communication protocols and application programming interfaces. Its work also covers storage, management, security, and telemetry, so that Ultra Ethernet can meet the networking needs of artificial intelligence (AI) and high-performance computing (HPC).
The Ultra Ethernet Consortium focuses on the Type2 Network (Back End Network). It does not oppose use of the technology in the Type1 Network (Front End Network), but it will not sacrifice Type2 network performance in order to accommodate Type1.
Type1 and Type2 Networks
UEC determines performance metrics for each network type
UEC Working Groups
UEC initially established four working groups: physical layer, link layer, transport layer, and software layer. These groups have already produced significant results. More recently, storage, management, compatibility & testing, and performance & debugging working groups have been established and have just started work. The figure below shows the working groups of UEC:
UEC’s four working groups
Physical layer working group
The physical layer working group is committed to improving physical performance, reducing latency, and improving management of the Ethernet physical infrastructure. Its scope includes Ethernet physical layer specifications, electrical and optical signal characteristics, application interfaces, and data structures. Its goal is to strengthen the foundation and ensure that Ethernet can meet the rigorous requirements of AI and HPC. The group is currently working on PHY specifications for 100G/Lane and 200G/Lane and has determined the media types, PHY rates, and PHY types supported at 100G/Lane. The specifications for 200G/Lane will be determined after IEEE P802.3dj is approved.
The physical layer working group has introduced several new concepts for link quality prediction: UCR (uncorrectable codeword ratio), MTBPE (the mean time between PHY errors), and MTTFPA (the mean time to false packet acceptance), dedicated to predicting and measuring physical layer link quality more accurately.
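As a rough illustration of how the first two metrics could be derived from link counters, here is a minimal sketch. The counter names, the observation window, and the exact formulas are assumptions made for this example; UEC's formal definitions may differ, and MTTFPA is left out because it depends on an error model that is not described here.

```python
from dataclasses import dataclass


@dataclass
class PhyCounters:
    """Hypothetical counters sampled from a PHY over one observation window."""
    total_codewords: int          # FEC codewords received in the window
    uncorrectable_codewords: int  # codewords the FEC decoder could not repair
    phy_errors: int               # PHY error events seen in the window
    window_seconds: float         # length of the observation window


def ucr(c: PhyCounters) -> float:
    """Uncorrectable codeword ratio: uncorrectable codewords / all codewords."""
    if c.total_codewords == 0:
        return 0.0
    return c.uncorrectable_codewords / c.total_codewords


def mtbpe(c: PhyCounters) -> float:
    """Mean time between PHY errors in seconds (infinite if none were seen)."""
    if c.phy_errors == 0:
        return float("inf")
    return c.window_seconds / c.phy_errors


sample = PhyCounters(
    total_codewords=1_000_000,
    uncorrectable_codewords=2,
    phy_errors=4,
    window_seconds=3600.0,
)
print("UCR:", ucr(sample))
print("MTBPE (s):", mtbpe(sample))
```

A lower UCR and a longer MTBPE both indicate a healthier link under these assumed definitions.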
Link layer working group
The link layer working group is committed to improving the reliability and efficiency of link layer transmission and improving link layer telemetry capabilities.
The main research directions of the link layer are:
Link Layer Reliability:
Add an LLR (Link Layer Reliability) sublayer to the link layer, located between the LLC and MAC Control sublayers, to retransmit errored packets between the two ends of a link.
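The retransmission idea can be sketched with a sender-side replay buffer and a receiver that accepts frames strictly in sequence. This is a generic go-back-N style illustration, not the LLR frame format or state machine: the class names, the ACK/NACK tuples, and the unbounded sequence numbers are all assumptions made for the example.

```python
from collections import OrderedDict


class LlrSender:
    """Keeps a copy of every unacknowledged frame so it can be replayed."""

    def __init__(self):
        self.next_seq = 0
        self.replay = OrderedDict()  # seq -> payload, oldest first

    def send(self, payload: bytes):
        seq = self.next_seq
        self.replay[seq] = payload
        self.next_seq += 1
        return seq, payload

    def on_ack(self, seq: int):
        """Peer has everything up to and including seq: free those copies."""
        for s in [s for s in self.replay if s <= seq]:
            del self.replay[s]

    def on_nack(self, seq: int):
        """Peer is missing seq: replay it and every later frame, in order."""
        return [(s, p) for s, p in self.replay.items() if s >= seq]


class LlrReceiver:
    """Accepts frames in sequence and requests a replay on error or gap."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def receive(self, seq: int, payload: bytes, corrupted: bool = False):
        if corrupted or seq != self.expected:
            return ("NACK", self.expected)
        self.delivered.append(payload)
        self.expected += 1
        return ("ACK", seq)


tx, rx = LlrSender(), LlrReceiver()
frames = [tx.send(b"f%d" % i) for i in range(3)]

kind, seq = rx.receive(*frames[0])                  # arrives intact
tx.on_ack(seq)
kind, missing = rx.receive(*frames[1], corrupted=True)  # damaged on the wire
rx.receive(*frames[2])                              # out of sequence, rejected

for s, p in tx.on_nack(missing):                    # replay from the gap
    rx.receive(s, p)

print(rx.delivered)
```

Because recovery happens between the two link partners, the upper layers see an in-order stream and never learn that a frame was damaged.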
Credit-based Flow Control:
Support a credit-based flow control mechanism between the two ends of a link to keep frame transmission lossless. CBFC (Credit-Based Flow Control) is intended to replace PFC (Priority-based Flow Control). The receiver periodically advertises its available buffer space to the peer, and the sender transmits messages according to message priority and the advertised buffer size. The advertised buffer space can also be used as an input to adaptive routing.
Credit-based flow control
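The credit exchange described above can be sketched as follows. The sketch assumes one credit per frame and a single credit pool; per-priority credit pools, credit units, and the update message format are not modelled, and all class and method names are hypothetical.

```python
from collections import deque


class CreditReceiver:
    """Owns the buffer and tells the sender how much space is free."""

    def __init__(self, buffer_slots: int):
        self.buffer = deque()
        self.capacity = buffer_slots
        self.unadvertised = buffer_slots  # free slots not yet granted to the peer

    def accept(self, frame: bytes):
        # With correct credit accounting the buffer can never overflow.
        assert len(self.buffer) < self.capacity, "sender exceeded its credit"
        self.buffer.append(frame)

    def drain(self, count: int):
        """Forward frames onward, freeing slots that can be granted again."""
        for _ in range(min(count, len(self.buffer))):
            self.buffer.popleft()
            self.unadvertised += 1

    def credit_update(self) -> int:
        """Periodic update: grant every slot freed since the last update."""
        granted, self.unadvertised = self.unadvertised, 0
        return granted


class CreditSender:
    """Transmits only while it holds credit from the receiver."""

    def __init__(self):
        self.credits = 0
        self.queue = deque()

    def on_credit_update(self, granted: int):
        self.credits += granted

    def transmit(self):
        sent = []
        while self.queue and self.credits > 0:
            sent.append(self.queue.popleft())
            self.credits -= 1
        return sent


rx = CreditReceiver(buffer_slots=2)
tx = CreditSender()
tx.queue.extend([b"a", b"b", b"c"])

tx.on_credit_update(rx.credit_update())  # initial grant of 2 slots
first = tx.transmit()                    # two frames go out, one waits
for f in first:
    rx.accept(f)

stalled = tx.transmit()                  # no credit left, nothing is sent
rx.drain(1)                              # receiver frees one slot
tx.on_credit_update(rx.credit_update())
second = tx.transmit()                   # the waiting frame can now go
print(first, stalled, second)
```

Unlike a pause-based scheme, the sender stops on its own when credit runs out, so no frame is ever sent into a full buffer.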
Packet rate improvement:
This work compresses Ethernet message headers to increase frame transmission efficiency. Over the long evolution of Ethernet, message headers have kept growing, which lowers transmission efficiency, and many header fields are not used in intelligent computing networks. Compressing message headers is therefore an effective way to improve frame transmission efficiency.
A flag in the message header must indicate whether the message is compressed, so that compressed and uncompressed messages can coexist in the network. The sender can choose whether to compress a message without affecting existing functionality.
Currently, there are multiple solutions for message header compression, which are under discussion.
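Since the compression scheme itself is still open, the sketch below only illustrates the coexistence flag. It assumes a hypothetical one-byte flags field and a hypothetical context table that maps a short ID to a full header both peers already know; neither is taken from any UEC proposal.

```python
import struct

COMPRESSED = 0x01  # hypothetical flag bit, not a defined UEC encoding

# Hypothetical table of full headers that both link peers have agreed on.
CONTEXTS = {7: b"\xaa" * 14}  # context id -> 14-byte full header


def encode(full_header: bytes, payload: bytes, context_id=None) -> bytes:
    """Send a short context id when compressing, else the full header."""
    if context_id is not None:
        return struct.pack("!BH", COMPRESSED, context_id) + payload
    return struct.pack("!BH", 0, len(full_header)) + full_header + payload


def decode(frame: bytes):
    """Use the flag to decide how to recover the header."""
    flags, value = struct.unpack_from("!BH", frame)
    body = frame[3:]
    if flags & COMPRESSED:
        return CONTEXTS[value], body       # value is a context id
    return body[:value], body[value:]      # value is the header length


header, payload = CONTEXTS[7], b"data"
plain = encode(header, payload)
packed = encode(header, payload, context_id=7)

print(len(plain), len(packed))
print(decode(plain) == decode(packed) == (header, payload))
```

The receiver handles both forms with the same decode path, which is what lets a sender choose per message whether to compress.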
Negotiation:
This work establishes a negotiation method for link layer parameters and features. Several new link layer capabilities, such as LLR, CBFC, and PRI (packet rate improvement), must be negotiated between devices before they are used. The main idea is to extend LLDP with a UEC OUI so that devices can negotiate the new link layer capabilities.
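LLDP already provides an organizationally specific TLV (type 127) that carries a 3-byte OUI, a 1-byte subtype, and organization-defined data, which is the natural place for such an extension. The sketch below builds and parses a TLV of that shape. The OUI value, the subtype, and the capability bit assignments are placeholders invented for the example, not values defined by UEC.

```python
import struct

LLDP_ORG_SPECIFIC = 127  # organizationally specific TLV type in IEEE 802.1AB

PLACEHOLDER_OUI = bytes([0x00, 0x00, 0x00])  # placeholder, NOT the real UEC OUI
SUBTYPE_CAPS = 1                             # hypothetical subtype
CAP_LLR, CAP_CBFC, CAP_PRI = 0x01, 0x02, 0x04  # hypothetical capability bits


def build_tlv(oui: bytes, subtype: int, info: bytes) -> bytes:
    """TLV header is 7 bits of type followed by 9 bits of length."""
    body = oui + bytes([subtype]) + info
    if len(body) > 511:
        raise ValueError("TLV body exceeds the 9-bit length field")
    header = (LLDP_ORG_SPECIFIC << 9) | len(body)
    return struct.pack("!H", header) + body


def parse_tlv(tlv: bytes):
    (header,) = struct.unpack_from("!H", tlv)
    tlv_type, length = header >> 9, header & 0x1FF
    body = tlv[2:2 + length]
    return tlv_type, body[:3], body[3], body[4:]


def negotiate(local_caps: int, peer_tlv: bytes) -> int:
    """Enable only the capabilities that both ends advertise."""
    tlv_type, oui, subtype, info = parse_tlv(peer_tlv)
    if tlv_type != LLDP_ORG_SPECIFIC or oui != PLACEHOLDER_OUI:
        return 0
    if subtype != SUBTYPE_CAPS or not info:
        return 0
    return local_caps & info[0]


peer_tlv = build_tlv(PLACEHOLDER_OUI, SUBTYPE_CAPS, bytes([CAP_CBFC | CAP_PRI]))
agreed = negotiate(CAP_LLR | CAP_CBFC, peer_tlv)
print(peer_tlv.hex(), agreed == CAP_CBFC)
```

Devices that do not recognize the OUI simply ignore the TLV, so the extension does not disturb existing LLDP deployments.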
Transport layer working group
The UET (UEC transport layer) working group focuses on scaling to the most demanding applications, reliable message transmission, secure data transmission, and congestion avoidance in the network. Its goal is to address the shortcomings of RoCE transport and provide efficient, reliable, and secure transmission at large scale. The targets are up to 256,000 transport endpoints and up to 100,000,000 supported processes.
The main modules of UET are shown in the figure below:
Main modules of UET
UET contains three modules: Packet Delivery, Security, and Semantics. The functions of each module are as follows:
- Packet Delivery sublayer (PDS):
PDS contains two modules: reliability and congestion management.
The reliability module needs to cover three key requirements:
- Extreme scalability
- Orderly message transmission
- Unordered message transmission
The reliability module defines four message transmission modes, each serving a specific purpose, to cover HPC, AI, ML, and other application scenarios. The four modes are:
Reliable, ordered delivery (ROD):
This mode transmits messages in order and is used for applications that require orderly transmission of messages.
Reliable, unordered delivery for operations (RUD):
This mode delivers each message to the semantic layer exactly once but tolerates out-of-order delivery in the network. The reliable transport layer must detect duplicate messages to guarantee that each message reaches the semantic layer only once.
Reliable, unordered delivery for idempotent operations (RUDI):
This mode is optimized for read and write operations of RDMA.
Unreliable, unordered delivery (UUD):
Unreliable messages can carry many of the new UET semantics. Users of UUD do not need reliable transmission from the transport layer and rely on other reliability methods.
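The four modes differ along two axes, reliability and ordering, and RUD additionally needs duplicate detection. The sketch below records those properties in a small table and shows one simple way a receiver could enforce exactly-once delivery for RUD. The message IDs, the class name, and the unbounded set of seen IDs are assumptions for illustration; a real implementation would bound that state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    reliable: bool
    ordered: bool


MODES = {
    "ROD": Mode(reliable=True, ordered=True),
    "RUD": Mode(reliable=True, ordered=False),
    "RUDI": Mode(reliable=True, ordered=False),  # operations are idempotent
    "UUD": Mode(reliable=False, ordered=False),
}


class RudReceiver:
    """Passes each message to the semantic layer once, in arrival order."""

    def __init__(self):
        self.seen = set()     # hypothetical message ids already delivered
        self.delivered = []   # stands in for the semantic layer

    def on_message(self, msg_id: int, payload: str) -> bool:
        if msg_id in self.seen:
            return False      # duplicate (for example a retransmission): drop
        self.seen.add(msg_id)
        self.delivered.append(payload)
        return True


rx = RudReceiver()
arrivals = [(2, "c"), (0, "a"), (2, "c"), (1, "b"), (0, "a")]
results = [rx.on_message(i, p) for i, p in arrivals]
print(rx.delivered, results)
```

With idempotent operations, applying a duplicate gives the same result as applying it once, which is why RUDI can relax this duplicate tracking.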
The congestion management module, which covers congestion control and load balancing, is still under study. It can manage congestion per FEP (fabric endpoint), and its core is flow control based on receiver credit. Congestion control determines the window size and injection rate; the goal is to lower the rate and limit in-flight messages so that intermediate nodes and endpoints do not become congested. Path load balancing determines which path a given message takes, and ECMP can be used to select the path.
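ECMP path selection is usually done by hashing fields of the packet and taking the result modulo the number of equal-cost paths. The sketch below shows that general idea only. The choice of CRC32, the fields hashed, and the `entropy` parameter (a value the sender could vary to steer messages onto different paths) are assumptions for this example, not UET behaviour.

```python
import zlib


def ecmp_select(src: str, dst: str, src_port: int, dst_port: int,
                num_paths: int, entropy: int = 0) -> int:
    """Map a flow (plus an optional entropy value) to one of num_paths."""
    if num_paths <= 0:
        raise ValueError("num_paths must be positive")
    key = f"{src}|{dst}|{src_port}|{dst_port}|{entropy}".encode()
    return zlib.crc32(key) % num_paths


# Same inputs always map to the same path, so one flow stays on one path.
p1 = ecmp_select("10.0.0.1", "10.0.0.2", 49152, 4791, num_paths=8)
p2 = ecmp_select("10.0.0.1", "10.0.0.2", 49152, 4791, num_paths=8)

# Varying the entropy value lets a sender try other paths for the same flow.
spread = [ecmp_select("10.0.0.1", "10.0.0.2", 49152, 4791, 8, entropy=e)
          for e in range(16)]
print(p1, p2, spread)
```

The hash keeps a given flow on one path, which preserves ordering, while the entropy value gives the sender a lever for spreading load when ordering is not required.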
- Transport Security:
Transport Security is a top priority in UET design, with optional encryption and authentication of all data payloads and most transmission headers.
- Semantics:
The UET semantic layer provides high-performance, highly scalable operations, enabling both specialized AI deployments and full-featured HPC deployments.
The semantic layer is the bridge between user software and the PDS (Packet Delivery Sublayer). It defines a set of operations, such as send, receive, write, and read. It also provides optional ordering and optional completion notifications at both the initiator and the target.
The semantic layer provides a connectionless API and must natively support *CCL, MPI, OpenSHMEM, and other APIs.
Software layer working group
The software layer working group promotes rapid adoption of UEC by using the libfabric API as the data plane framework, which keeps it compatible with widely adopted communication libraries such as *CCL, MPI, and SHMEM. It defines the interaction between accelerators and FEPs, including the related accelerator APIs. It defines control plane and data plane mechanisms for switches, FEPs, and Aggregation Managers (AMs) so that products from different UEC vendors interoperate. It also addresses the need for UEC to support multiple workload profiles.
Software layer working group
The work the software layer needs to do for INC (in-network collectives) includes:
- Define an API (in C) for INC collective communication, using libfabric.
- Define a discovery mechanism to confirm available INC offload capabilities.
- Define the RPC interface these libraries use to communicate with the Aggregation Manager (AM). Specify the RPC interface used for communication between the AM and the UEC switch providing INC resources.
- Define OpenConfig extensions for configuring the FEPs of network devices (configured by the AM) for collective communication offload, and for monitoring performance and errors.
- Specify the behavior of INC-compliant network devices with multiple feature profiles.
- Guide the development of the UEC transport protocol so that INC can be implemented easily in hardware.