PhD Student, UC Berkeley
Revisiting Network Support for RDMA
With the advent of RoCE (RDMA over Converged Ethernet), recent years have witnessed a significant increase in the usage of RDMA in Ethernet-based datacenter networks. RoCE NICs only achieve good performance when run over a lossless network, which is obtained by enabling Ethernet’s Priority Flow Control (PFC) mechanism. However, PFC introduces significant problems, such as head-of-line blocking, congestion spreading, and occasional deadlocks. In this work, we ask: is PFC fundamentally required for deploying RDMA over Ethernet, or is its use merely an artifact of the current RoCE NIC design? We find that while PFC is indeed needed for current RoCE NICs, it is unnecessary (and sometimes significantly harmful) when one updates RoCE NICs to a more appropriate (yet still feasible) design. We thus propose a new, improved RoCE NIC (IRN) design that does not require PFC and performs better than current RoCE NICs. Our results hold across different experimental scenarios and across different congestion control algorithms. They suggest that, to avoid the many problems with PFC, we should adopt this new IRN design for running RDMA in datacenters.
Radhika Mittal is a PhD candidate in the Computer Science Department at UC Berkeley, where she is advised by Prof. Sylvia Ratnasamy and Prof. Scott Shenker. She is interested in computer systems and networking. Her research focuses on designing schemes to meet various network-wide performance objectives while eliminating the need for specialized network infrastructure. She was awarded the Google PhD Fellowship in 2017 and the Microsoft Research Graduate Women’s Scholarship in 2013. Before starting at UC Berkeley in 2012, she received her bachelor's degree in Computer Science and Engineering from IIT Kharagpur in India, where she won the Institute Silver Medal for being the top graduating student in her department.