RDMA

Salem, F., F. Schintke, T. Schütt, and A. Reinefeld, "Scheduling Data Streams for Low Latency and High Throughput on a Cray XC40 Using Libfabric", Concurrency and Computation Practice and Experience, pp. 1-14, 2019. Abstract

Achieving efficient many-to-many communication on a given network topology is a challenging task when many data streams from different sources have to be scattered concurrently to many destinations with low variance in arrival times. In such scenarios, it is critical to saturate but not to congest the bisectional bandwidth of the network topology in order to achieve a good aggregate throughput. When there are many concurrent point-to-point connections, the communication pattern needs to be dynamically scheduled in a fine-grained manner to avoid network congestion (links, switches), overload in the node’s incoming links, and receive buffer overflow. Motivated by the use case of the Compressed Baryonic Matter experiment (CBM), we study the performance and variance of such communication patterns on a Cray XC40 with different routing schemes and scheduling approaches. We present a distributed Data Flow Scheduler (DFS) that reduces the variance of arrival times from all sources at least 30 times and increases the achieved aggregate bandwidth by up to 50 %.

Salem, F., F. Schintke, T. Schütt, and A. Reinefeld, "Scheduling data streams for low latency and high throughput on a Cray XC40 using Libfabric", CUG Conference Proceedings, Canada, 2019. Abstract

Achieving efficient many-to-many communication on a given network topology is a challenging task when many data streams from different sources have to be scattered concurrently to many destinations with low variance in arrival times. In such scenarios, it is critical to saturate but not to congest the bisectional bandwidth of the network topology in order to achieve a good aggregate throughput. When there are many concurrent point-to-point connections, the communication pattern needs to be dynamically scheduled in a fine-grained manner to avoid network congestion (links, switches), overload in the node’s incoming links, and receive buffer overflow. Motivated by the use case of the Compressed Baryonic Matter experiment (CBM), we study the performance and variance of such communication patterns on a Cray XC40 with different routing schemes and scheduling approaches. We present a distributed Data Flow Scheduler (DFS) that reduces the variance of arrival times from all sources at least 30 times and increases the achieved aggregate bandwidth by up to 50%.