(RP06) Reduction Operations on Modern Supercomputers: Challenges and Solutions
Time: Tuesday, June 18th, 8:30am - 10am
Description: Message Passing Interface (MPI) is the dominant parallel programming model, offering primitives such as point-to-point and collective operations. MPI Allreduce is a popular collective in scientific and deep learning (DL) applications. While scientific applications typically use small or medium messages, DL applications need large-message reductions. At the same time, advances in processor and interconnect technologies bring novel techniques that can improve performance. The broad challenge is how to design high-performance reduction collectives that exploit trends in modern processor architectures to deliver good reduction performance across all message sizes. In this work, we take up this challenge and apply optimizations such as network offload mechanisms, efficient pipelining, and zero-copy intra-node communication to propose three designs, each targeting one of three message-size ranges. Evaluation of the proposed designs shows significant performance improvements on a wide variety of microbenchmarks and on scientific and DL applications.
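The abstract does not detail the three designs. As a hedged illustration of what an Allreduce computes, and of why large messages are usually split into chunks and pipelined, the sketch below simulates a ring allreduce (a widely used large-message algorithm, not necessarily one of the authors' designs) for several ranks inside a single Python process. The function name `ring_allreduce` is hypothetical; no MPI library is used, and network offload and zero-copy transfers are not modeled.

```python
def ring_allreduce(buffers):
    """Simulate a sum-allreduce over len(buffers) ranks arranged in a ring.

    buffers[r] is rank r's list of numbers; all lists have equal length.
    Returns new lists in which every rank holds the element-wise sum.
    (Hypothetical helper for illustration only; not an MPI API.)
    """
    p = len(buffers)
    n = len(buffers[0])
    data = [list(b) for b in buffers]  # copy so inputs are left untouched
    # Split element indices into p near-equal contiguous chunks.
    bounds = [(c * n) // p for c in range(p + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. Each step, rank r sends one chunk to rank r+1,
    # which adds it in. After p-1 steps rank r owns the full sum of chunk r+1.
    for step in range(p - 1):
        # Snapshot outgoing chunks so every "send" in a step sees pre-step values.
        outgoing = [[data[r][i] for i in chunk((r - step) % p)] for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            for i, v in zip(chunk((r - step) % p), outgoing[r]):
                data[dst][i] += v

    # Phase 2: allgather. Fully reduced chunks circulate around the ring and
    # overwrite stale copies, so after p-1 steps every rank has every sum.
    for step in range(p - 1):
        outgoing = [[data[r][i] for i in chunk((r + 1 - step) % p)] for r in range(p)]
        for r in range(p):
            dst = (r + 1) % p
            for i, v in zip(chunk((r + 1 - step) % p), outgoing[r]):
                data[dst][i] = v

    return data


# Usage: 4 simulated ranks, 8 elements each.
ranks = [[float(r * 8 + i) for i in range(8)] for r in range(4)]
result = ring_allreduce(ranks)
print(result[0])  # [48.0, 52.0, 56.0, 60.0, 64.0, 68.0, 72.0, 76.0]
```

Each rank sends 2(p-1) chunks of about n/p elements, so its total traffic stays close to 2n regardless of the rank count. This is why chunked, pipelined schemes of this kind suit the large reductions common in DL, while small messages are usually latency-bound and favor algorithms with fewer steps.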