(RP17) Cross-Architecture Affinity of Supercomputers
System Software & Runtime Systems
Time: Tuesday, June 18th, 8:30am - 10am
Description: To reach exascale computing, vendors are developing complex systems that may include many-core technologies, hybrid machines with both throughput-optimized and latency-optimized cores, and multiple levels of memory. The complexity of extracting the performance these systems offer greatly challenges the productivity of computational scientists. A significant part of this challenge is mapping parallel applications efficiently to the underlying hardware. A poor mapping may result in dramatic performance loss. Furthermore, an application mapping is often machine-dependent, sacrificing portability in favor of performance.
In this poster, I demonstrate that the memory hierarchy of a system is key to achieving performance-portable mappings of parallel applications. I leverage a memory-driven algorithm called mpibind, which gives computational scientists a simple interface for mapping applications and produces a full mapping of MPI tasks, threads, and GPU kernels to hardware processing units and memory domains. The interface is simple and portable: its only inputs are a hybrid application and a number of tasks. This work applies the concept of memory-driven mappings to three emerging system architectures: a system with latency-optimized cores, a two-level memory system, and a hybrid CPU+GPU system. Three use cases demonstrate the benefits of this approach: reducing system noise, improving application performance, and improving productivity. For example, an application developer can choose an advanced (and better performing) two-level memory configuration, use the same interface as for a single-level memory, and still benefit from the performance advantages of the former.
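To make the idea of a memory-driven mapping concrete, the following is a minimal Python sketch. It is not the mpibind implementation or its interface: the `MemoryDomain` class, the `split_evenly` and `map_tasks` helpers, and the two-domain example topology are all hypothetical names and values chosen for illustration. The sketch assumes a flat list of NUMA-like memory domains, each with its own local cores and GPUs. It takes only a number of tasks as input, spreads the tasks across memory domains first, and only then divides each domain's local cores and GPUs among the tasks placed there.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryDomain:
    """Hypothetical description of one memory (NUMA) domain and its local resources."""
    numa_id: int
    cores: list
    gpus: list = field(default_factory=list)


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def map_tasks(domains, ntasks):
    """Assign each task a memory domain first, then cores and GPUs local to that domain.

    Illustrative only: oversubscription (more tasks than cores or GPUs in a
    domain) simply leaves some tasks with empty resource lists.
    """
    # Step 1: distribute the tasks over the memory domains as evenly as possible.
    task_ids = list(range(ntasks))
    tasks_per_domain = [len(chunk) for chunk in split_evenly(task_ids, len(domains))]

    mapping = []
    for domain, count in zip(domains, tasks_per_domain):
        if count == 0:
            continue
        # Step 2: divide the domain's local cores and GPUs among its tasks.
        core_chunks = split_evenly(domain.cores, count)
        gpu_chunks = split_evenly(domain.gpus, count)
        for cores, gpus in zip(core_chunks, gpu_chunks):
            mapping.append({
                "task": len(mapping),
                "numa": domain.numa_id,
                "cores": cores,
                "gpus": gpus,
            })
    return mapping


# Hypothetical node: two memory domains, four cores and one GPU each.
node = [
    MemoryDomain(0, [0, 1, 2, 3], [0]),
    MemoryDomain(1, [4, 5, 6, 7], [1]),
]

for entry in map_tasks(node, 2):
    print(entry)
```

Because the caller supplies only a task count, the same call works unchanged on a node with a different number of memory domains, cores, or GPUs; only the topology description changes. A real tool would discover that topology from the system rather than take it as a hand-written list.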