Performance Optimization of Scientific Codes with the Roofline Model
Performance Analysis and Optimization
Time: Sunday, June 16th, 2pm - 6pm
Description: The Roofline performance model offers an insightful and intuitive way to identify performance bottlenecks and guide optimization efforts, and it has become increasingly popular in the HPC community. This tutorial will strengthen the community’s Roofline knowledge and equip attendees with a more automated and systematic methodology for Roofline-based analysis on both CPU and GPU architectures. It will start with an overview of the Roofline concepts, then focus on NVIDIA GPUs and present a practical methodology for Roofline data collection. Using examples, it will discuss how characteristics such as arithmetic intensity, memory access pattern, and thread divergence can be captured by the Roofline formalism on GPUs. The tutorial will then shift its focus to Intel CPUs and proceed with a hands-on session, in which Intel Advisor and its Roofline feature are introduced and a stencil code is used to demonstrate how Roofline can guide optimization on Haswell and KNL architectures. The tutorial will conclude with a set of case studies illustrating effective use of Roofline in real-life applications. Overall, this tutorial is a novel combination of a solid methodological basis, highly practice-oriented demos and hands-on exercises, and a representative set of open-science optimization use cases.
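As a minimal illustration of the Roofline concept mentioned above: attainable performance is bounded by the smaller of the machine's peak compute rate and the product of arithmetic intensity (FLOPs per byte moved) and peak memory bandwidth. The sketch below is not taken from the tutorial material; the function name and the machine numbers (1000 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s: min(compute peak, AI * memory bandwidth).

    arithmetic_intensity is in FLOPs per byte, peak_bandwidth_gbs in GB/s.
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


# Hypothetical machine parameters (assumptions, not measurements).
PEAK_GFLOPS = 1000.0
PEAK_BW_GBS = 100.0

# Ridge point: the arithmetic intensity at which the bound switches
# from memory-bound to compute-bound.
ridge_point = PEAK_GFLOPS / PEAK_BW_GBS

# A hypothetical kernel doing 8 FLOPs per 16 bytes moved (AI = 0.5)
# sits left of the ridge point, so it is memory-bound.
low_ai_bound = attainable_gflops(0.5, PEAK_GFLOPS, PEAK_BW_GBS)

# A hypothetical kernel with AI = 20 sits right of the ridge point,
# so it is limited by the compute peak instead.
high_ai_bound = attainable_gflops(20.0, PEAK_GFLOPS, PEAK_BW_GBS)

print(ridge_point, low_ai_bound, high_ai_bound)
```

A kernel whose measured performance sits well below its bound has headroom for optimization at its current arithmetic intensity; a kernel already on the memory-bound slope can only go faster by raising its arithmetic intensity, for example through better cache reuse.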
Content Level:
25% Introductory: Roofline methodology;
50% Intermediate: Roofline automation and hands-on;
25% Advanced: Analysis of cache, vectorization, and thread divergence effects
Target Audience: Any users, developers, vendors, and facilities in HPC who have an interest in performance characterization and performance optimization.
Prerequisites: Attendees should bring their own laptop in order to participate in the hands-on session, which takes up about 40% of the tutorial.
No remote server account is needed for the hands-on session, since a local licence of Intel Advisor will be provided.
Attendees are encouraged to acquire basic knowledge of modern CPU and GPU architectures beforehand in order to get the most out of the theoretical talks.