(RP26) Kronos Development and Results - HPC Benchmarking with Realistic Workloads
AI/Machine Learning/Deep Learning
Performance Analysis and Optimization
Time: Tuesday, June 18th, 8:30am - 10am
Location: Substanz 1, 2
Description: HPC benchmarking is traditionally performed by testing HPC subsystems in isolation and then combining the results to estimate full-system performance. Although widely adopted, this approach does not expose the bottlenecks that can arise under real-life workloads, when many jobs contend for the shared resources of the system (e.g. network and storage).
To address this issue, the European Centre for Medium-Range Weather Forecasts (ECMWF) is developing a software tool called Kronos, which benchmarks HPC systems through realistic workloads and treats the HPC system as a whole. The approach comprises a modelling phase, in which a workload model is generated from real-life job profiles, and an execution phase, in which the workload model is executed on a target machine through a set of lightweight and portable “synthetic” applications.
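The two phases can be illustrated with a minimal sketch. This is not the Kronos API: the classes `JobProfile` and `SyntheticJob`, the function `build_workload_model`, and the parameters `time_scale` and `load_scale` are all hypothetical names chosen for illustration, and the profile fields are an assumed, simplified set of metrics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical summary of one profiled real-life job."""
    name: str
    start_s: float       # submission time, relative to the start of profiling
    cpu_s: float         # total CPU time consumed
    bytes_read: int
    bytes_written: int
    n_procs: int


@dataclass(frozen=True)
class SyntheticJob:
    """Hypothetical description handed to a synthetic application."""
    name: str
    start_s: float
    cpu_s: float
    bytes_read: int
    bytes_written: int
    n_procs: int


def build_workload_model(profiles: List[JobProfile],
                         time_scale: float = 1.0,
                         load_scale: float = 1.0) -> List[SyntheticJob]:
    """Modelling phase (assumed form): map each profile to a synthetic job.

    time_scale compresses or stretches the submission schedule;
    load_scale scales the CPU and I/O load of every job.
    """
    jobs = [
        SyntheticJob(
            name=f"synthetic-{p.name}",
            start_s=p.start_s * time_scale,
            cpu_s=p.cpu_s * load_scale,
            bytes_read=int(p.bytes_read * load_scale),
            bytes_written=int(p.bytes_written * load_scale),
            n_procs=p.n_procs,
        )
        for p in profiles
    ]
    # The execution phase would submit jobs in order of start time.
    return sorted(jobs, key=lambda j: j.start_s)


profiles = [
    JobProfile("forecast", 10.0, 100.0, 1000, 2000, 4),
    JobProfile("preproc", 0.0, 50.0, 500, 0, 2),
]
model = build_workload_model(profiles, time_scale=0.5, load_scale=2.0)
for job in model:
    print(job.start_s, job.name, job.cpu_s, job.bytes_read)
```

In this sketch the execution phase is reduced to an ordered submission list; on a real target machine each entry would be handed to a scheduler and run by a synthetic application that reproduces the recorded load.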
The most recent Kronos development is the generation of a workload model that combines “synthetic” and real applications. The most relevant applications are retained in the benchmarking workload as they are, and only ancillary applications are replaced by their synthetic representations.
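A mixed workload of this kind can be sketched as a simple partition of the profiled applications. The names below (`App`, `split_workload`, `keep_real`, and the example application names) are hypothetical and do not come from Kronos; the sketch only assumes that each application carries a known share of the total load.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class App:
    """Hypothetical application entry in a profiled workload."""
    name: str
    cpu_share: float  # fraction of the total CPU time of the workload


def split_workload(apps: List[App], keep_real: Set[str]) -> Dict[str, str]:
    """Label each application as run 'real' or replaced by a 'synthetic' stand-in."""
    return {a.name: ("real" if a.name in keep_real else "synthetic") for a in apps}


def real_cpu_share(apps: List[App], keep_real: Set[str]) -> float:
    """Fraction of the workload's CPU time covered by the real applications."""
    return sum(a.cpu_share for a in apps if a.name in keep_real)


apps = [App("model", 0.5), App("postproc", 0.25), App("archive", 0.25)]
plan = split_workload(apps, keep_real={"model"})
print(plan)
print(real_cpu_share(apps, {"model"}))
```

Which applications count as "most relevant" is a choice made by whoever designs the benchmark; here it is passed in explicitly rather than inferred.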
Kronos is being developed as part of NEXTGenIO, a four-year, EU-funded Horizon 2020 project that started in October 2015. NEXTGenIO is coordinated by EPCC, the supercomputing centre at the University of Edinburgh, and involves partners from several European countries. It aims to develop innovative solutions to I/O bottlenecks as high-performance computing approaches exascale.
ECMWF is currently using Kronos in the procurement of its next HPC system, which is due to become operational by 2020.