(PP25): Holistic Approach to Power Management in HPC
AI/Machine Learning/Deep Learning
HPC Centre Planning and Operations
System Software & Runtime Systems
Time: Wednesday, June 19th, 3:15pm - 4pm
Description: Upcoming High Performance Computing (HPC) systems are on the critical path towards delivering the highest level of performance for large-scale applications. As supercomputers grow larger in the drive to the next levels of performance, energy efficiency has emerged as one of the foremost design goals. New approaches to energy optimization are being explored that span the whole HPC stack, from firmware and hardware through to the OS, application runtimes, and workload managers. Optimizing for energy efficiency requires an orchestrated approach across the different components of the infrastructure. We present our approach to energy and power management, which can be described as Energy Aware Scheduling (EAS). EAS uses performance and power consumption models, together with software-hardware co-design, to implement various energy- and power-aware scheduling policies at the node, job, and cluster levels.