(WHPC05) A Date with Data: How Time Series Information from Sensors & Logs is Revolutionizing HPC Data Center Operations at NERSC
Time: Tuesday, June 18th, 12:30pm - 5pm
Description: The National Energy Research Scientific Computing (NERSC) Center at the Lawrence Berkeley National Laboratory is home to two current Top500 high-performance computing systems, Cori (Cray XC40) and Edison (Cray XC30). The building, Shyh Wang Hall, and its data center are also home to the OMNI data collection project, which collects data from a variety of systems and sensors throughout the center, including building management systems, power infrastructure, temperature sensors, weather stations, system metrics and logs, seismometers, network routers, filesystems, particle counters, and more. The OMNI data collection contains over 500 billion records, totaling 117TB of data, and ingests new data at a rate of over 20,000 data points per second from these distributed, heterogeneous sources. In doing so, it provides a centralized location where the data can be monitored, analyzed, and visualized with ease. Because more than two years' worth of data is kept online, researchers can ask questions that were not possible before and can revisit historical data for analysis if an issue is discovered later. In this poster, we explain the architecture of the OMNI data collection system and present four use cases that demonstrate how the dataset led to valuable insights at NERSC. Key innovations from OMNI insights include increased energy efficiency of the facility, critical preventative maintenance by adding a new tower water pump, and a cost savings of $2.5 million by determining that an additional mechanical substation was not necessary to support the new Perlmutter system.
Poster Authors
Environmental Systems, CSE, Operations Technology Group
Computer Scientist, Operations Technology Group
Principal Scientific Engineering Associate