PerfMemPlus: A Tool for Automatic Discovery of Memory Performance Problems
Performance Analysis and Optimization
Time: Tuesday, June 18th, 1:45pm - 2:15pm CEST
Location: Analog 1, 2
Description: In high-performance computing, many performance problems are caused by the memory system. Because such performance bugs are hard to identify, analysis tools play an important role in performance optimization.
Today’s processors offer feature-rich performance monitoring units with support for instruction sampling, but existing tools use this data only partially.
Previously, performance counters were used to measure the memory bandwidth.
However, attributing high bandwidth usage to source code has been difficult and imprecise.
We introduce a novel method for identifying performance-degrading bandwidth usage and attributing it to specific objects and source code lines.
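To make the attribution step concrete, the following is a minimal sketch of mapping sampled memory addresses to data objects and source lines. It is not PerfMemPlus's actual implementation: the function name `attribute_samples`, the object table layout, and the use of raw sample counts as a stand-in for bandwidth usage are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import Counter


def attribute_samples(objects, samples):
    """Count samples per (object name, source line).

    objects: list of (start_address, size_in_bytes, name) tuples (hypothetical layout).
    samples: list of (data_address, source_line) tuples (hypothetical layout).
    """
    objs = sorted(objects)                # order objects by start address
    starts = [o[0] for o in objs]
    counts = Counter()
    for addr, src in samples:
        # Find the last object whose start address is <= the sampled address.
        i = bisect_right(starts, addr) - 1
        if i >= 0 and addr < objs[i][0] + objs[i][1]:
            name = objs[i][2]
        else:
            name = "<unknown>"            # address lies outside every known object
        counts[(name, src)] += 1
    return counts
```

Ranking the resulting counts highlights which object and source line pairs account for most of the sampled memory traffic, under the assumption that the sample count is a reasonable proxy for traffic.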
This paper also introduces a new method for false sharing detection.
It can differentiate false sharing from true sharing and identify the objects and source code lines where accesses to falsely shared objects occur.
It can uncover false sharing that previous tools have overlooked.
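The distinction between false and true sharing can be illustrated with a small sketch. It is not the detection method of the paper: the `Sample` record, the 64-byte cache line size, and the rule "different threads touch overlapping bytes, with at least one write" are assumptions chosen for illustration, and accesses that span two cache lines are ignored.

```python
from collections import defaultdict
from typing import NamedTuple

CACHE_LINE = 64  # bytes; assumed cache line size


class Sample(NamedTuple):
    """One sampled memory access (hypothetical record layout)."""
    thread: int
    addr: int
    size: int
    is_write: bool


def classify_sharing(samples):
    """Return {cache_line_base: "true" | "false"} for contended cache lines.

    A line is considered contended if at least two threads access it and at
    least one access is a write. It is labeled "true" if accesses from
    different threads overlap in bytes (with one of them a write), otherwise
    "false".
    """
    by_line = defaultdict(list)
    for s in samples:
        by_line[s.addr // CACHE_LINE * CACHE_LINE].append(s)

    result = {}
    for line, group in by_line.items():
        threads = {s.thread for s in group}
        if len(threads) < 2 or not any(s.is_write for s in group):
            continue  # private or read-only line: no coherence contention
        overlap = any(
            a.thread != b.thread
            and (a.is_write or b.is_write)
            and a.addr < b.addr + b.size
            and b.addr < a.addr + a.size
            for i, a in enumerate(group)
            for b in group[i + 1:]
        )
        result[line] = "true" if overlap else "false"
    return result
```

In this sketch, two threads writing to adjacent 8-byte fields in the same line are reported as false sharing, while a writer and a reader of the same field are reported as true sharing.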
PerfMemPlus automatically reports those issues by using instruction sampling data captured with a single profiling run.
This simplifies the tedious search for the location of performance problems in complex code.
The tool's design is simple, it supports many existing and upcoming processors, and the recorded data can easily be reused in future research.
We show that PerfMemPlus can automatically report performance problems without producing false positives.
Additionally, we present case studies that show how PerfMemPlus can pinpoint memory performance problems in the PARSEC benchmarks and machine learning applications.