(PhD10) Smoothing Data Movement Between RAM and Storage for Reverse Time Migration
Time: Monday, June 17th, 1pm - 6pm
Description: In line with Moore's law, CPU performance has historically doubled roughly every two years. Storage systems, however, have not kept pace. I/O is considered a bottleneck for many critical simulations that rely on writing and reading intermediate results, or snapshots, during their computation. Reverse Time Migration (RTM), for example, is a seismic imaging method that computes high-resolution images from seismic data by propagating waves in 3D models. Based on an adjoint-state formulation, RTM requires combining, at regular time steps, a forward-propagated source wavefield with a back-propagated receiver wavefield. The process therefore has two phases. In the first phase, the 3D simulation states, or snapshots, of the source wavefield are computed and saved at predetermined time steps. In the second phase, the receiver wavefield is computed and, at the imaging time steps, the source snapshots are retrieved and correlated with the receiver snapshots to update the final image. The time spent blocking for I/O during the two phases accounts for about 68% of the entire execution time.
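The two-phase access pattern described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a toy 1D finite-difference wave step replaces the 3D propagator, the names `step`, `save_snapshot`, `load_snapshot`, and `toy_rtm` are hypothetical, and a pointwise product stands in for the correlation step. The point is only that snapshots are written in increasing time order and read back in decreasing time order, with every read and write blocking the computation.

```python
import os
import tempfile
from array import array

def step(prev, curr, c2=0.25):
    """One explicit finite-difference step of a toy 1D wave equation."""
    n = len(curr)
    nxt = [0.0] * n
    for i in range(1, n - 1):
        lap = curr[i + 1] - 2.0 * curr[i] + curr[i - 1]
        nxt[i] = 2.0 * curr[i] - prev[i] + c2 * lap
    return nxt

def save_snapshot(path, field):
    """Blocking write of one wavefield snapshot as raw doubles."""
    with open(path, "wb") as f:
        array("d", field).tofile(f)

def load_snapshot(path, n):
    """Blocking read of one wavefield snapshot of n doubles."""
    data = array("d")
    with open(path, "rb") as f:
        data.fromfile(f, n)
    return list(data)

def toy_rtm(n=64, nsteps=40, snap_every=4, workdir=None):
    workdir = workdir or tempfile.mkdtemp()
    name = lambda t: os.path.join(workdir, f"snap_{t:05d}.bin")

    # Phase 1: forward-propagate the source wavefield and save snapshots.
    prev, curr = [0.0] * n, [0.0] * n
    curr[n // 2] = 1.0                      # toy point source
    saved = []
    for t in range(nsteps):
        prev, curr = curr, step(prev, curr)
        if t % snap_every == 0:
            save_snapshot(name(t), curr)    # computation waits for the write
            saved.append(t)

    # Phase 2: propagate the receiver wavefield backward in time and
    # combine it with the stored source snapshots at the imaging steps.
    image = [0.0] * n
    prev, curr = [0.0] * n, [0.0] * n
    curr[n // 4] = 1.0                      # toy receiver excitation
    read_order = []
    for t in reversed(range(nsteps)):
        prev, curr = curr, step(prev, curr)
        if t % snap_every == 0:
            src = load_snapshot(name(t), n)  # computation waits for the read
            for i in range(n):
                image[i] += src[i] * curr[i]
            read_order.append(t)
    return image, saved, read_order
```

Calling `toy_rtm()` returns the accumulated image together with the write order and the read order of the snapshots; the read order is the exact reverse of the write order, which is the regularity that a customized prefetching strategy can exploit.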
Recent supercomputer architectures include several layers of storage, from DRAM to BurstBuffer to disk, which has triggered the development of libraries implementing the checkpoint/restart pattern, mainly for fault tolerance.
Using these libraries for RTM can improve overall performance, mainly through asynchronous I/O accesses, which allow a partial overlap with computation. However, most of these generic components are not optimal, because they cannot benefit from the very regular access pattern of RTM, which allows further performance gains through customized prefetching strategies.
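As a rough illustration of how asynchronous I/O lets writes overlap with computation, the sketch below hands each snapshot to a background writer thread instead of writing it inline. It is not the API of any checkpoint/restart library: the function names, the file naming scheme, and the byte-string stand-in for a computed snapshot are all assumptions, and only Python's standard `concurrent.futures` module is used.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def write_snapshot(path, payload):
    """Runs on the background thread: blocking write of one snapshot."""
    with open(path, "wb") as f:
        f.write(payload)
    return path

def forward_with_async_stage_out(nsteps=8, workdir=None):
    workdir = workdir or tempfile.mkdtemp()
    pending = []
    # A single worker keeps the writes in submission order.
    with ThreadPoolExecutor(max_workers=1) as pool:
        for t in range(nsteps):
            # Stand-in for the snapshot produced by the compute kernel.
            payload = bytes([t % 256]) * 1024
            path = os.path.join(workdir, f"snap_{t:05d}.bin")
            # submit() returns immediately, so the loop moves on to the
            # next time step while the write proceeds in the background.
            pending.append(pool.submit(write_snapshot, path, payload))
    # Leaving the with-block waits for all outstanding writes to finish.
    return [f.result() for f in pending]
```

The overlap is only partial in practice: if snapshots are produced faster than storage can absorb them, the pending writes accumulate in memory, so a real system would need to bound the number of in-flight snapshots.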
The Multilayered Buffer System (MLBS) proposed in this research is a general and versatile method for overlapping I/O with computation. It reduces blocking time through asynchronous access and an RTM-specific stage-out and prefetching strategy, while minimizing the impact on the computational kernel. MLBS shows that coupling the storage and memory systems and exploiting knowledge of the data access pattern can decrease the time spent blocking for I/O by more than 70%, thereby improving the entire execution time by 2.98X on regular storage and by up to 3.95X on BurstBuffer.
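The prefetching idea can be sketched in a few lines, under the assumption that the reader knows in advance that snapshots will be consumed in reverse time order. This is not the MLBS implementation, whose design is not detailed here; `prefetcher`, `backward_with_prefetch`, and the `depth` parameter are hypothetical names, and a bounded in-memory queue stands in for the RAM layer of a multilayered buffer.

```python
import queue
import threading

def prefetcher(paths, buf):
    """Background thread: load snapshots in the order they will be needed."""
    for p in paths:
        with open(p, "rb") as f:
            # put() blocks when the buffer is full, which caps RAM usage.
            buf.put((p, f.read()))
    buf.put(None)  # sentinel: no more snapshots

def backward_with_prefetch(paths, depth=2):
    """Consume snapshots in reverse time order while reads run ahead."""
    buf = queue.Queue(maxsize=depth)
    order = list(reversed(paths))   # known access pattern of phase two
    worker = threading.Thread(target=prefetcher, args=(order, buf), daemon=True)
    worker.start()
    consumed = []
    while True:
        item = buf.get()            # blocks only if the prefetcher is behind
        if item is None:
            break
        path, data = item
        # The imaging step would use `data` here; this sketch records its size.
        consumed.append((path, len(data)))
    worker.join()
    return consumed
```

Given a list of snapshot files in write order, `backward_with_prefetch(paths)` returns them in reverse order. The `depth` value trades memory for tolerance to slow reads: a deeper buffer lets the prefetcher run further ahead of the computation.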