In Situ Data Analytics for Next Generation Molecular Dynamics Workflows
Event Type
Focus Session
Big Data Analytics
Molecular Research
Scientific Software Development
Time
Wednesday, June 19th, 11am - 11:30am CEST
Location
Panorama 2
Description
Molecular dynamics (MD) simulations, which study the classical time evolution of a molecular system at atomic resolution, are widely used in chemistry, materials science, molecular biology, and drug design; they are among the most common simulations run on supercomputers. Next-generation supercomputers will have dramatically higher performance than current systems, generating more data to analyze, in terms of both the number and the length of MD trajectories. The coordination of data generation and analysis cannot rely on manual, centralized approaches, as is predominantly done today.

In this talk, we discuss how the combination of machine learning and data analytics algorithms, workflow management methods, and high-performance computing systems can carry the runtime analysis of ever-larger MD trajectories into the exascale era. We demonstrate our approach on three case studies: protein-ligand docking simulations, protein folding simulations, and analytics of protein functions that depend on the proteins' three-dimensional structures. We show how, by mapping individual substructures to metadata, frame by frame at runtime, we can study the conformational dynamics of proteins in situ. The ensemble of metadata can be used for automatic, strategic analysis and steering of MD simulations within a trajectory or across trajectories, without manually identifying the portions of trajectories in which rare events take place or critical conformational features are embedded.
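
The frame-by-frame mapping of substructures to metadata can be illustrated with a minimal Python sketch. It is not the method presented in the talk: the choice of radius of gyration as the per-substructure metadata, the function names (`radius_of_gyration`, `in_situ_metadata`, `flag_changes`), and the fixed change threshold are all assumptions made for illustration. Frames are consumed one at a time from an iterator, which stands in for data handed over by a running simulation, so no full trajectory is ever stored.

```python
import math
from typing import Dict, Iterable, Iterator, Sequence, Tuple

Atom = Tuple[float, float, float]   # (x, y, z) coordinates of one atom
Frame = Sequence[Atom]              # all atom positions at one time step


def radius_of_gyration(frame: Frame, indices: Sequence[int]) -> float:
    """Root-mean-square distance of the selected atoms from their centroid.

    Assumed stand-in for the per-substructure metadata; any scalar that
    summarizes the substructure's shape could be used instead.
    """
    pts = [frame[i] for i in indices]
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    cz = sum(p[2] for p in pts) / n
    sq = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 for p in pts)
    return math.sqrt(sq / n)


def in_situ_metadata(
    frames: Iterable[Frame],
    substructures: Dict[str, Sequence[int]],
) -> Iterator[Dict[str, float]]:
    """Yield one metadata record per frame as frames arrive."""
    for frame in frames:
        yield {
            name: radius_of_gyration(frame, idx)
            for name, idx in substructures.items()
        }


def flag_changes(
    metadata: Iterable[Dict[str, float]],
    threshold: float,
) -> Iterator[int]:
    """Yield indices of frames whose metadata jumped by more than `threshold`
    relative to the previous frame (a hypothetical trigger for steering)."""
    prev = None
    for i, record in enumerate(metadata):
        if prev is not None and any(
            abs(record[k] - prev[k]) > threshold for k in record
        ):
            yield i
        prev = record


if __name__ == "__main__":
    # Toy three-frame "trajectory" of two atoms; the pair stretches in frame 2.
    toy_frames = [
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    ]
    stream = in_situ_metadata(toy_frames, {"pair": [0, 1]})
    print(list(flag_changes(stream, threshold=0.5)))
```

Because both stages are generators, the flagged frame indices become available while the simulation is still producing frames; a workflow manager could use them to decide whether to extend, terminate, or fork a trajectory.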