MDTree

Documentation: https://redesignscience.github.io/MDTree/

Introduction

The MDTree is a container for branching simulations. It enables easy retrieval and storage of simulations that are restarted from a point along another simulation.

This package is experimental and the APIs are subject to change.

Class Hierarchy

The MDTree contains 3 main classes:

  1. mdtree class that contains multiple trajectories

  2. trajectory class that contains a single trajectory

  3. topology class that contains the topology object (parmed) shared by all simulations in the tree

Trajectories are not required to come from the same simulation method (e.g. they may use different integrators). The simulation type should be annotated in each trajectory's metadata.
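The three-class layout can be pictured with the minimal sketch below. It is not the package's actual API: the class bodies, the `add` and `by_simulation_type` methods, and the `"simulation_type"` metadata key are hypothetical, and the topology is typed as `Any` so the example runs without ParmEd installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Topology:
    """Shared topology for every trajectory in the tree.

    MDTree keeps a ParmEd object here; `Any` keeps this sketch runnable
    without ParmEd installed.
    """
    structure: Any = None


@dataclass
class Trajectory:
    """A single trajectory plus free-form metadata."""
    name: str
    # Hypothetical key: "simulation_type" records the method (integrator, ensemble, ...).
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class MDTree:
    """Container for many trajectories sharing one topology."""
    topology: Topology
    trajectories: List[Trajectory] = field(default_factory=list)

    def add(self, traj: Trajectory) -> None:
        self.trajectories.append(traj)

    def by_simulation_type(self, sim_type: str) -> List[Trajectory]:
        # Trajectories may come from different methods, so filter on metadata.
        return [t for t in self.trajectories
                if t.metadata.get("simulation_type") == sim_type]


# Usage: two stages run with different integrators in the same tree.
tree = MDTree(topology=Topology())
tree.add(Trajectory("Equilibration", {"simulation_type": "Langevin"}))
tree.add(Trajectory("NPT_production", {"simulation_type": "Verlet"}))
print([t.name for t in tree.by_simulation_type("Langevin")])
```

Keeping the simulation method in metadata, rather than in a subclass per integrator, lets one tree mix methods freely, which matches the note that trajectories need not share a simulation method.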

Data Storage

The trajectory class offers 4 storage methods for the bulky trajectory data:

  1. In memory, as a modified mdtraj.Trajectory object

  2. On disk, as an h5 file referenced by an absolute path

  3. AWS S3, as an h5 file referenced by a link to its S3 bucket

  4. Orion File, referenced by a unique file ID

The trajectory class offers methods to move data between any two of these 4 storage types.
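One way to think about the four storage types and the transitions between them is sketched below. Everything here is hypothetical: the `Storage` enum, the `StorageState` record, and `transition_route` are illustrative names, and the routing rule (remote backends upload or download a local h5 file, so moves between non-disk backends pass through disk) is an assumption, not documented MDTree behavior.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional


class Storage(Enum):
    MEMORY = "memory"  # modified mdtraj.Trajectory held in RAM
    DISK = "disk"      # absolute path to an .h5 file
    S3 = "s3"          # link to an .h5 file in an S3 bucket
    ORION = "orion"    # unique Orion file ID


@dataclass
class StorageState:
    """Hypothetical record of where one trajectory's data currently lives."""
    data: Optional[Any] = None
    path: Optional[str] = None
    s3_url: Optional[str] = None
    orion_id: Optional[int] = None

    def available(self) -> List[Storage]:
        # A backend counts as available when its field is set.
        pairs = [(Storage.MEMORY, self.data), (Storage.DISK, self.path),
                 (Storage.S3, self.s3_url), (Storage.ORION, self.orion_id)]
        return [kind for kind, value in pairs if value is not None]


def transition_route(src: Storage, dst: Storage) -> List[Storage]:
    """Backends visited when moving data from `src` to `dst`.

    Assumption: remote backends exchange a local .h5 file, so any move
    that does not already involve DISK is routed through it.
    """
    if src == dst:
        return [src]
    if Storage.DISK in (src, dst):
        return [src, dst]
    return [src, Storage.DISK, dst]


state = StorageState(path="/data/npt.h5")  # placeholder path
print(state.available())
print(transition_route(Storage.MEMORY, Storage.S3))
```

Routing through disk keeps the number of converters small (each remote backend only needs to talk to local files), at the cost of an extra write for memory-to-remote moves.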

Tree merging and splitting

The MDTree also offers functions to attach another mdtree object to a master tree, and to detach a branch into a new tree.
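A minimal sketch of attaching and detaching is shown below, operating on a nested dictionary shaped like the one described under Internal Data Structure (hash → node with `Name`, `Children`, `Parent_Hash`, `Parent_frame`). The function names `node`, `attach`, and `detach` are hypothetical, and storage fields are omitted for brevity.

```python
import copy
from typing import Any, Dict, Optional, Tuple

Tree = Dict[str, Dict[str, Any]]  # trajectory hash -> node


def node(name: str, parent: Optional[str] = None,
         frame: Optional[int] = None) -> Dict[str, Any]:
    """Build a minimal node (hypothetical helper; storage fields omitted)."""
    return {"Name": name, "Children": {}, "Parent_Hash": parent,
            "Parent_frame": frame}


def attach(master: Tree, branch: Tree, parent_hash: str, frame: int) -> Tree:
    """Return a new tree with `branch` grafted onto master[parent_hash] at `frame`."""
    roots = [h for h, n in branch.items() if n["Parent_Hash"] is None]
    if len(roots) != 1:
        raise ValueError("branch must have exactly one root")
    if set(branch) & set(master):
        raise ValueError("trajectory hashes must be unique across both trees")
    merged = copy.deepcopy(master)
    merged.update(copy.deepcopy(branch))
    root = roots[0]
    merged[root]["Parent_Hash"] = parent_hash
    merged[root]["Parent_frame"] = frame
    merged[parent_hash]["Children"][root] = frame
    return merged


def detach(tree: Tree, branch_root: str) -> Tuple[Tree, Tree]:
    """Split the subtree rooted at `branch_root` off into a new tree."""
    remaining = copy.deepcopy(tree)
    members, stack = [], [branch_root]
    while stack:  # depth-first walk over the Children links
        h = stack.pop()
        members.append(h)
        stack.extend(remaining[h]["Children"])
    new_tree = {h: remaining.pop(h) for h in members}
    parent = new_tree[branch_root]["Parent_Hash"]
    if parent is not None:
        del remaining[parent]["Children"][branch_root]
    # The detached branch becomes a root in its own tree.
    new_tree[branch_root]["Parent_Hash"] = None
    new_tree[branch_root]["Parent_frame"] = None
    return remaining, new_tree


master = {"a": node("Equilibration")}
branch = {"b": node("NPT_production"), "c": node("NPT_production", "b", 10)}
branch["b"]["Children"]["c"] = 10
merged = attach(master, branch, parent_hash="a", frame=50)
rest, split = detach(merged, "b")
```

Both functions return new dictionaries instead of mutating their inputs, so a failed merge (for example a hash collision) leaves the master tree untouched.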

Uniqueness of trajectories

We will use a hash function to generate a unique identifier for individual snapshots or whole trajectories, in 3 different use cases:

  1. First frame hash (used to check whether the trajectory is a continuation of its parent branch)

  2. Middle frame hash (computed on frames of the parent trajectory, for comparison with a child's first frame hash)

  3. Trajectory hash (used to compare trajectories with each other)
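The three hash uses can be illustrated with the standard-library sketch below. The choice of SHA-256, the rounding of coordinates to 3 decimals before hashing, and the names `frame_hash`, `trajectory_hash`, and `branch_frame` are all assumptions made for illustration; the package may hash snapshots differently.

```python
import hashlib
import struct
from typing import Optional, Sequence, Tuple

Frame = Sequence[Tuple[float, float, float]]  # one (x, y, z) per atom


def frame_hash(frame: Frame, decimals: int = 3) -> str:
    """SHA-256 of one snapshot, with coordinates rounded to `decimals`.

    Rounding is an assumed choice that reduces the chance of tiny float
    noise from file round-trips changing the identifier.
    """
    h = hashlib.sha256()
    for x, y, z in frame:
        h.update(struct.pack("<3d", round(x, decimals),
                             round(y, decimals), round(z, decimals)))
    return h.hexdigest()


def trajectory_hash(frames: Sequence[Frame]) -> str:
    """Hash of the ordered sequence of frame hashes."""
    h = hashlib.sha256()
    for frame in frames:
        h.update(bytes.fromhex(frame_hash(frame)))
    return h.hexdigest()


def branch_frame(child: Sequence[Frame], parent: Sequence[Frame]) -> Optional[int]:
    """Index of the parent frame the child starts from, or None.

    Compares the child's first frame hash against the hash of every
    frame in the parent trajectory.
    """
    target = frame_hash(child[0])
    for i, frame in enumerate(parent):
        if frame_hash(frame) == target:
            return i
    return None


parent = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
          [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
          [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)]]
child = [parent[1], [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]]
print(branch_frame(child, parent))
```

A `None` result means the child does not start from any parent frame, i.e. it is not a continuation of that branch.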

Internal Data Structure

Internally, the tree structure will be stored as one nested dictionary:

Key: trajectory hash
Value:  Name: Non-unique stage name (Equilibration, NPT_production, etc.)
        Path: Local file path
        OrionID: Orion file ID
        S3: S3 URL
        Children: dict(hash, frame)
        Parent_Hash: hash
        Parent_frame: frame integer

An API will be provided to retrieve a trajectory either by its hash value or through parent/child relationships.
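Below is a small, hypothetical instance of the nested dictionary together with retrieval helpers. The hashes, paths, and S3 URL are placeholders, and `get_by_hash`, `get_parent`, and `get_children` are illustrative names rather than the package's documented API. Note that the two production stages share a `Name`, which is why the trajectory hash, not the name, is the key.

```python
from typing import Any, Dict, List, Optional, Tuple

Tree = Dict[str, Dict[str, Any]]

# Hypothetical contents: hashes, paths, and URLs below are placeholders.
tree: Tree = {
    "h_eq": {
        "Name": "Equilibration", "Path": "/data/eq.h5", "OrionID": None, "S3": None,
        "Children": {"h_npt1": 100, "h_npt2": 200},
        "Parent_Hash": None, "Parent_frame": None,
    },
    "h_npt1": {
        "Name": "NPT_production", "Path": None, "OrionID": None,
        "S3": "s3://bucket/npt1.h5",
        "Children": {}, "Parent_Hash": "h_eq", "Parent_frame": 100,
    },
    "h_npt2": {
        "Name": "NPT_production", "Path": "/data/npt2.h5", "OrionID": None, "S3": None,
        "Children": {}, "Parent_Hash": "h_eq", "Parent_frame": 200,
    },
}


def get_by_hash(tree: Tree, traj_hash: str) -> Dict[str, Any]:
    """Direct lookup: the trajectory hash is the dictionary key."""
    return tree[traj_hash]


def get_parent(tree: Tree, traj_hash: str) -> Optional[str]:
    """Hash of the parent trajectory, or None for a root."""
    return tree[traj_hash]["Parent_Hash"]


def get_children(tree: Tree, traj_hash: str) -> List[Tuple[str, int]]:
    """(child hash, branching frame) pairs, ordered by frame."""
    return sorted(tree[traj_hash]["Children"].items(), key=lambda kv: kv[1])


print(get_children(tree, "h_eq"))
print(get_parent(tree, "h_npt2"))
```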

Portability of MDTree

The internal data structure can be exported to and imported from a JSON file, and one can choose to output human-readable figures based on the tree structure.
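Because the tree uses string hashes as keys and plain integers and strings as values, it maps directly onto JSON. The sketch below shows a round trip through the standard `json` module and a plain-text rendering as a simple stand-in for a figure; `to_json`, `from_json`, and `render` are hypothetical names, and the real package may produce graphical output instead.

```python
import json
from typing import Any, Dict, List

Tree = Dict[str, Dict[str, Any]]


def to_json(tree: Tree) -> str:
    # Hashes are strings and frames are ints, so the dict maps directly to JSON.
    return json.dumps(tree, indent=2, sort_keys=True)


def from_json(text: str) -> Tree:
    return json.loads(text)


def render(tree: Tree) -> str:
    """Indented plain-text view of the tree (stand-in for a figure)."""
    lines: List[str] = []

    def walk(h: str, depth: int) -> None:
        node = tree[h]
        frame = node["Parent_frame"]
        where = "" if frame is None else f" (from frame {frame})"
        lines.append("  " * depth + f"{node['Name']} [{h}]{where}")
        children = node["Children"]
        # Visit children in the order they branch off the parent.
        for child in sorted(children, key=children.get):
            walk(child, depth + 1)

    for root in sorted(h for h, n in tree.items() if n["Parent_Hash"] is None):
        walk(root, 0)
    return "\n".join(lines)


tree = {
    "a": {"Name": "Equilibration", "Children": {"b": 50},
          "Parent_Hash": None, "Parent_frame": None},
    "b": {"Name": "NPT_production", "Children": {},
          "Parent_Hash": "a", "Parent_frame": 50},
}
restored = from_json(to_json(tree))
print(render(restored))
```

Writing the string from `to_json` to a file (and reading it back before `from_json`) gives the file-based export and import.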

Dependencies

  • MDTraj

  • ParmEd

  • Openeye-orionplatform

  • Minio
