1 of 12

FF-related objects and their converters

2 of 12

Molecule.from_file (filename)

OFF Molecule

OE Molecule

RDK Molecule

OFF Topology

SDF, MOL, SMI File; SMI string

Several file formats, SMI string

Molecule.from_rdkit

Molecule.from_openeye

Topology.from_molecules([off_mol1, off_mol2, ...])

Ligand PDB File

OMM PDBFile

OMM Topology

Topology.from_openmm(omm_topology, unique_molecules=[off_mol1, off_mol2])

simtk.openmm.app.PDBFile(pdb_file_path)

pdb_file_obj.topology

3 of 12

Ligand OFF Topology

OFF FF

OFFXML File

ForceField(filename)

Ligand OMM System

off_ff.create_openmm_system(off_topology)

Ligand ParmEd Structure

Ligand PDB File

OMM PDBFile

simtk.openmm.app.PDBFile(pdb_file_path)

OMM FF

simtk.openmm.app.ForceField('amber99sbildn.xml', 'tip3pfb.xml')

Protein+water PDB File

OMM PDBFile

simtk.openmm.app.PDBFile(pdb_file_path)

omm_forcefield.createSystem(omm_pdbfile.topology, rigidWater=False)

Protein+Water OMM System

Protein+Water ParmEd Structure

parmed.openmm.load_topology(omm_topology, omm_system, xyz=omm_pdbfile.positions OR off_mol.conformers[0].value_in_unit(unit.angstrom))

Complex ParmEd Structure

ligand_struct + protein_struct

OMM Topology

omm_pdbfile.topology

off_topology.to_openmm()

4 of 12

Complex ParmEd Structure

OMM System

pmd_structure.createSystem(nonbondedMethod=NoCutoff, nonbondedCutoff=9.0*unit.angstrom, constraints=HBonds, removeCMMotion=False)

pmd_structure.save('system.prmtop')

pmd_structure.save('system.inpcrd')

Amber PRMTOP File

Amber INPCRD File

parmed_structure.save('system.top')

parmed_structure.save('system.gro')

Gromacs TOP File

Gromacs GRO File

5 of 12

OMM System + cutoffs and stuff

ParmEd System

AMBER System

OFF Molecule

w/o charges

w/o confs

OFF Topology

OMM Topology

+chains

XYZ file

/QCArchive mol

/PDB w/o CONECT

SMILES

- coordinates

+ bond orders

+ stereo definition

+? aromaticity perception

+aromaticity perception

Multiple copies (potentially)

+parameters

+cutoffs and globals

SDF w/o charges

/mol2 w/o charges

/PDB w CONECT

SDF w charges

/mol2 w charges

+bond orders

OFF Molecule

w/o charges

w confs

OFF Molecule

w charges

w confs

Information Content

+charges

-coords

-coords

-charges

- cutoffs and globals

6 of 12

+ bond existence

+ bond orders

+ formal charges

+ stereo definition

MDAnalysis/obabel/rdkit/xyz2mol guessers

SDF

OFFMol

Information Content

mol2

OFFPolymerMol

+ resnames

OFFMol.perceive_residues/hierarchy

Atomtyped PDB

w/o CONECT

w/ expl H

+ canonical resnames

- elements

PDBFixer

Atomtyped PDB w CONECT

w/ expl H

+ bond existence

PDBFixer

QCArchive mol w/ CMILES

QCSubmit / OFFMol.from_qcarchive

+ bond orders

+ elements

+ formal charges

+ stereo definition

OFFBioMol.from_pdb?

Element PDB w/o CONECT

w/ expl H

XYZ file

/QCArchive mol w/o CMILES

+ NONcanonical resnames

Element PDB w CONECT

w/ expl H

+ bond orders

+ formal charges

+ stereo definition

+ resnames?

OFFBioMol.from_pdb?

OFF TypedMolecule

7 of 12

2023_07 QCSubmit updating

8 of 12

BasicResultCollection(entries={"http://localhost:443": entries})

client.get_collection("OptimizationDataset", name)

col.data.records.values()

OptEntry.attributes["CMILES"] and InChI

ORC.to_records

OD.compute

OD.save

QCFractal client

Legacy QCSubmit (<0.50)

QCEl Mol

QCF Specification

'name': 'default',
'description': 'Geometric + rdkit',
'optimization_spec': {'program': 'geometric', 'keywords': None},
'qc_spec': {'driver': 'gradient', 'method': 'uff', 'basis': None, 'keywords': None, 'program': 'rdkit'}

qcportal.collections.OptimizationDataset

OD.add_entry

OD.add_specification

QCFractal server

FractalClient(QCF_serv)

qcportal.collections.OptimizationDataset

openff.qcsubmit.factories.OptimizationDatasetFactory

openff.qcsubmit.datasets.OptimizationDataset

ODF.create_dataset(..., molecules)

OFF Mols

QCS Specification

method="openff-1.0.0",
basis="smirnoff",
program="openmm",
spec_description="default openff spec",
spec_name="openff-1.0.0",

OD.add_qc_spec

openff.qcsubmit.constraints.Constraints

OD.add_molecule(index, mol, constraints)

OD.submit

Constraint spec

constraint_type="dihedral", indices=[2, 0, 1, 5], value=60, bonded=True

C.add_set_constraint

C.add_freeze_constraint

openff.qcsubmit.results.OptimizationResultCollection

Where does CMILES switch from being on entry to being on result?

openff.qcsubmit.results.OptimizationResultRecord

OFF Mols

CMILES + InChI

CMILES + InChI

qcportal.collections.optimization_dataset.OptEntry

client.query_procedures

List[qcportal.models.records.ResultRecord]

List[QCEl Mol]

client.query_molecules

9 of 12

BasicResultCollection(entries={"http://localhost:443": entries})

client.get_collection("OptimizationDataset", name)

col.data.records.values()

OptEntry.attributes["CMILES"] and InChI

ORC.to_records

OD.compute

OD.save

QCFractal client

Next QCSubmit (>=0.50)

QCEl Mol

QCF Specification

'name': 'default',
'description': 'Geometric + rdkit',
'optimization_spec': {'program': 'geometric', 'keywords': None},
'qc_spec': {'driver': 'gradient', 'method': 'uff', 'basis': None, 'keywords': None, 'program': 'rdkit'}

qcportal.collections.OptimizationDataset

OD.add_entry

OD.add_specification

QCFractal server

FractalClient(QCF_serv)

qcportal.collections.OptimizationDataset

openff.qcsubmit.factories.OptimizationDatasetFactory

openff.qcsubmit.datasets.OptimizationDataset

ODF.create_dataset(..., molecules)

OFF Mols

QCS Specification

method="openff-1.0.0",
basis="smirnoff",
program="openmm",
spec_description="default openff spec",
spec_name="openff-1.0.0",

OD.add_qc_spec

openff.qcsubmit.constraints.Constraints

OD.add_molecule(index, mol, constraints)

OD.submit

Constraint spec

constraint_type="dihedral", indices=[2, 0, 1, 5], value=60, bonded=True

C.add_set_constraint

C.add_freeze_constraint

openff.qcsubmit.results.OptimizationResultCollection

Where does CMILES switch from being on entry to being on result?

openff.qcsubmit.results.OptimizationResultRecord

OFF Mols

CMILES + InChI

CMILES + InChI

qcportal.collections.optimization_dataset.OptEntry

client.query_procedures

List[qcportal.models.records.ResultRecord]

List[QCEl Mol]

client.query_molecules

10 of 12

Next QCSubmit (>=0.50)

ds = client.get_collection("OptimizationDataset", dataset.dataset_name) → ds = client.get_dataset(dataset.type, dataset.dataset_name)
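A minimal sketch of how calling code might tolerate both client generations during the migration above. It only checks which method the client exposes; `fetch_dataset`, `LegacyClient`, and `NextClient` are hypothetical names used for illustration, and the stub clients stand in for real QCPortal clients so the snippet runs offline. The caller is assumed to pass a type string that the given client understands.

```python
def fetch_dataset(client, dataset_type: str, name: str):
    """Fetch a dataset from either a legacy or a next-generation client.

    Assumption: next-generation clients expose ``get_dataset`` and legacy
    clients expose ``get_collection``, as in the migration note above.
    """
    getter = getattr(client, "get_dataset", None)
    if callable(getter):
        # Next-generation path
        return getter(dataset_type, name)
    # Legacy path
    return client.get_collection(dataset_type, name)


class LegacyClient:
    """Hypothetical stand-in for a legacy client (offline stub)."""

    def get_collection(self, collection_type, name):
        return ("legacy", collection_type, name)


class NextClient:
    """Hypothetical stand-in for a next-generation client (offline stub)."""

    def get_dataset(self, dataset_type, name):
        return ("next", dataset_type, name)


legacy_result = fetch_dataset(LegacyClient(), "OptimizationDataset", "my-dataset")
next_result = fetch_dataset(NextClient(), "optimization", "my-dataset")
```

Duck typing on the method name avoids importing either QCPortal version, but it does not translate dataset type strings between the two APIs.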

openff.qcsubmit.datasets.datasets.TorsiondriveDataset

11 of 12

2024_01_05 OFFMol.from_qcschema updating

12 of 12

Ok, so the previous Molecule.from_qcschema could take a few types of input:

  • `entry` could be a `qcelemental.models.molecule.Molecule`, a QCArchive Dataset Entry (singlepoint, optimization, torsiondrive, or gridoptimization), or a `dict` representation of any of those
  • `client` was optional
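A rough sketch of how the "object or `dict` representation" inputs listed above could be normalized to one shape before any parsing. It assumes that non-dict inputs expose a pydantic-style `.dict()` method; `as_plain_dict` and `FakeEntry` are hypothetical names, and `FakeEntry` is an offline stand-in, not a real QCArchive class.

```python
from collections.abc import Mapping
from typing import Any


def as_plain_dict(entry: Any) -> dict:
    """Return a plain dict for either a mapping or a model-like object."""
    if isinstance(entry, Mapping):
        # Already a dict representation; copy so the caller's data is untouched
        return dict(entry)
    to_dict = getattr(entry, "dict", None)
    if callable(to_dict):
        # Assumption: model objects provide a pydantic-style .dict() method
        return dict(to_dict())
    raise TypeError(f"Cannot convert {type(entry).__name__} to a dict")


class FakeEntry:
    """Hypothetical stand-in for a model object with a .dict() method."""

    def __init__(self, **fields):
        self._fields = fields

    def dict(self):
        return dict(self._fields)


from_mapping = as_plain_dict({"symbols": ["H", "H"]})
from_object = as_plain_dict(FakeEntry(symbols=["H", "H"]))
```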

Notes

  • Molecule.to_qcschema creates a QCElemental Molecule. This may be how multi-molecule "molecules" are made in QCA datasets.

Questions

  • Can QCEl Molecules and QCF Entries exist without a client?
    • Yes - You can get a torsiondrive entry without geometries using client.get_records(1791723)
  • Did previous design decisions get made because CMILES would be in Molecules for some types of dataset but for others be in Entries? Will CMILES now always be on Molecules? What about legacy entries?
  • Do offline Entries now have everything an online Entry would have? Can offline Entries be mocked at all?
  • Could we move all tests offline?
  • Can a single method be compatible with both new and old QCA?
    • Would require more complex testing matrix
  • Is this functionality redundant with QCSubmit?
  • Are there changes coming to QCElemental Molecules?

There are tests for

  • `test_to_qcschema`: Molecule.to_qcschema
  • `test_to_qcschema_no_connections`: Molecule.to_qcschema for disconnected molecules (multi molecule input)
  • `test_from_qcschema_no_client`: Loading a bunch of serialized QCMols from file and ensuring they're parseable and their canonical_isomeric_smiles are valid
  • `test_from_qcschema_with_client`: Loading a bunch of real entries, running from_qcschema and ensuring they have the right number of confs and are isomorphic to mols created just from their canonical_explicit_hydrogen_smiles
  • `test_qcschema_round_trip`: Loading a single optimization entry, running from_qcschema, and doing a detailed comparison to the optimization's initial_molecule.
  • `test_qcschema_round_trip_from_to_from`: Loads a torsiondrive entry from QCA, runs from_qcschema on it, then runs to_qcschema, then from_qcschema, and makes sure the two OFFMols are equal.
  • `test_qcschema_round_trip_raise_error`: Loads a torsiondrive entry from QCA, deletes its CMILES, then fails to load it.
  • `test_qcschema_molecule_record_round_trip_from_to_from`: Loads a molecule record from QCA, runs from_qcschema on it, then to_qcschema, then from_qcschema, and then makes sure the OFFMols are identical.

What's broken now?

  • Have object model changes to entries broken something?
    • I think it’s no longer possible to mock entries, since they require clients, which must be live.

Minimum viable fix:

  • Make input handling more explicit for entries vs. QCE molecule objects
  • Add fallback hierarchy to get CMILES from molecule identifiers if possible (and test on migrated entries/mols)
  • Remove client kwarg
  • Add test for singlepoint entry
  • Fix offline/serialized mol test
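The fallback hierarchy in the list above might look something like the following sketch, which works on plain dicts so it can be tested offline. The search order (entry `attributes`, then molecule `identifiers`, then molecule `extras`) and the key name are assumptions for illustration, not a confirmed description of where QCArchive stores CMILES; `find_cmiles` and `MissingCMILESError` are hypothetical names.

```python
# Assumed key under which the mapped SMILES (CMILES) is stored
CMILES_KEY = "canonical_isomeric_explicit_hydrogen_mapped_smiles"

# Assumed search order: entry-level first, then molecule-level locations
SEARCH_ORDER = ("attributes", "identifiers", "extras")


class MissingCMILESError(KeyError):
    """Raised when no CMILES string can be found on a record."""


def find_cmiles(record: dict) -> str:
    """Return the first non-empty CMILES found in the assumed search order."""
    for section in SEARCH_ORDER:
        # A section may be absent or explicitly None in serialized records
        container = record.get(section) or {}
        cmiles = container.get(CMILES_KEY)
        if cmiles:
            return cmiles
    raise MissingCMILESError(
        f"No '{CMILES_KEY}' found in any of: {', '.join(SEARCH_ORDER)}"
    )


entry_like = {"attributes": {CMILES_KEY: "[H:1][H:2]"}}
molecule_like = {"extras": None, "identifiers": {CMILES_KEY: "[H:1][H:2]"}}
found_on_entry = find_cmiles(entry_like)
found_on_molecule = find_cmiles(molecule_like)
```

Raising a dedicated error when every location is empty keeps the "deleted CMILES" case from the round-trip tests easy to assert on.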

Future prospects