grade documentation

    Copyright © 2010-2020 Global Phasing Limited

All rights reserved.

This software is proprietary to and embodies the confidential technology of Global Phasing Limited (GPhL). Possession, use, duplication or dissemination of the software is authorised only pursuant to a valid written licence from GPhL.

Authors:   Oliver S. Smart (2004-2015), Julian Holstein (2012-2014), Tom Womack (2006-2012), Claus Flensburg (2011-), Clemens Vonrhein (2011-), Andrew Sharff (2017-)
Version 1.2.19


What is grade?

grade is a tool for generating good ligand dictionaries. grade typically runs the CCP4 libcheck program to produce an initial dictionary for a given input SMILES string or structure. This initial dictionary is then improved: grade runs the CCDC mogul program (in batch mode) to obtain ideal bond lengths and bond angles, each with a sigma, and planar torsions from similar structures in the CSD. In practice mogul does not provide information for all the restraint parameters needed, because some chemical groups are rare or absent from the CSD. In addition, mogul does not provide analysis of features involving hydrogen atoms or of torsions within rings. To obtain the missing information, grade uses geometry optimization by a quantum chemical method in combination with mogul-based restraints. grade uses the RM1 semiempirical method as implemented in the fdynamo library written by Martin Field and co-workers.

The Grade Web Server (http://grade.globalphasing.org) provides access to grade, including mogul, subject to conditions of use; see the Grade Web Server documentation.

The simplest way to use grade is to provide a SMILES string on the command line:

grade -resname ABC "c1ccccc1c2ccccc2"
This will produce an mmCIF restraint dictionary grade-ABC.cif and ideal coordinates grade-ABC.pdb. This dictionary can be used in autoBUSTER, rhofit and coot (see tips for using grade dictionaries with coot).

You can also provide grade with a file giving the coordinates of the atoms in your ligand; we would strongly recommend that this file is provided in MOL2 format, since (as of January 2011) we have written special code to translate the sybyl atom types provided in mol2 files into the atom typing system used by BUSTER.


What is grade_PDB_ligand?

grade_PDB_ligand is a tool that collects information for a given PDB three-letter ligand code from the LigandExpo web site, and then uses grade to generate a dictionary for the ligand.

So, to produce a dictionary for a ligand that already exists in the PDB, for example 3AS, use:

grade_PDB_ligand 3AS 
This will produce an mmCIF dictionary called 3AS.grade_PDB_ligand.cif.
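When dictionaries are needed for several ligands, the command can be wrapped in a loop. The following is a minimal sketch rather than part of grade: the helper `expected_cif` and the second ligand code (ATP) are illustrative choices, and the output name simply follows the pattern described above.

```shell
#!/bin/sh
# Sketch: build dictionaries for several PDB ligand codes in turn.
# expected_cif is a hypothetical helper, not part of grade.

expected_cif() {
    # Output name pattern taken from the text above: CODE.grade_PDB_ligand.cif
    printf '%s.grade_PDB_ligand.cif\n' "$1"
}

for code in 3AS ATP; do
    if command -v grade_PDB_ligand >/dev/null 2>&1; then
        if grade_PDB_ligand "$code" && [ -f "$(expected_cif "$code")" ]; then
            echo "wrote $(expected_cif "$code")"
        else
            echo "grade_PDB_ligand did not produce a dictionary for $code" >&2
        fi
    else
        echo "grade_PDB_ligand not on PATH; skipping $code"
    fi
done
```

The check on the output file is deliberately simple; it only confirms that a file with the expected name exists.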

grade and grade_PDB_ligand command line options

The table below lists the command line options. Column P is used to mark options that:
Flag | P | Arguments | Explanation | Remark
‑h Display usage information Special option to print help message and exit
‑checkdeps Check that all the dependencies are present Special option that checks that the external tools grade needs are accessible and work properly. Useful for setting up grade and testing that the program works on a particular host. This option is one of the tests run by the checkdeps script.
(none) <string> "SMILES string" The SMILES string should be enclosed in quotes to avoid problems with the shell.
‑in <infile> Input filename. Needed unless direct SMILES string input is used.
‑itype <itype> Type of input file. One of SMILES, mol, mol2, pdb or cif
Normally not needed as it is worked out from the file extension of infile.
‑resname <three-letter-code> The three-letter code to be used for the residue name in the output dictionary default XXX
‑name <ligand name> The name of the ligand (eg 2,3-difluorophenylhydrazine) default XXX
‑databaseid <ligand corporate or database id> Set this to the identifier normally used to describe the ligand. For example corporate IDs such as AT7519 AZD4017 CP-31398 GSK2801 MK-3102. Alternatively you could use CID numbers from pubchem. The databaseid will be prominently displayed on the buster-report ligand analysis page. Default is no value set or used. (Except for grade_PDB_ligand where PDB databaseid is used).
‑ocif <cif file> Output mmCIF restraint dictionary. defaults to grade-rescode.cif
See restraints that grade produces
‑opdb <pdb file> Output ideal coordinates in pdb format. defaults to grade-rescode.pdb
The "ideal" coordinates are generated from the final restraints and will also be included in the output mmCIF file.
‑oshelx <dfix filename> Filename for restraints in SHELXL format. defaults to grade-rescode.dfix
Unless ‑oshelx or ‑shelx or ‑shelxH is specified SHELXL dfix restraints will not be output.
‑shelx Convert final mmCIF restraint dictionary to SHELXL restraint dfix file, excluding hydrogen atoms from restraints. The default is not to produce a SHELXL restraint dfix file.
See SHELXL instructions for details of SHELXL restraints. The dfix file will contain DFIX, DANG and FLAT records. To alter filename used for the dfix file use the ‑oshelx option (grade only).
‑shelxH Convert final mmCIF restraint dictionary to SHELXL restraint dfix file, outputting all restraints including those for the hydrogen atoms.
‑charge <charge> The overall charge on the molecule. This is needed for quantum chemical optimization. The default is a charge of zero, unless the SMILES input contains charges, in which case the charge is taken from the SMILES string. This is also the only command line option for the grade_PDB_ligand utility (which will usually be able to read the charge
‑bigplanes Write out large planes By default, grade describes planarity by a large collection of four-atom planes. With this option, it runs through and fuses them into as small as possible a collection of large planes. This will make the planarity significantly more tightly restrained.
‑really_noH Input molecule really does not have any hydrogen atoms.
‑ecloud Use electron-cloud X-H distances (instead of default nuclear distances) This is adequate for X-Ray refinement, when the hydrogens are visible to the X-Ray term (i.e. kept at full occupancy).
‑chirality-both mark chiral centers always as "both" when starting from MOL2 file
‑td <temporary directory> Preserve the temporary directory created during the run, and set its location. Otherwise it is created by Perl's tempdir() function, in $autoBUSTER_scrdir if that environment variable is set or in /tmp by default, and deleted at the end of the run. Useful in reporting bugs.
‑f Force the overwrite of the output files even if they exist. If ‑f is not specified then grade will terminate rather than overwrite a file.
‑nonet × Special option for when you cannot use curl or wget to download the required files. You will be told which files need to be downloaded. Use a browser to locate the listed files and download them into the current directory, then rerun the grade_PDB_ligand command. Use this only as a last resort; it is better to install and configure curl, as explained below.
‑lx × Allow use of experimental rather than ideal coordinates in the event that a structure has no ideal coordinates available
‑xdat × Use RCSB LigandExpo in preference to EBI MSDchem (the other will be used if the first fails)
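Because grade terminates rather than overwriting existing output unless ‑f is given, a wrapper script may want to check for earlier output first. The sketch below is one way to do that. The helpers `default_outputs` and `outputs_present` and the input file name `ligand.mol2` are hypothetical, and it assumes that "rescode" in the default file names is the value given to -resname, as in the grade-ABC.cif example earlier.

```shell
#!/bin/sh
# Sketch: check for existing default outputs before running grade.
# default_outputs and outputs_present are hypothetical helpers, not part of grade.

default_outputs() {
    # Default names per the table above, assuming "rescode" is the -resname value.
    printf 'grade-%s.cif\ngrade-%s.pdb\n' "$1" "$1"
}

outputs_present() {
    # Succeeds if any default output for residue name $1 already exists.
    for f in $(default_outputs "$1"); do
        if [ -e "$f" ]; then return 0; fi
    done
    return 1
}

resname=LIG
input=ligand.mol2   # hypothetical input file

if outputs_present "$resname"; then
    echo "outputs for $resname exist; add -f to overwrite them"
elif command -v grade >/dev/null 2>&1 && [ -f "$input" ]; then
    # -shelx additionally writes SHELXL restraints (hydrogens excluded).
    grade -in "$input" -resname "$resname" -name "example ligand" -shelx
else
    echo "would run: grade -in $input -resname $resname -shelx"
fi
```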

Inputs to grade

grade currently handles a variety of input formats for dictionary creation - including SMILES strings, mmCIF restraint dictionaries and structures in sdf, mol2 or PDB format.

To provide a usable dictionary, grade needs to know atom types and bond orders for all the atoms in the structure. It can obtain these in two ways: an input structure with hydrogens on is normally enough to determine them implicitly, whilst a SMILES string or an input structure in mol2 format will give them explicitly. To provide a convenient dictionary for refining an existing ligand pose, grade also needs to know names for the atoms.

MOL2 files containing 3D coordinates for all atoms (including hydrogen atoms) are the preferred input format. Using OpenBabel (http://openbabel.org) to convert a PDB file containing 3D coordinates to a MOL2 file fills in the atom types and bond orders. Note that the coordinates in MOL2 files are assumed to be good 3D ones; grade at present will misbehave if given input with only 2D coordinates, so if those are the only coordinates you have then you will probably get better results using a SMILES string. You might also use OpenBabel (versions after 2.2.3) to convert from 2D to 3D.
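As an illustration of the conversion described above, the following sketch converts a PDB ligand file to MOL2 with Open Babel and then passes the result to grade. It is not taken from the grade distribution: the file name `ligand.pdb`, the residue name and the helper `mol2_name` are assumptions, and the `obabel` options shown (`-ipdb`, `-omol2`, `-O`, `--gen3d`) are standard Open Babel command-line options.

```shell
#!/bin/sh
# Sketch: convert a PDB ligand file to MOL2 with Open Babel, then run grade.
# mol2_name and the file name ligand.pdb are illustrative, not part of grade.

mol2_name() {
    # Replace the final extension with .mol2 (ligand.pdb -> ligand.mol2).
    printf '%s.mol2\n' "${1%.*}"
}

obabel_bin=${BDG_TOOL_OBABEL:-obabel}   # honour the tool variable if it is set
pdb=ligand.pdb
mol2=$(mol2_name "$pdb")

if [ "$obabel_bin" != none ] && command -v "$obabel_bin" >/dev/null 2>&1 && [ -f "$pdb" ]; then
    # -ipdb/-omol2 name the formats explicitly; -O names the output file.
    "$obabel_bin" -ipdb "$pdb" -omol2 -O "$mol2"
    # For a 2D-only input, --gen3d asks Open Babel to generate 3D coordinates:
    #   obabel ligand2d.mol -O ligand3d.mol2 --gen3d
    if command -v grade >/dev/null 2>&1; then
        grade -in "$mol2" -resname LIG
    fi
else
    echo "would run: obabel -ipdb $pdb -omol2 -O $mol2"
fi
```

It is worth inspecting the converted MOL2 file before running grade, since the atom types and bond orders are inferred by Open Babel from the coordinates.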

mol2

We recommend the use of mol2 format for all coordinate input. Using mol2 format with all hydrogen atoms is good for producing dictionaries that have atom names matching a pre-existing structure. We convert the sybyl atom types to BUSTER atom types internally, including setting types for polar hydrogens.

We have tested grade extensively with mol2 files produced by CCDC's conquest and the OpenBabel converter from pdb input. There was at one point a problem with mol2 files produced by corina which did not have a charge column, but this has been fixed.

SMILES

We have extensively tested the generation of dictionaries from SMILES strings. You can either supply the SMILES string in a .smi file or quote it directly on the command line (see example above). Take care when passing SMILES strings on the command line: put them in quotes, or you will get very unclear error messages as the shell interprets the brackets as subshell calls.
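The two routes can be sketched as follows. The SMILES string (aspirin), the residue name and the file name `ligand.smi` are illustrative choices, and the sketch assumes a .smi file holding the SMILES string on a single line.

```shell
#!/bin/sh
# Sketch: two ways of passing a SMILES string to grade.
# The SMILES (aspirin), residue name and file name are illustrative choices.

smiles='CC(=O)Oc1ccccc1C(=O)O'   # single quotes stop the shell interpreting the brackets

# 1. Directly on the command line: keep the variable quoted so it stays one argument.
set -- -resname LIG "$smiles"
nargs=$#                          # 3 arguments: -resname, LIG and the SMILES string

# 2. Via a .smi file, which avoids shell quoting on the grade command line.
smi_file=ligand.smi
printf '%s\n' "$smiles" > "$smi_file"

if command -v grade >/dev/null 2>&1; then
    grade "$@"
    # Alternative using the file: grade -in "$smi_file" -resname LIG
else
    echo "would run: grade -resname LIG '$smiles'"
    echo "or:        grade -in $smi_file -resname LIG"
fi
```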

libcheck is used for the initial SMILES to 3D conversion. It generally does a good job but occasionally makes a mistake on a stereochemical centre.

pdb

You can supply pdb input to grade. The initial input of this format is handled by libcheck, and it will often add extra hydrogen atoms - grade picks up this case and will not proceed. Consequently we advise conversion to mol2 format using babel or another tool before supplying the coordinates to grade.

mol

The initial input of mol/sdf/mdl format is handled by libcheck. It generally works well for fully hydrogenated ligands, but we would recommend that you use mol2 instead since that format allows atom names.

cif restraint dictionary

grade can be supplied with a mmCIF restraint dictionary. In this case libcheck will not be used.

We have tested grade on a small number of mmCIF restraint dictionaries produced by the CORINA and PRODRG programs.

Extra information, for example about compatibility with the particular output formats of assorted proprietary tools, will be made available at the input-formats page on the BUSTER wiki.


The restraints that grade produces

atomic information

mmCIF dictionaries have an atom type section. This contains information about each atom's name, element, "energy type" and partial charge, along with a set of Cartesian coordinates. The type_energy field is important for ideal distance contacts: it allows atoms that hydrogen bond to get closer than the normal van der Waals contacts would allow.

libcheck normally sets the type_energy of all hydrogen atoms to "H". This is a problem for hydroxyl and amide hydrogens because type_energy "H" is not allowed to hydrogen bond. So grade partly revises the type_energy list, in the case where all hydrogen atoms are type "H", to ensure that sensible contact distances are set. The revision is only made on hydrogen atoms.
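To see which energy types a dictionary assigns, the atom section can be listed with a short script. This is a rough sketch under stated assumptions: it assumes the atom section is a `_chem_comp_atom` loop as in CCP4-style restraint dictionaries, with whitespace-separated values and no quoted strings containing spaces. The helper `list_type_energy` and the two-atom sample are made up for illustration and are not grade output.

```shell
#!/bin/sh
# Sketch: list atom names and energy types from the atom loop of a dictionary.
# Assumes whitespace-separated values with no quoted strings containing spaces.

list_type_energy() {
    awk '
        /^loop_/ { ncol = 0; icol = 0; tcol = 0; next }      # new loop: reset column bookkeeping
        /^_chem_comp_atom\./ {                               # header line of the atom loop
            ncol++
            if ($1 == "_chem_comp_atom.atom_id")     icol = ncol
            if ($1 == "_chem_comp_atom.type_energy") tcol = ncol
            next
        }
        /^_/ || /^#/ || /^data_/ || NF == 0 { next }         # other tags, comments, blanks
        icol && tcol && NF >= ncol { print $icol, $tcol }    # data row of the atom loop
    ' "$1"
}

# Made-up two-atom fragment, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
data_comp_LIG
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.type_energy
_chem_comp_atom.partial_charge
LIG O1 O OH1 0.000
LIG H1 H H   0.000
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.type
LIG O1 H1 single
EOF

list_type_energy "$sample"
```

In the made-up sample the hydroxyl hydrogen still has type "H", which is the situation that grade's revision of the type_energy list is meant to address.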

grade currently leaves partial charges alone so they are normally zero.

grade sets the coordinates list to "ideal" coordinates generated from the final grade restraints. These coordinates can be retrieved in coot by the 'Extensions .. Modelling .. Monomer from dictionary' option. The ideal coordinates will also be used by the rhofit program.

bonds

grade runs the CCDC mogul program (in batch mode) to obtain ideal bond lengths and sigmas. mogul is run with tight criteria for atom matching, and with filters to exclude structures with high R factors; we do not use information from bonds for which mogul gives fewer than ten hits.

mogul does not provide analysis of bonds to hydrogen atoms, because small molecule X-ray structures do not provide reliable information on hydrogen positions (see the conquest documentation). grade therefore sets the bond lengths to hydrogen atoms to values found from neutron diffraction, as listed in F. H. Allen and I. J. Bruno, "Bond lengths in organic and metal-organic compounds revisited: X-H bond lengths from neutron diffraction data", Acta Cryst. (2010) B66, 380-386.

To obtain ideal bond lengths when mogul does not identify sufficient information to set them, a partial quantum chemical geometry optimization is performed. During the QM optimization mogul and neutron-based bond, angle and torsion restraints are applied. The restraints are highly weighted compared to the quantum chemical energy. The QM optimization normally uses the RM1 semiempirical method as implemented in the fdynamo library written by Martin Field and co-workers. RM1 provides a good source of reliable information for most organic compounds. grade then sets the "missing" ideal bond lengths to those found at the end of the partial QM optimization.

The use of the mogul and neutron-based restraints ensures that QM optimization is not allowed to disrupt experimentally known features of the molecule. Unrestrained QM geometry optimization can lead to large disruption particularly in the case of zwitterionic compounds that can be neutralized.

An additional piece of information in mmCIF dictionaries is the bond.type, which can be set to single, double, aromatic or other values. Currently grade makes no check on this field, accepting either the values supplied by libcheck or those in the input mmCIF dictionary (depending on the input provided).

bond angles

The approach for bond angles closely follows that described for bond lengths. An exception is that "ideal" bond angles involving hydrogen atoms are set to the values found at the end of the partial QM optimization.

planes

Getting planes correct is critical in producing good dictionaries for ligand fitting and refinement. grade takes the approach of producing only 4-atom plane records rather than extending to larger planes. The problem with larger planes is that they do not allow the gentle bending that is found in strained small molecule structures. Two types of 4-atom plane are produced:

     "trig" type plane                      "one-four" type plane
 holding 3 atoms bonded to a                holding 4 atoms bonded 
 central atom flat.                         in a line flat.
 such as in a carboxylic acid:              such as one side of a phenyl ring
             C                                         C     C
             |                                          \   /
             C                                           C-C
            / \
           O   O

To distinguish between different types of plane, grade uses different plane names. In the following table, ## denotes a number and ** denotes an atom name.
plane name | Examples | Explanation
csd-** | csd-C1 | "trig" type plane holding an atom flat. Such a plane will be applied if there is good data in CSD setting all three bond angles and they add up to more than the threshold of 357 degrees. In the example the central atom is C1.
qm-** | qm-C1 | "trig" type plane holding an atom flat. Similar to csd-** except one or more of the bond angles is set by partial QM.
csdf-## | csdf-01 | "one-four" type plane identified from a mogul torsion distribution where angles at 0 or 180 degrees predominate (>95%). Note that if the distribution is strongly at just 0 or 180 a torsion angle restraint rather than a plane is used (see below).
ringname‑## | ring6A-1 | "one-four" type plane within a 5 or 6 atom flat ring. The flat five- and six-membered rings in the structure are named ring5A, ring5B, ... and ring6A, ring6B, ...
qmf-## | qmf-01 | A "one-four" type plane identified after partial QM. This type of plane can occasionally be set in error.
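The 357 degree threshold for "trig" planes is an angle-sum test: three bond angles around a perfectly flat centre add up to 360 degrees, and the sum drops as the centre becomes pyramidal. The sketch below shows only this arithmetic. The helper `is_trig_flat` is hypothetical, and the requirement for good CSD (or QM) data behind the angles is not modelled.

```shell
#!/bin/sh
# Sketch: the 357-degree angle-sum test for a "trig" plane.
# is_trig_flat is a hypothetical helper; the requirement for good CSD
# (or QM) data behind the three angles is not modelled here.

is_trig_flat() {
    # $1 $2 $3: the three bond angles (degrees) around the central atom.
    # Succeeds only if their sum is more than 357 degrees.
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN { exit !((a + b + c) > 357) }'
}

# A carboxylate-like centre: the angles sum to 360.0 degrees.
if is_trig_flat 120.3 119.8 119.9; then
    echo "sum 360.0: flat, plane applied"
fi

# A tetrahedral-like centre: the angles sum to 328.5 degrees.
if ! is_trig_flat 109.5 109.5 109.5; then
    echo "sum 328.5: not flat, no plane"
fi
```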

torsions

grade produces a torsion restraint along each relevant bond in the structure. Many of these torsions have their sigma set extremely large and so will have no effect on the refinement, but a number of them have sigma set to a realistic value and should be used in fitting and refinement. grade uses the BUSTER-KEYWORD mechanism to tell rhofit and autoBUSTER that the torsions should be active.
torsion name | Examples | Active torsion restraint? | Explanation
CONST_ringname‑## | CONST_ring6A-1 | No | Torsion within a flat 5 or 6 atom ring. This torsion is not active: a plane is used to keep the ring flat. The CONST_ prefix ensures that coot does not present this torsion in the "Edit Chi angle" dialogue.
csdnotf-## | csdnotf-01 | No | Torsions which, according to the CSD, are observed to be non-flat in more than 5% of samples.
csdf-## | csdf-01 | No | Torsions which, according to the CSD, are flat in at least 95% of samples. A plane restraint is also used in this case.
csd-ct-##(A/B) | csd-ct-01A | Yes: one-fold to either 0 or 180 degrees | Torsions which, according to the CSD, are almost always either cis or trans. Normally there will be two of these restraints along each bond (A and B). The most common case in which these restraints are produced is for normal peptide bonds.
qmf-## | qmf-01 | No | Torsions about which the CSD doesn't have an opinion, but which are observed to be flat at the end of a QM optimisation. A plane restraint is also used in this case.
csd-sp3sp3-## | csd-sp3sp3-02 | Yes: 3-fold to -60,60,180 | Torsions joining two SP3 atoms for which mogul has observed a preference for a three-fold staggered conformation (that is, most of the observations are in the ±60 or 180 bins).
qm-sp3sp3-## | qm-sp3sp3-02 | Yes: 3-fold to -60,60,180 | Torsions joining two SP3 atoms, for which mogul has no opinion, and where the atoms are found in a three-fold staggered conformation after the quantum step.
other-### | other-007 | No | Torsions which grade has no opinion on.
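When reading a dictionary, the naming scheme in the table is enough to tell which torsions are meant to be active. The sketch below encodes the table as a name-based lookup. The helper `torsion_is_active` is hypothetical and only mirrors the table; in rhofit and autoBUSTER the activation itself is controlled through the BUSTER-KEYWORD mechanism, not by this kind of name matching.

```shell
#!/bin/sh
# Sketch: decide from its name whether a torsion is listed as active above.
# torsion_is_active is a hypothetical helper that only mirrors the table.

torsion_is_active() {
    case "$1" in
        csd-ct-*|csd-sp3sp3-*|qm-sp3sp3-*) return 0 ;;   # "Yes" rows in the table
        *)                                 return 1 ;;   # CONST_, csdnotf-, csdf-, qmf-, other-
    esac
}

# The example names from the table.
for name in CONST_ring6A-1 csdnotf-01 csdf-01 csd-ct-01A qmf-01 \
            csd-sp3sp3-02 qm-sp3sp3-02 other-007; do
    if torsion_is_active "$name"; then
        echo "$name: active"
    else
        echo "$name: not active"
    fi
done
```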


External tools used by grade

grade uses a number of programs (tools), thanks to:

The external tool setup procedure is shared with buster-report, see buster-report External tools documentation for an overview. The following table describes each of the external tools to be defined to run grade and grade_PDB_ligand:

Program | Required or optional? | Environment variable | Remarks
CCDC mogul $BDG_TOOL_MOGUL
should be set to the full path for the mogul executable or none, unless mogul is on the $PATH.

$BDG_MOGUL_LOCAL_DATABASE_FILE can be set to provide support for Mogul with additional (corporate) libraries
mogul provides lots of useful information to generate restraints. See buster-report BDG_TOOL_MOGUL description for more information, including important notes.
Open Babel obabel grade "optional": turn off by setting $BDG_TOOL_OBABEL to none

grade_PDB_ligand "required" (grade_PDB_ligand is not usable without it)
$BDG_TOOL_OBABEL
should be set to the full path for the obabel executable or none.
See buster-report BDG_TOOL_OBABEL description for more information.
curl grade_PDB_ligand "optional": turn off by setting $BDG_TOOL_CURL to none
$BDG_TOOL_CURL
could be set to the full path for the curl executable or none. Most operating systems provide a curl package, and if this is used then curl will be on the user's $PATH.
For full functionality grade_PDB_ligand requires curl or wget. curl is used in preference to wget. If your site uses a proxy to limit internet access, use the environment variable $BDG_TOOL_CURL_OPTIONS to supply curl command options, and then run grade_PDB_ligand -checkdeps, which will try trial downloads and check your settings.
wget grade_PDB_ligand "optional": turn off by setting $BDG_TOOL_WGET to none
$BDG_TOOL_WGET
could be set to the full path for the wget executable or none. Most operating systems provide a wget package, and if this is used then wget will be on the user's $PATH.
If your site uses a proxy to limit internet access, use the environment variable $BDG_TOOL_WGET_OPTIONS to supply wget command options (see man wget), and then run grade_PDB_ligand -checkdeps, which will try trial downloads and check your settings.
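Put together, a site setup might look like the sketch below. All paths and the proxy address are hypothetical placeholders to be replaced with local values. `--proxy` is a standard curl option, and the variable names are those listed in the table above.

```shell
#!/bin/sh
# Sketch: environment for the external tools used by grade.
# All paths and the proxy address are hypothetical placeholders.

export BDG_TOOL_MOGUL=/opt/ccdc/bin/mogul
export BDG_TOOL_OBABEL=/usr/bin/obabel
export BDG_TOOL_CURL=/usr/bin/curl
export BDG_TOOL_CURL_OPTIONS="--proxy http://proxy.example.com:3128"
export BDG_TOOL_WGET=none          # turn wget off; curl is used in preference anyway

# Check the setup; for grade_PDB_ligand this includes trial downloads.
for prog in grade grade_PDB_ligand; do
    if command -v "$prog" >/dev/null 2>&1; then
        "$prog" -checkdeps
    else
        echo "$prog not on PATH; skipping -checkdeps"
    fi
done
```

Placing the exports in a shell startup file or a site-wide setup script keeps the settings consistent between users.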

How has grade changed over time?


Where can I find up-to-date information about grade? How do I cite it?

See the grade page on the BUSTER wiki - http://www.globalphasing.com/buster/wiki/index.cgi?GradeMainPage - for updated information and also how to use and cite grade.