rhofit documentation

This software is proprietary to and embodies the
confidential technology of Global Phasing Limited (GPhL).
Possession, use, duplication or dissemination of the software is
authorised only pursuant to a valid written licence from GPhL.

Authors: (2009-2011) O.S. Smart, T. Womack
Authors: (2012-2014) O.S. Smart
Authors: (2015-2017) G. Bricogne, C. Flensburg, A. Sharff, C. Vonrhein
Version 1.2.5 (Sep 16 2016)

What is rhofit?
Get started immediately
Some details of how the fitting process works
- How rhofit versions 1.2 and higher handle planes
- Using keywords in the input CIF dictionary
Details of command-line arguments -h

Describing the ligand -l -keepH -nochirals -scanchirals -trustplanes
Describing the difference density and protein -m -p -clusterweight -clashweight -sctrimb
Telling rhofit where to fit -xclusters -allclusters -c -C
Controlling the fitting process -thorough -quick -nocorrel -correlsort -randskip
Telling rhofit where to put results -d -td
Output options -resnum -hitmaxn -hitmaxd -hitfncut

Interpreting rhofit output

Using visualise-rhofit-coot
What is in results.txt
Files that can be found in output directory

How to cite use of rhofit
History of improvements to rhofit
Help and further sources of information

What is rhofit?

Rhofit is a tool for fitting ligands into difference density. It differs from FitMAP (our previous fitting tool) in that it can change bond lengths and angles within the ligand, rather than only rotating torsion angles; this also allows it to search for correct ring conformations for macrocycles. In addition, rhofit can search for the correct chirality if this is not known for an input ligand.

Rhofit uses the gelly geometry function to assess ligand strain and protein-ligand contacts, so its results will be compatible with further BUSTER refinement. Rhofit runs a large number of independent trials and reports the best solutions that it finds; you can give it some idea of how long you want it to search (see `-quick` and `-thorough`). We aim for rhofit with default parameters to take significantly less time than the refinement required to produce its input.

Get started immediately

Starting from a structure in which a ligand remains to be found, run

refine -L -p structure.pdb -m data.mtz -d 00 | tee 00.lis

to tell BUSTER to try ligand-chasing.

There are a fair number of command-line arguments available, but the simplest use cases are:

`rhofit -l ligand.cif -m 00/refine.mtz -p 00/refine.pdb -d 01_rhofit`	The ligand is described by the CCP4-compatible CIF dictionary ligand.cif, and directory 00 contains the output of the BUSTER run. Results are placed in directory 01_rhofit.
`rhofit -l ../ligand.cif`	Run from within the directory in which a refine command has just been run, so that the files refine.pdb and refine.mtz are in the current directory. The CCP4-compatible CIF restraint dictionary ligand.cif is in the directory above.
`rhofit -l ligand.cif -scanchirals`	Use when you are not confident that the chiralities in ligand.cif are correct and the density is sufficiently good to determine the correct chirality.
`rhofit -l ligand.cif -xclusters 6`	Use when there are three chains in the protein, each with two binding sites, and you expect the ligand to bind in all six sites.

By default, rhofit will produce a list of the connected blobs of difference density from the input MTZ file which are large enough to contain the ligands (each blob is described in a cluster file). rhofit will then try quick runs to fit the ligand into each of these blobs, and then try much more intensively to fit the ligand into the single blob into which it seemed to fit best. If you expect there to be more than one copy of the ligand bound to the structure, use the -xclusters option.

When the rhofit command has finished, its results are in the directory 'rhofit' (or the directory given with `-d`). The easiest way to explore this directory is with the tool visualise-rhofit-coot, which lets you look through the list of positions that rhofit has found for your ligand. They are ordered by rhofit total score (see `-correlsort`), so generally the correct conformation should be the first one that appears. The section Interpreting rhofit output below has more information.

Please see the rhofit introductory tutorial on 2qtu on the rhofit Wiki page.

Some details of how the fitting process works

How rhofit versions 1.2 and higher handle planes

Planes, particularly large planes, are extremely strong restraints: they make motion away from the plane almost impossible even when the density strongly indicates it, and the worst fits that we see are caused by bad planes in the input dictionary. rhofit-1.2 attacks this problem from two directions:

We break the planes in the input file into sets of four-atom planes, either describing an atom and the three atoms bonded to it, or enforcing a 0-or-180° torsion along a bond. This avoids issues where large planes impose enormous penalties on small movements at one end.
Note that the restraint dictionaries generated by the grade tool already have four-atom planes and so will not be broken up.
Instead of a harmonic restraint based on the distance from an atom in a plane to the best-fit plane of the restrained atoms, we use a plateauing term; so, whilst small deviations are penalised the same as in the harmonic case, a restraint which is strongly violated suffers only a fixed penalty, so is effectively turned off.
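The plane-breaking step in the first item can be pictured with the minimal Python sketch below. It is a hypothetical illustration, not rhofit's code: the function name `break_plane`, the input format (a list of atom names plus bonded pairs) and the choice of which neighbours define each torsion plane are all assumptions.

```python
def break_plane(plane_atoms, bonds):
    """Split one large plane restraint into four-atom planes (hypothetical sketch).

    plane_atoms: atom names in the original plane restraint.
    bonds: (atom1, atom2) pairs taken from the dictionary's bond list.
    """
    in_plane = set(plane_atoms)
    # Keep only bonds with both atoms in this plane.
    plane_bonds = [(a, b) for a, b in bonds if a in in_plane and b in in_plane]
    neighbours = {atom: set() for atom in in_plane}
    for a, b in plane_bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups = []
    # Kind 1: an atom together with the three atoms bonded to it.
    for atom in plane_atoms:
        if len(neighbours[atom]) == 3:
            groups.append((atom, *sorted(neighbours[atom])))
    # Kind 2: a torsion a-b-c-d along bond b-c, to be kept at 0 or 180 degrees.
    for b, c in plane_bonds:
        before = sorted(neighbours[b] - {c})
        after = sorted(neighbours[c] - {b})
        if before and after:
            groups.append((before[0], b, c, after[0]))
    return groups

# An ethene-like planar fragment: C1=C2 with two substituents on each carbon,
# originally held by a single six-atom plane.
bonds = [("C1", "C2"), ("C1", "X1"), ("C1", "X2"), ("C2", "X3"), ("C2", "X4")]
print(break_plane(["C1", "C2", "X1", "X2", "X3", "X4"], bonds))
# [('C1', 'C2', 'X1', 'X2'), ('C2', 'C1', 'X3', 'X4'), ('X1', 'C1', 'C2', 'X3')]
```

Each resulting restraint involves only four neighbouring atoms, so a small movement at one end of the fragment is no longer penalised through one restraint spanning the whole plane.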

At the end of the fitting process, planes which have been turned off are deleted from the best.cif file produced by rhofit, so a refinement of merged.pdb using best.cif as its dictionary will not reinstate them.
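To see why a plateauing term effectively switches a badly violated plane off, the Python sketch below compares it with a harmonic term. The exponential form, the parameters `k` and `cap`, and the 95% rule in `is_turned_off` are assumptions chosen for illustration; this documentation does not give rhofit's actual functional form or the criterion it uses to decide that a plane has been turned off.

```python
import math

def harmonic_penalty(d, k=1.0):
    """Conventional harmonic plane term: grows without limit with deviation d."""
    return k * d * d

def plateau_penalty(d, k=1.0, cap=1.0):
    """Assumed plateauing form: about k*d**2 for small d, levelling off at `cap`."""
    return cap * (1.0 - math.exp(-k * d * d / cap))

def is_turned_off(d, k=1.0, cap=1.0, fraction=0.95):
    """Hypothetical rule: a plane counts as turned off once its penalty nears the cap."""
    return plateau_penalty(d, k, cap) >= fraction * cap

for d in (0.01, 0.1, 1.0, 3.0):
    print(f"d={d:4}: harmonic={harmonic_penalty(d):.4f} "
          f"plateau={plateau_penalty(d):.4f} off={is_turned_off(d)}")
```

For small deviations the two terms agree closely, while for large deviations the plateau term stops growing, so the density is free to pull atoms out of that plane.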

There is a command-line option `-trustplanes` which turns off the plane-breaking and plateauing machinery. We suggest using it only in cases where the machinery gives wrong answers, and buster-develop@globalphasing.com would be very interested to hear as much as you are allowed to reveal about cases which fail like this.

Using keywords in the input CIF dictionary

rhofit looks for a line of the form
# BUSTER-KEYWORD keyword1 keyword2 ...
in the input dictionary, and changes its behaviour when it sees certain keywords:

`TRUSTCOORD`	The coordinates in the `chem_comp_atom.{x,y,z}` fields in the CIF file are taken to be the result of a careful generation from the restraints, and rhofit uses them directly rather than regenerating them from the restraints.
`TRUSTTORS`	The restraints in the `chem_comp_tor` section of the dictionary are considered to be of good quality, and are used when doing the fitting.

At present grade is the only tool which writes out these keywords.
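The keyword scan described above amounts to looking for one comment line. Below is a minimal Python sketch, assuming the keywords are whitespace-separated on a line that starts with `#` followed by `BUSTER-KEYWORD`; the function name `buster_keywords` and the sample CIF fragment are hypothetical, and rhofit's own parsing may accept other layouts.

```python
def buster_keywords(cif_text):
    """Return the set of keywords on '# BUSTER-KEYWORD ...' lines (hypothetical helper)."""
    keywords = set()
    for line in cif_text.splitlines():
        fields = line.split()
        # Assumed layout: '#', then 'BUSTER-KEYWORD', then the keywords themselves.
        if len(fields) >= 2 and fields[0] == "#" and fields[1] == "BUSTER-KEYWORD":
            keywords.update(fields[2:])
    return keywords

# A made-up fragment of a dictionary, not a complete restraint file.
example = """data_comp_XXX
# BUSTER-KEYWORD TRUSTCOORD TRUSTTORS
loop_
_chem_comp_atom.comp_id
"""
kw = buster_keywords(example)
use_dictionary_coords = "TRUSTCOORD" in kw   # skip regenerating coordinates
use_dictionary_torsions = "TRUSTTORS" in kw  # use the chem_comp_tor restraints
print(sorted(kw))
```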

Details of command-line arguments

`-h`	Quick help message listing most important arguments.

Describing the ligand

`-l ligand.dic`	Restraint dictionary for ligand. This can be either in CCP4-style CIF format or TNT format. A good restraint dictionary is important to get good fits. We suggest that the grade tool is used to prepare dictionaries for rhofit.
`-keepH`	This option indicates that rhofit should keep hydrogen atoms in the fit. To use this you must specify an input restraint dictionary with hydrogen atoms in it. By default rhofit will strip hydrogen atoms. Using hydrogen atoms will slow down rhofit and may worsen sampling of things like saturated rings.
`-nochirals`	rhofit will ignore all CHIRAL cards in the input dictionary. This option allows chiral centres to invert at will during the run. Note that atoms will remain tetrahedral because of the bond angle restraints around the centre. This is a quick-and-dirty option; if you have time, use `-scanchirals` or `-scanchiralsboth` instead.
`-scanchirals`	rhofit will try fitting with every possible permutation of chiralities, and work out which produces the best fit. Note that this will try large numbers of permutations if the ligand has many chiral centres: 256 permutations for a ligand containing two ribose rings. For these cases `-nochirals`, `-scanchiralsboth` or `-scanchiral <c>` should be tried first.
`-scanchiralsboth`	This is similar to `-scanchirals`, but only chiralities marked as "both" in the dictionary will be sampled.
`-scanchiral <c>`	This is similar to `-scanchiralsboth`, but takes an explicit comma-separated list of chiral centre atom names.
`-lp ligand.pdb`	This option can be used to seed the rhofit process with a particular starting position.
`-trustplanes`	Turn off rhofit's default plane handling. By default, planes in the input dictionary are broken into four-atom planes, and a plane that the density strongly disagrees with is effectively turned off; see the plane-handling section above.
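The growth in the number of chirality permutations, and the narrower sets sampled by `-scanchiralsboth` and `-scanchiral <c>`, can be illustrated with the Python sketch below. It is a hypothetical model of what gets enumerated, not rhofit's code; the function names are invented, and the sign strings are assumed to follow the CCP4-style `volume_sign` values "positiv", "negativ" and "both".

```python
from itertools import product

def chirality_combinations(centres, scan=None):
    """List chirality assignments to try (hypothetical sketch).

    centres: dict of chiral-centre atom name -> volume sign from the dictionary.
    scan:    names of centres to sample; None samples every centre
             (the -scanchirals case).
    """
    names = sorted(centres)
    to_scan = names if scan is None else [n for n in names if n in scan]
    fixed = {n: centres[n] for n in names if n not in to_scan}
    combos = []
    # Each sampled centre can take either hand, so there are 2**len(to_scan) combinations.
    for signs in product(("positiv", "negativ"), repeat=len(to_scan)):
        combo = dict(fixed)
        combo.update(zip(to_scan, signs))
        combos.append(combo)
    return combos

def centres_marked_both(centres):
    """The centres that -scanchiralsboth would sample."""
    return [n for n, sign in centres.items() if sign == "both"]

# Two ribose rings: four chiral centres each, eight in total.
riboses = {f"C{i}_{ring}": "positiv" for ring in ("A", "B") for i in range(1, 5)}
print(len(chirality_combinations(riboses)))  # 256
# An explicit list, as given to -scanchiral, split on commas.
print(len(chirality_combinations(riboses, scan="C1_A,C2_A".split(","))))  # 4
```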

Describing the difference density and protein

`-m refine.mtz`	MTZ file to use for difference density. This can be produced by either BUSTER or REFMAC. (the default is `refine.mtz` in the current directory)
`-p refine.pdb`	PDB file to use for determining protein/ligand clashes (the default is `refine.pdb` in the current directory)
`-clusterweight x`	Set the weight used for the difference density score in rhofit to x (default 0.25). This does not normally need adjustment.
`-clashweight x`	Set the weight used for the ligand-protein clash score in rhofit to x (default 0.5). To turn off ligand-protein contacts use `-clashweight 0.0`. This parameter seldom needs adjustment.
`-sctrimb x`	This adjusts the B range threshold used to assess whether a protein side chain is poorly placed in density and so should be disregarded when assessing ligand-protein clashes. By default a value of 30 Å² is used. To turn the feature off, set it to a large value such as `-sctrimb 1000.0`.

Telling rhofit where to fit

rhofit uses an intermediate file called a "cluster" file to define the region to fit. This is a pseudo-PDB file and can be displayed using the visualise-rhofit-coot visualiser. By default rhofit will itself work out potential ligand binding sites and their cluster files. It will then quickly assess in which single cluster the ligand fits best, and search thoroughly for exact ligand fits in that site. This behaviour can be controlled by the following options:
`-xclusters n`	Use the best n clusters found by rhofit, i.e. those in which the ligand seems to fit best. So if you have a trimer in which each subunit has a ligand binding site, specify `-xclusters 3`.
`-allclusters`	Find ligand binding fits for every cluster found by rhofit.
`-c cluster.pdb`	Filename of a single cluster in which to fit the ligand. This option stops rhofit from finding clusters itself. Cluster files are produced by BUSTER ("refine"), rhofit itself and the prep_rhofit utility (get help for this with prep_rhofit -h). Only use this option if rhofit itself does not correctly identify the binding site you are interested in.
`-C cluster.lis`	Filename of a file containing a list of clusters in which to try to fit the ligand.
`-prf.c cut1,cut2 -prf.V vol`	You can use these options to directly control the prep_rhofit run made by rhofit to find clusters from the density. Note that you must specify both `-prf.c` and `-prf.V`; they will not work separately. `-prf.c cut1,cut2` is used to specify the values for prep_rhofit -c: this is the "significance cut-off values [rms] for Fo-Fc map (`cut1`) and 2Fo-Fc map (`cut2`)". Default values are `cut1=3.0` and `cut2=1.0`. Note that a comma must be used to separate the two values for `-prf.c`. `-prf.V vol` is used to control the volume used in prep_rhofit -v. There is no default volume in rhofit; instead rhofit estimates the volume from the number of atoms in the ligand, and adjusts this if no clusters are found.
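The default choice of a single best cluster, and its extension by `-xclusters n` and `-allclusters`, can be sketched in Python as below, together with a check on the comma-separated `-prf.c` value. This is a hypothetical illustration: it assumes that clusters are ranked by the best quick-run rhofit total score (lower is better, as in results.txt), and the function names and scores are made up.

```python
def select_clusters(quick_scores, xclusters=1, allclusters=False):
    """Choose clusters for thorough fitting from quick-run scores (hypothetical sketch).

    quick_scores: dict of cluster file name -> best quick-run rhofit total score.
    Lower (more negative) scores are better.
    """
    ranked = sorted(quick_scores, key=quick_scores.get)
    if allclusters:            # -allclusters: every cluster found
        return ranked
    return ranked[:xclusters]  # default 1, or the n given with -xclusters n

def parse_prf_c(value):
    """Split a '-prf.c cut1,cut2' value; a comma is required between the two cut-offs."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("-prf.c needs two comma-separated values, e.g. 3.0,1.0")
    return float(parts[0]), float(parts[1])

# Made-up quick-run scores for three clusters.
scores = {"cluster00.pdb": -100.0, "cluster01.pdb": -80.0, "cluster02.pdb": -50.0}
print(select_clusters(scores))               # ['cluster00.pdb']
print(select_clusters(scores, xclusters=2))  # ['cluster00.pdb', 'cluster01.pdb']
print(parse_prf_c("3.0,1.0"))                # (3.0, 1.0)
```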

Controlling the fitting process

`-thorough`	Run more trials than usual in the rhofit fit. This may be useful if you want to find exact fits for large, complex ligands. This is an option to use in pipelines where you have large amounts of compute power available and are not waiting impatiently for the result.
`-quick`	Run a third as many trials as usual; note that this may not find good conformations for complicated ligands. But if you want a quick result and are happy to tweak the result in coot, use this option (and also consider `-nocorrel` and `-nochirals`).
`-nocorrel`	By default, after fitting, each solution is assessed for its correlation coefficient against Fo-Fc. The `-nocorrel` option saves time by not doing this.
`-correlsort`	The first release of rhofit ordered hits by correlation coefficient against Fo-Fc rather than by the rhofit total score, which takes into account ligand strain and ligand-protein contacts. The `-correlsort` option reorders hits on the basis of CC alone. This is usually not a good idea, as ligand strain is then ignored.
`-randskip int`	The rhofit trial process uses random numbers to produce randomized conformations/fits. By default the same random number series is used on a rerun of a job, so that the effect of adjusting conditions such as weights is not obscured by different random choices. If a different set of random numbers is wanted, specify `-randskip integer_number`, where integer_number is an integer value, e.g. 57 or 2341. This should produce slightly different results.
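The reproducibility behaviour described for `-randskip` can be illustrated with a seeded generator in Python. This is only a sketch under stated assumptions: the fixed seed `BASE_SEED` is invented, and reading `-randskip N` as "discard the first N random numbers" is a guess based on the option's name, not documented behaviour.

```python
import random

BASE_SEED = 2009  # hypothetical fixed seed; rhofit's real seed is not documented

def trial_rng(randskip=0):
    """Reproducible random stream, advanced past `randskip` draws (assumed meaning)."""
    rng = random.Random(BASE_SEED)
    for _ in range(randskip):
        rng.random()
    return rng

rerun_a = trial_rng()
rerun_b = trial_rng()
same = [rerun_a.random() for _ in range(3)] == [rerun_b.random() for _ in range(3)]
print(same)  # True: a plain rerun repeats the same random series
shifted = trial_rng(57)  # a different, but still reproducible, series
```

Under this reading, a rerun with the same `-randskip` value is itself reproducible, which keeps weight comparisons meaningful even when a different series is requested.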

Telling rhofit where to put results

`-d directory`	Directory to put all final results of the run (defaults to ./rhofit).
`-td directory`	Directory for temporary data during the rhofit run. By default a directory will be created (in $TMPDIR if that variable is set, or /tmp otherwise) and removed at the end of a run. `-td` is only likely to be useful in diagnosing problems.

Output options

`-resnum N`	Specifies the residue number to be used for the output hit and pdb files. A default value of 4000 is used. `N` must be an integer value in the range -999 to 9999. Note that if two sites are found close to a single chain then the residue number is incremented by one for the second site.
`-hitmaxn int`	The maximum number of hits to be output by rhofit. If you are only interested in the top hit set the value to one. The default value is 10. Must be set to a positive integer value.
`-hitmaxd d`	Sets the distance `d` used to assess whether two hits are the same. The maximum displacement of any atom between the two positions is found. If this is less than `d`, the positions are regarded as identical and only the one with the best rhofit total score is output (this is a greedy clustering). By default `d` is set to 1.5 Å. If you think that rhofit outputs hits that are too similar, try increasing the value to 3.0.
`-hitfncut f`	Sets the function value cut-off beyond which rhofit assumes you will not be interested in seeing hits. If a hit has a rhofit total score that is more than `f` units worse than the best hit, it is not output. A default value of 30.0 is used. To see hits that rhofit regards as poor, increase the values of `f` and `-hitmaxn`.
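The greedy clustering behind `-hitmaxd`, combined with the `-hitfncut` and `-hitmaxn` limits, can be sketched in Python as follows. This is a hypothetical illustration of the rules described above, not rhofit's code: the function names, the order in which the three limits are applied, and the toy two-atom "ligands" with made-up scores are all assumptions.

```python
import math

def max_displacement(coords_a, coords_b):
    """Largest distance moved by any atom between two positions (same atom order)."""
    return max(math.dist(p, q) for p, q in zip(coords_a, coords_b))

def select_hits(hits, hitmaxd=1.5, hitfncut=30.0, hitmaxn=10):
    """Greedy choice of output hits (hypothetical sketch of -hitmaxd/-hitfncut/-hitmaxn).

    hits: list of (total_score, coords); lower total_score is better.
    """
    ranked = sorted(hits, key=lambda hit: hit[0])
    kept = []
    for score, coords in ranked:
        if score - ranked[0][0] > hitfncut:
            break  # this hit and every later one is too far behind the best
        if any(max_displacement(coords, other) < hitmaxd for _, other in kept):
            continue  # same position as a better hit already kept
        kept.append((score, coords))
        if len(kept) == hitmaxn:
            break
    return kept

# Two-atom toy "ligands" along x; the scores are made up for illustration.
hits = [
    (-100.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (-97.0, [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]),  # 0.5 Å from the best: dropped
    (-87.0, [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]),  # 3 Å away: kept
    (-57.0, [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]),  # 43 units worse: cut
]
print([score for score, _ in select_hits(hits)])  # [-100.0, -87.0]
```

With `-correlsort`, the initial ordering would instead be by correlation coefficient; the sketch keeps the default ordering by total score.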

Interpreting rhofit output

Using visualise-rhofit-coot

At the end of the run, rhofit writes its results to an output directory whose name you can specify with the -d option. The ligand fit solutions are best examined using the visualise-rhofit-coot utility, which lets you use the coot program to quickly click through potential fits of the ligand to difference density. To start the utility, first cd into the results directory and type the command visualise-rhofit-coot (you must have coot available and the Global Phasing setup.csh or setup.sh sourced). The command will start coot and present you with the best solution found by rhofit and a dialogue box.

To examine the different hits for this cluster, use the Previous position and Next Position buttons. The Previous search region and Next search region buttons allow switching between different clusters or chiral combinations. The check boxes Cluster visible and Protein visible allow you to quickly turn the cluster and protein display on or off.

What is in results.txt

The file results.txt provides a summary of the rhofit run. A typical file (example 6 from the rhofit wiki 2qtu tutorial) is:

Summary of results (also written to rhofit_tutorial4/results.txt)
rhofit version 1.2.0
Run in directory /mnt/scratch_fs1/twomack/rhofit_2qtu_tutorial with command-line

  -l supplied_ligand_wrongchiral_libcheck_XXX.cif -xclusters 2 -p supplied_01_buster_d/refine.pdb -m supplied_01_buster_d/refine.mtz -d rhofit_tutorial4 -scanchirals

Volume of clusters used for fitting:   148.6  138.6

                              rhofit           ligand LigProt  Poorly  
                               total   Correl  strain contact fitting   LigProt contact to residues
   File               Chain    score   coeff    score   score   atoms   (% means zero weighted in score)
   =================================================================================================
   Hit_00_00_000.pdb   A    -1693.0   0.8724    -5.3     3.7    0/26     A|476:LEU A|487:VAL

   Hit_00_01_000.pdb   A    -1646.2   0.8512    12.8     5.1    0/26     A|356:PHE A|376:ILE A|476:LEU A|487:VAL

   Hit_00_02_000.pdb   A    -1604.6   0.8465    44.0     9.7    0/26     A|376:ILE A|476:LEU A|487:VAL
   Hit_00_02_001.pdb   A    -1603.2   0.8330    20.4    13.1    0/26     A|299:THR A|302:ALA A|376:ILE A|377:PHE A|475:HIS A|476:LEU A|487:VAL
   Hit_00_02_002.pdb   A    -1595.2   0.8088    13.5    17.7    0/26     A|299:THR A|302:ALA A|475:HIS A|476:LEU A|487:VAL

      ...

   Hit_01_00_000.pdb   B    -1469.6   0.8533     8.8     5.1    0/26     B|299:THR B|302:ALA B|475:HIS B|476:LEU B|487:VAL W|26:HOH% W|6:HOH%
   Hit_01_00_001.pdb   B    -1446.3   0.8501    11.4    16.4    0/26     B|298:LEU B|299:THR B|302:ALA B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%

   Hit_01_01_000.pdb   B    -1427.8   0.8314    27.9     4.8    0/26     B|302:ALA B|305:GLU B|476:LEU B|487:VAL W|26:HOH% W|6:HOH%
   Hit_01_01_001.pdb   B    -1420.5   0.8362    31.0     9.4    0/26     B|299:THR B|302:ALA B|305:GLU B|475:HIS B|476:LEU B|487:VAL W|26:HOH% W|6:HOH%
   Hit_01_01_002.pdb   B    -1413.3   0.8324    19.1    12.6    1/26     B|298:LEU B|299:THR B|305:GLU B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%
   Hit_01_01_003.pdb   B    -1413.2   0.8412    24.2    22.5    0/26     B|298:LEU B|299:THR B|302:ALA B|305:GLU B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%

   Hit_01_02_000.pdb   B    -1402.5   0.8312    24.1    21.7    0/26     B|298:LEU B|299:THR B|302:ALA B|305:GLU B|376:ILE B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%
   Hit_01_02_001.pdb   B    -1402.5   0.8216    40.2     9.5    0/26     B|302:ALA B|305:GLU B|376:ILE B|476:LEU W|26:HOH% W|6:HOH%
   Hit_01_02_002.pdb   B    -1400.7   0.8358    33.5    17.2    0/26     B|298:LEU B|299:THR B|305:GLU B|376:ILE B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%

      ...
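If you want to post-process many runs, the hit rows of a table like the one above can be read with a short Python sketch. It assumes the whitespace-separated column layout shown in this example (which may differ between rhofit versions) and handles hit rows only, not the header, blank or "..." lines; the class and function names are hypothetical. The sample row is copied from the example.

```python
from dataclasses import dataclass, field

@dataclass
class HitRow:
    file: str
    chain: str
    total_score: float
    correl: float
    strain: float
    contact: float
    poor_atoms: int
    n_atoms: int
    residues: list = field(default_factory=list)       # all contacting residues
    zero_weighted: list = field(default_factory=list)  # those marked with '%'

def parse_hit_row(line):
    """Parse one hit row of results.txt, assuming whitespace-separated columns."""
    cols = line.split()
    poor, total = cols[6].split("/")
    residues = [r.rstrip("%") for r in cols[7:]]
    zero_weighted = [r.rstrip("%") for r in cols[7:] if r.endswith("%")]
    return HitRow(cols[0], cols[1], float(cols[2]), float(cols[3]),
                  float(cols[4]), float(cols[5]), int(poor), int(total),
                  residues, zero_weighted)

row = parse_hit_row(
    "   Hit_01_00_000.pdb   B    -1469.6   0.8533     8.8     5.1    0/26     "
    "B|299:THR B|302:ALA B|475:HIS B|476:LEU B|487:VAL W|26:HOH% W|6:HOH%")
print(row.total_score, row.zero_weighted)
```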

The file first gives details of the command line used for the run and the volumes of the clusters used for fitting. Then there is a table with the following columns:

"File" is the filename containing the ligand conformation.
"Chain" is the chain to which rhofit thinks the ligand belongs.
"rhofit total score" is the score that rhofit uses to assess and rank fits, relative to an unstrained ligand at infinite distance from density. It is a combination of a fit-to-density score, the strain score and the LigProt contact score; lower (more negative) values are better.
"Correl coeff" is the correlation coefficient between the conformation and the density in the .mtz file which you gave rhofit.
"ligand strain score" indicates whether the proposed fit is strained relative to an unbound ligand. As a very rough guide, a twist-boat conformation of cyclohexane (modelled without hydrogen atoms) has a strain score of around 19 units relative to the chair form. For comparison, Wikipedia suggests "The twist-boat conformation is 5.5 kcal/mol (23 kJ/mol) less stable than the chair conformation".
"LigProt contact score" is the contact score between the ligand and the protein. Currently this is purely repulsive, with no attractive hydrogen-bond-type contribution. As the protein remains fixed, a TNT bad-contact term is used, which only imposes a penalty for extremely short contacts. Water molecules and side chains with high B factors are disregarded in the score (and are marked with a "%" in the next column). The contact scores in this example are small and should be easily relieved by refinement.
"Poorly fitting atoms" is the number of atoms that have no Fo-Fc density at all. In this example all the hits shown have no poorly fitting atoms, except Hit_01_01_002.pdb, which has one.
"LigProt contact to residues" lists the residues that make short contacts to the ligand. In this example two water molecules have been left in the ligand site in chain B. Although they do not create a penalty, it is sensible to remove them before further refinement; rhofit does this automatically when producing merged.pdb.

Files that can be found in output directory

`results.txt`	A text file that provides a summary of the run (see above for discussion).
`best.pdb`	The best position found for the ligand in the best cluster.
`merged.pdb`	Combination of the protein with the best hit for the ligand in each cluster found. Water molecules that clash with the ligand are stripped out; others are left alone.
`refine.pdb`	The protein used in the rhofit run.
`refine.mtz`	The MTZ file used for difference density in the rhofit run.
`Hit_00_00_000.pdb`	The best position found for the ligand, in the best cluster, using the best dictionary for that cluster (this file is the same as `best.pdb`).
`Hit_00_00_001.pdb`	The 2nd best position found for the ligand in the best cluster (if more than one solution was found), using the best dictionary.
`Hit_00_01_000.pdb`	The best position found, in the best cluster, using the second-best dictionary.
`Hit_01_00_000.pdb`	The best position found, in the second-best cluster, using the dictionary which worked best for the first cluster.
`best.cif`	The CIF-format restraint dictionary corresponding to `best.pdb`.
`cluster00.pdb, cluster01.pdb ...`	The cluster files used by the fitter; the number in the file name corresponds to the first of the two-digit numbers in the Hit filename.
`dict00.cif, dict01.cif ...`	The dictionaries used by the fitter; the number in the file name corresponds to the second of the two-digit numbers in the Hit filename. Multiple dictionaries are produced only if you use `-scanchirals`; run `diff input_dictionary.cif dict00.cif` to see which chiral centres have been flipped.
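The naming scheme in the list above can be decoded mechanically, for example to find the cluster and dictionary files that belong to a given hit. The Python sketch below assumes exactly two, two and three digits, as in the examples shown; the function names are hypothetical, and it is not known how rhofit names files beyond those widths.

```python
import re

HIT_NAME = re.compile(r"Hit_(\d{2})_(\d{2})_(\d{3})\.pdb")

def decode_hit_name(name):
    """Return (cluster, dictionary, rank) indices from a hit filename."""
    match = HIT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a hit filename: {name!r}")
    cluster, dictionary, rank = (int(g) for g in match.groups())
    return cluster, dictionary, rank

def companion_files(name):
    """Cluster and dictionary files that go with a hit, following the naming above."""
    cluster, dictionary, _ = decode_hit_name(name)
    return f"cluster{cluster:02d}.pdb", f"dict{dictionary:02d}.cif"

print(decode_hit_name("Hit_00_02_001.pdb"))  # (0, 2, 1)
print(companion_files("Hit_01_00_000.pdb"))  # ('cluster01.pdb', 'dict00.cif')
```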

How to cite use of rhofit

Smart OS, Womack TO, Sharff A, Flensburg C, Keller P, Paciorek W, Vonrhein C and Bricogne G (2014) RHOFIT, version 1.2.4 Cambridge, United Kingdom: Global Phasing Ltd.

History of improvements to rhofit

See separate page: History of improvements to rhofit

Help and further sources of information

Please see: rhofit Wiki page

If rhofit crashes or produces poor results, please email buster-develop@globalphasing.com and tell us as much as you can about what you were doing when it crashed; if you can rerun the job with -td temp and send us temp.tar.gz, that would be perfect.

Please let us know if you would like any feature to be added to rhofit, by emailing buster-develop@globalphasing.com.
