rhofit documentation

Copyright © 2009-2017 Global Phasing Limited

All rights reserved.

This software is proprietary to and embodies the
confidential technology of Global Phasing Limited (GPhL).
Possession, use, duplication or dissemination of the software is
authorised only pursuant to a valid written licence from GPhL.

Authors: (2009-2011) O.S. Smart, T. Womack
Authors: (2012-2014) O.S. Smart
Authors: (2015-2017) G. Bricogne, C. Flensburg, A. Sharff, C. Vonrhein
Version 1.2.5 (Sep 16 2016)


What is rhofit?

Rhofit is a tool for fitting ligands into difference density. It differs from FitMAP (our previous fitting tool) in that it is able to change bond lengths and angles within the ligand, rather than only rotate torsion angles; this also allows it to search for correct ring conformations for macrocycles. In addition, rhofit can also search for the correct chirality if this is not known for an input ligand.

Rhofit uses the gelly geometry function to assess ligand strain and protein-ligand contacts, so its results will be compatible with further BUSTER refinement. Rhofit runs a large number of independent trials and reports the best solutions that it finds; you can give it some idea of how long you want it to search for. We aim for rhofit with default parameters to take significantly less time than the refinement required to produce its input.


Get started immediately

Take some structure in which there is a ligand to be found and do

refine -L -p structure.pdb -m data.mtz -d 00 | tee 00.lis

to tell BUSTER to try ligand-chasing.

There are a fair number of command-line arguments available, but the easiest use cases are

By default, rhofit will produce a list of the connected blobs of difference density from the input MTZ file which are large enough to contain the ligands (each blob is described in a cluster file). rhofit will then try quick runs to fit the ligand into each of these blobs, and then try much more intensively to fit the ligand into the single blob into which it seemed to fit best. If you expect there to be more than one copy of the ligand bound to the structure, use the -xclusters option.

After the rhofit command has finished, it will create a directory 'rhofit'. The easiest way to explore this directory is with the tool visualise-rhofit-coot, which lets you look through the list of positions that rhofit has found for your ligand; they are in order of correlation coefficient so generally the correct conformation should be the first one that appears. The section "Interpreting rhofit output" below has more information.

Please see rhofit introductory tutorial on 2qtu Wiki Page


Some details of how the fitting process works

How rhofit versions 1.2 and higher handle planes

Planes, particularly large planes, are extremely strong restraints; they make motion away from the plane almost impossible even when the density extremely strongly requests it, and the worst examples of bad fits that we see are due to bad planes in the input dictionary. rhofit-1.2 attacks this problem from two directions:

At the end of the fitting process, planes which have been turned off are deleted from the best.cif file produced by rhofit — so a refinement of merged.pdb using best.cif as dictionary will not reinstate the plane.

There is a command-line option -trustplanes which turns off the plane-breaking and torsion-plateauing machinery; we would suggest that this is used only in cases where the machinery gives wrong answers, and buster-develop@globalphasing.com would be very interested to hear as much as you are allowed to reveal about cases which fail like this.

Using keywords in the input CIF dictionary

rhofit looks for a line of the form
# BUSTER-KEYWORD keyword1 keyword2 ...
in the input dictionary, and changes its behaviour when it sees certain keywords:
TRUSTCOORD The coordinates in the chem_comp_atom.{x,y,z} fields in the CIF file are taken to be the result of a careful generation from the restraints, and rhofit uses them directly rather than regenerating from the restraints
TRUSTTORS The restraints in the chem_comp_tor section of the dictionary are considered to be of good quality, and are used when doing the fitting

At present grade is the only tool which writes out these keywords.
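
As an illustration of the keyword line described above, the following is a minimal Python sketch of how a script might check which keywords a dictionary carries. It is not rhofit's own parser: the function name buster_keywords, the tolerance of leading whitespace and the small CIF fragment are all assumptions made for this example.

```python
def buster_keywords(dictionary_text):
    """Return the set of keywords found on '# BUSTER-KEYWORD' lines.

    Assumption: the marker and the keywords are separated by whitespace,
    and leading whitespace before the '#' is tolerated.
    """
    found = set()
    for line in dictionary_text.splitlines():
        tokens = line.split()
        if tokens[:2] == ["#", "BUSTER-KEYWORD"]:
            # Everything after the marker is a keyword.
            found.update(tokens[2:])
    return found


# Hypothetical dictionary fragment, for illustration only.
example = """\
# BUSTER-KEYWORD TRUSTCOORD TRUSTTORS
data_comp_LIG
loop_
_chem_comp_atom.comp_id
"""

print(sorted(buster_keywords(example)))
```

A dictionary without the comment line simply yields an empty set, which corresponds to rhofit's default behaviour of regenerating coordinates from the restraints.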


Details of command-line arguments

-h Quick help message listing most important arguments.

Describing the ligand

-l ligand.dic Restraint dictionary for ligand. This can be either in CCP4-style CIF format or TNT format. A good restraint dictionary is important to get good fits. We suggest that the grade tool is used to prepare dictionaries for rhofit.
-keepH This option indicates that rhofit should keep hydrogen atoms in the fit. To use this you must specify an input restraint dictionary with hydrogen atoms in it. By default rhofit will strip hydrogen atoms. Using hydrogen atoms will slow down rhofit and may worsen sampling of things like saturated rings.
-nochirals rhofit will ignore all CHIRAL cards in the input dictionary. This option allows chiral centres to invert at will during the run. Note that atoms will remain tetrahedral because of the bond angle restraints around the centre. This option is a "quick and dirty" one; use -scanchirals or -scanchiralsboth if you have time.
-scanchirals rhofit will try fitting with every possible permutation of chiralities, and work out which produces the best fit. Note that this will try large numbers of permutations if the ligand has many chiral centres: 256 permutations for a ligand containing two ribose rings. For these cases -nochirals, -scanchiralsboth or -scanchiral <c> should be tried first.
-scanchiralsboth This is similar to -scanchirals, but only chiralities marked as "both" in dictionary will be sampled.
-scanchiral <c> This is similar to -scanchiralsboth, but takes an explicit (comma-separated) list of chiral centre atom names.
-lp ligand.pdb This option can be used to seed the rhofit process with a particular starting position.
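-trustplanes By default, if the density disagrees strongly with a plane in the input dictionary, rhofit will turn that plane off. This option turns off the plane-breaking and torsion-plateauing machinery, so that the planes in the dictionary are trusted; see the plane-handling section above.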
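
To see why -scanchirals becomes expensive, the following Python sketch counts the chirality permutations for a set of centres. It assumes that each centre has exactly two possible hands and that each ribose ring contributes four chiral carbons; the atom names and the labels "positive" and "negative" are hypothetical and only serve the illustration.

```python
from itertools import product


def chirality_permutations(centres):
    """Yield every assignment of a hand to each named chiral centre."""
    for signs in product(("positive", "negative"), repeat=len(centres)):
        yield dict(zip(centres, signs))


# Hypothetical atom names: four chiral carbons in each of two ribose rings.
ribose_centres = ["C1A", "C2A", "C3A", "C4A", "C1B", "C2B", "C3B", "C4B"]

# Scanning all eight centres gives 2**8 permutations.
print(sum(1 for _ in chirality_permutations(ribose_centres)))

# Restricting the scan to two named centres gives only 2**2.
print(sum(1 for _ in chirality_permutations(["C1A", "C1B"])))
```

The second call mirrors the idea behind -scanchiral and -scanchiralsboth: naming only the centres that are genuinely uncertain keeps the number of fitting runs small.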

Describing the difference density and protein

-m refine.mtz MTZ file to use for difference density. This can be produced by either BUSTER or REFMAC. (the default is refine.mtz in the current directory)
-p refine.pdb PDB file to use for determining protein/ligand clashes (the default is refine.pdb in the current directory)
-clusterweight x Set weight used for difference density score in rhofit to x (default 0.25). This does not normally need adjustment.
-clashweight x Set weight used for ligand-protein clash score in rhofit to x (default 0.5). To turn off ligand-protein contacts use -clashweight 0.0. This parameter seldom needs adjustment.
-sctrimb x This adjusts the B range threshold used to assess whether a protein side chain is poorly placed in density and so should be disregarded when assessing ligand-protein clashes. By default a value of 30 Angstroms squared is used. To turn the feature off, set it to a large value such as -sctrimb 1000.0.

Telling rhofit where to fit

rhofit uses an intermediate file called a "cluster" file to define the region to fit. This is a pseudo pdb file and can be displayed using the visualise-rhofit-coot visualiser. By default rhofit will itself work out potential ligand binding sites and their cluster files. It will then quickly assess in which single cluster the ligand fits best and thoroughly find out the exact ligand fits in this single site. This behaviour can be controlled by the following options:
-xclusters n Use the n clusters, found by rhofit, in which the ligand seems to fit best. For example, if you have a trimer in which each chain has a ligand binding site, specify -xclusters 3.
-allclusters Find ligand binding fits for every cluster found by rhofit.
-c cluster.pdb Filename of a single cluster in which to fit the ligand. This option stops rhofit from finding clusters itself. cluster files are produced by BUSTER ("refine"), rhofit itself and the prep_rhofit utility (get help for this with prep_rhofit -h). Only use this option if rhofit itself does not correctly identify the binding site you are interested in.
-C cluster.lis Filename of a file containing a list of clusters in which to try to fit the ligand.
-prf.c cut1,cut2 -prf.V vol You can use these options to directly control the prep_rhofit run made by rhofit to find clusters from the density. Note that you must specify both -prf.c and -prf.V; they will not work separately. -prf.c cut1,cut2 is used to specify the values for prep_rhofit -c: this is the "significance cut-off values [rms] for Fo-Fc map (cut1) and 2Fo-Fc map (cut2)". Default values are cut1=3.0 and cut2=1.0. Note that a comma must be used to separate the two values for -prf.c. -prf.V vol is used to control the volume used in prep_rhofit -v. There is no default volume in rhofit. Instead rhofit estimates the volume from the number of atoms in the ligand and then adjusts this if no clusters are found.
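
For pipelines that launch rhofit from a script, the following Python sketch assembles an argument list from the options described so far. It is only an illustration: the helper name rhofit_command and the file names are hypothetical, the executable is assumed to be called rhofit, and only flags documented on this page are used. The sketch builds the command but does not run it.

```python
import shlex


def rhofit_command(ligand, mtz="refine.mtz", pdb="refine.pdb",
                   xclusters=None, prf_cuts=None, prf_volume=None,
                   outdir=None):
    """Build an argument list for rhofit (hypothetical helper)."""
    # -prf.c and -prf.V only work as a pair.
    if (prf_cuts is None) != (prf_volume is None):
        raise ValueError("-prf.c and -prf.V must be given together")
    args = ["rhofit", "-l", ligand, "-m", mtz, "-p", pdb]
    if xclusters is not None:
        args += ["-xclusters", str(xclusters)]
    if prf_cuts is not None:
        cut1, cut2 = prf_cuts
        # The two cut-offs are joined with a comma, not a space.
        args += ["-prf.c", f"{cut1},{cut2}", "-prf.V", str(prf_volume)]
    if outdir is not None:
        args += ["-d", outdir]
    return args


# One ligand site expected per chain of a trimer.
command = rhofit_command("ligand.cif", xclusters=3, outdir="rhofit_trimer")
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run; keeping the arguments as a list avoids shell quoting problems with unusual file names.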

Controlling the fitting process

-thorough Run more trials than usual in the rhofit fit. This may be useful if you want exact fits for large complex ligands. This is an option to use in pipelines where you have large amounts of compute power available and are not waiting impatiently for the result.
-quick Run a third as many trials as usual; note that this may not find good conformations for complicated ligands. But if you want a quick result and are happy to tweak the result in coot, use this option (and also consider -nocorrel and -nochirals).
-nocorrel By default, after fitting, each solution is assessed for its correlation coefficient against Fo-Fc. The -nocorrel option saves time by not doing this.
-correlsort The first release of rhofit reordered hits in order of correlation coefficient against Fo-Fc rather than using the rhofit total score, which takes into account ligand strain and ligand-protein contacts. The -correlsort option will reorder hits on the basis of CC alone. This is usually not a good idea as ligand strain is then ignored.
-randskip int The rhofit trial process uses random numbers to produce randomized conformations/fits. By default the same random number series is used on a rerun of a job, so that the effect of adjusting conditions such as weights is not obscured by random variation. If a different set of random numbers is wanted, specify -randskip integer_number where integer_number is an integer value, e.g. 57 or 2341. This should produce slightly different results.

Telling rhofit where to put results

-d directory Directory to put all final results of the run (defaults to ./rhofit).
-td directory Directory for temporary data during rhofit run. By default a directory will be created (in $TMPDIR if that variable is set, or /tmp otherwise) and removed at the end of a run. -td is only likely to be useful in diagnosing problems.

Output options

-resnum N Specifies the residue number to be used for the output hit and pdb files. A default value of 4000 is used. N must be an integer value in the range -999 to 9999. Note that if two sites are found close to a single chain then the residue number is incremented by one for the second site.
-hitmaxn int The maximum number of hits to be output by rhofit. If you are only interested in the top hit set the value to one. The default value is 10. Must be set to a positive integer value.
-hitmaxd d Sets the distance d used to assess whether two hits are the same. The maximum displacement distance for any atom between the two positions is found. If this is less than d then the positions are regarded as identical and only the one with the best rhofit total score is output (this is a greedy clustering). By default d is set to 1.5 Å. If you think that rhofit outputs hits that are too similar then try increasing the value to 3.0.
-hitfncut f Sets the function value cut-off beyond which rhofit thinks that you will not be interested in seeing hits. If a hit has a rhofit total score that is worse than that of the best hit by more than f units, it is not output. A default value of 30.0 is used. To see hits that rhofit regards as poor, increase the values of f and -hitmaxn.
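
The way -hitmaxn, -hitmaxd and -hitfncut interact can be sketched in a few lines of Python. This is an illustration of the selection rules as described above, not rhofit's actual code. It assumes that a lower (more negative) total score is better, as in the example results.txt, and that atoms in two poses correspond by position in the list; the function names and the toy coordinates are hypothetical.

```python
import math


def max_displacement(a, b):
    """Largest distance between corresponding atoms of two poses."""
    return max(math.dist(p, q) for p, q in zip(a, b))


def select_hits(hits, hitmaxn=10, hitmaxd=1.5, hitfncut=30.0):
    """hits: list of (score, coords) pairs; lower score is assumed better."""
    ranked = sorted(hits, key=lambda h: h[0])
    if not ranked:
        return []
    best_score = ranked[0][0]
    kept = []
    for score, coords in ranked:
        # Stop once hits are too poor or enough have been collected.
        if score - best_score > hitfncut or len(kept) >= hitmaxn:
            break
        # Greedy step: drop a pose lying within hitmaxd of a better one.
        if all(max_displacement(coords, k[1]) >= hitmaxd for k in kept):
            kept.append((score, coords))
    return kept


# Toy two-atom poses, coordinates in Angstroms.
poses = [
    (-100.0, [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]),
    (-99.0, [(0.2, 0.0, 0.0), (1.7, 0.0, 0.0)]),    # near-duplicate of the best
    (-90.0, [(5.0, 0.0, 0.0), (6.5, 0.0, 0.0)]),    # a distinct pose
    (-50.0, [(9.0, 0.0, 0.0), (10.5, 0.0, 0.0)]),   # more than 30 units worse
]
print([score for score, _ in select_hits(poses)])
```

With the default settings the near-duplicate and the poorly scoring pose are dropped; raising hitmaxd to 3.0 merges more poses, while raising hitfncut lets poorer hits through.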

Interpreting rhofit output

Using visualise-rhofit-coot

rhofit produces an output directory, whose name you can specify with the -d option; at the end of the run this will contain the results. The ligand fit solutions are best examined using the visualise-rhofit-coot utility. It allows you to use the coot program to quickly click through potential fits of the ligand to difference density. To start the utility, first cd into the results directory and type the command visualise-rhofit-coot (you must have both coot and the Global Phasing setup.csh or setup.sh environment set up). The command will start coot and present you with the best solution found by rhofit, together with a dialogue box of navigation controls.

To examine the different hits for this cluster, use the Previous position and Next Position buttons. The Previous search region and Next search region buttons allow switching between different clusters or chiral combinations. The check boxes Cluster visible and Protein visible allow you to quickly turn on/off cluster and protein display.

What is in results.txt

The results.txt file provides a summary of the rhofit run. A typical file (example 6 from the rhofit wiki 2qtu tutorial) is:

Summary of results (also written to rhofit_tutorial4/results.txt)
rhofit version 1.2.0
Run in directory /mnt/scratch_fs1/twomack/rhofit_2qtu_tutorial with command-line

  -l supplied_ligand_wrongchiral_libcheck_XXX.cif -xclusters 2 -p supplied_01_buster_d/refine.pdb -m supplied_01_buster_d/refine.mtz -d rhofit_tutorial4 -scanchirals

Volume of clusters used for fitting:   148.6  138.6

                              rhofit           ligand LigProt  Poorly  
                               total   Correl  strain contact fitting   LigProt contact to residues
   File               Chain    score   coeff    score   score   atoms   (% means zero weighted in score)
   =================================================================================================
   Hit_00_00_000.pdb   A    -1693.0   0.8724    -5.3     3.7    0/26     A|476:LEU A|487:VAL

   Hit_00_01_000.pdb   A    -1646.2   0.8512    12.8     5.1    0/26     A|356:PHE A|376:ILE A|476:LEU A|487:VAL

   Hit_00_02_000.pdb   A    -1604.6   0.8465    44.0     9.7    0/26     A|376:ILE A|476:LEU A|487:VAL
   Hit_00_02_001.pdb   A    -1603.2   0.8330    20.4    13.1    0/26     A|299:THR A|302:ALA A|376:ILE A|377:PHE A|475:HIS A|476:LEU A|487:VAL
   Hit_00_02_002.pdb   A    -1595.2   0.8088    13.5    17.7    0/26     A|299:THR A|302:ALA A|475:HIS A|476:LEU A|487:VAL

      ...

   Hit_01_00_000.pdb   B    -1469.6   0.8533     8.8     5.1    0/26     B|299:THR B|302:ALA B|475:HIS B|476:LEU B|487:VAL W|26:HOH% W|6:HOH%
   Hit_01_00_001.pdb   B    -1446.3   0.8501    11.4    16.4    0/26     B|298:LEU B|299:THR B|302:ALA B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%

   Hit_01_01_000.pdb   B    -1427.8   0.8314    27.9     4.8    0/26     B|302:ALA B|305:GLU B|476:LEU B|487:VAL W|26:HOH% W|6:HOH%
   Hit_01_01_001.pdb   B    -1420.5   0.8362    31.0     9.4    0/26     B|299:THR B|302:ALA B|305:GLU B|475:HIS B|476:LEU B|487:VAL W|26:HOH% W|6:HOH%
   Hit_01_01_002.pdb   B    -1413.3   0.8324    19.1    12.6    1/26     B|298:LEU B|299:THR B|305:GLU B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%
   Hit_01_01_003.pdb   B    -1413.2   0.8412    24.2    22.5    0/26     B|298:LEU B|299:THR B|302:ALA B|305:GLU B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%

   Hit_01_02_000.pdb   B    -1402.5   0.8312    24.1    21.7    0/26     B|298:LEU B|299:THR B|302:ALA B|305:GLU B|376:ILE B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%
   Hit_01_02_001.pdb   B    -1402.5   0.8216    40.2     9.5    0/26     B|302:ALA B|305:GLU B|376:ILE B|476:LEU W|26:HOH% W|6:HOH%
   Hit_01_02_002.pdb   B    -1400.7   0.8358    33.5    17.2    0/26     B|298:LEU B|299:THR B|305:GLU B|376:ILE B|475:HIS B|476:LEU W|26:HOH% W|6:HOH%

      ...
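The file first gives details of the command line used for the run and the volumes of the clusters used. This is followed by a table of hits, with columns for the rhofit total score, the correlation coefficient, the ligand strain score, the ligand-protein contact score, the number of poorly fitting atoms and the residues in contact with the ligand.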
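
If you want to post-process the hit table in a script, a minimal Python sketch along the following lines may help. It assumes that the column layout matches the example above (whitespace-separated fields, a single-token chain identifier, and a poorly-fitting count written as n/total); the function name and dictionary keys are choices made for this example and are not part of rhofit.

```python
def parse_results_table(text):
    """Return one dict per 'Hit_' row of a results.txt summary."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("Hit_"):
            continue  # skip headers, rulers, blank lines and '...'
        poor, total = fields[6].split("/")
        rows.append({
            "file": fields[0],
            "chain": fields[1],
            "score": float(fields[2]),
            "correl": float(fields[3]),
            "strain": float(fields[4]),
            "contact": float(fields[5]),
            "poorly_fitting": int(poor),
            "atoms": int(total),
            # A trailing % marks a residue given zero weight in the score.
            "contacts": fields[7:],
        })
    return rows


sample = """\
   Hit_00_00_000.pdb   A    -1693.0   0.8724    -5.3     3.7    0/26     A|476:LEU A|487:VAL
   Hit_01_00_000.pdb   B    -1469.6   0.8533     8.8     5.1    0/26     B|299:THR W|26:HOH%
"""
for row in parse_results_table(sample):
    print(row["file"], row["score"], row["correl"])
```

Sorting the returned rows by "correl" instead of "score" reproduces the kind of ordering that -correlsort asks for, which makes it easy to compare the two rankings.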

Files that can be found in output directory

results.txt A text file that provides a summary of the run (see above for discussion).
best.pdb The best position found for the ligand in the best cluster.
merged.pdb Combination of the protein with the best hit for the ligand in each cluster found. Water molecules that clash with the ligand are stripped out; others are left alone.
refine.pdb The protein used in the rhofit run
refine.mtz The MTZ file used for difference density in the rhofit run
Hit_00_00_000.pdb The best position found for the ligand, in the best cluster, using the best dictionary for that cluster (this file is the same as best.pdb).
Hit_00_00_001.pdb The 2nd best position found for the ligand in the best cluster (if more than one solution was found), using the best dictionary.
Hit_00_01_000.pdb The best position found, in the best cluster, using the second-best dictionary
Hit_01_00_000.pdb The best position found, in the second-best cluster, using the dictionary which worked best for the first cluster
best.cif The CIF-format restraint dictionary corresponding to best.pdb.
cluster00.pdb, cluster01.pdb ... The cluster files used by the fitter; the number on the file corresponds to the first of the two-digit numbers in the Hit name
dict00.cif, dict01.cif ... The dictionaries used by the fitter; the number on the file corresponds to the second of the two-digit numbers in the Hit filename. Multiple dictionaries are produced only if you use -scanchirals; do 'diff input_dictionary.cif dict00.cif' to see which chiral centres have been flipped
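
The naming scheme above can be decoded mechanically. The Python sketch below maps a hit file name to the cluster file, the dictionary file and the rank of the position; it assumes the Hit_CC_DD_NNN.pdb pattern shown in the list, with the rank counted from zero, and the function name is hypothetical.

```python
import re

# Two digits for the cluster, two for the dictionary, three for the rank.
HIT_NAME = re.compile(r"^Hit_(\d{2})_(\d{2})_(\d{3})\.pdb$")


def describe_hit(filename):
    """Map a hit file name to its cluster file, dictionary file and rank."""
    match = HIT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a rhofit hit file name: {filename!r}")
    cluster, dictionary, rank = (int(group) for group in match.groups())
    return {
        "cluster_file": f"cluster{cluster:02d}.pdb",
        "dictionary_file": f"dict{dictionary:02d}.cif",
        "rank": rank,  # 0 is the best position for this cluster/dictionary
    }


print(describe_hit("Hit_01_02_000.pdb"))
```

Note that, as stated above, multiple dictionary files are produced only when -scanchirals is used, so a non-zero dictionary index is only expected in such runs.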

How to cite use of rhofit

Smart OS, Womack TO, Sharff A, Flensburg C, Keller P, Paciorek W, Vonrhein C and Bricogne G (2014) RHOFIT, version 1.2.4. Cambridge, United Kingdom: Global Phasing Ltd.


History of improvements to rhofit

See separate page: History of improvements to rhofit


Help and further sources of information

Please see: rhofit Wiki page

If rhofit crashes or produces poor results, please email buster-develop@globalphasing.com and tell us as much as you can about what you were doing when it crashed; if you can rerun the job with -td temp and send us temp.tar.gz, that would be perfect.

Please let us know if you would like any feature to be added to rhofit, by emailing buster-develop@globalphasing.com.