BUSTER Documentation : Additional tools

Copyright    © 2003-2020 by Global Phasing Limited
  All rights reserved.
  This software is proprietary to and embodies the confidential
technology of Global Phasing Limited (GPhL). Possession,
use, duplication or dissemination of the software is authorised
only pursuant to a valid written licence from GPhL.
Contact buster-develop@GlobalPhasing.com


checkdeps - check that all required third-party tools work properly

This utility checks the programs in the BUSTER suite in turn, making sure that all the required third-party tools are installed, available and functioning properly. Problems are indicated on lines starting with "ERROR"; if no problems are found, this is shown by "SUCCESS". A summary of the results is printed at the end. The script's exit status is 0 on success and 1 if any problem is found.

currently checkdeps runs:

For help in configuring the software, including advice on how to use checkdeps, see the detailed installation instructions.

checkdeps command line option:

-n   turn off the prompt for user to hit the Enter key before running each check  
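Since checkdeps reports problems both through "ERROR" lines and through its exit status, it can be driven from a script. The following is a minimal sketch, not part of BUSTER: the helper names `summarise` and `run_check` are hypothetical, and because it is not stated on which output stream the "ERROR" lines appear, both streams are captured together.

```python
import subprocess


def summarise(returncode, output):
    """Return (ok, error_lines) from an exit status and the captured output."""
    # Problems are reported on lines starting with "ERROR".
    errors = [ln for ln in output.splitlines() if ln.startswith("ERROR")]
    # Exit status 0 means success, 1 means at least one problem was found.
    return returncode == 0 and not errors, errors


def run_check(command=("checkdeps", "-n")):
    """Run checkdeps without the Enter-key prompt (-n) and summarise it."""
    proc = subprocess.run(
        list(command),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams (assumption, see above)
        text=True,
    )
    return summarise(proc.returncode, proc.stdout)
```

A caller would use `ok, errors = run_check()` and print the collected `errors` if `ok` is false.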

corr - calculate real-space correlation

This tool allows the easy calculation of real-space correlation between a model (PDB file) and a map (usually a 2Fo-Fc map). The normal use is e.g.:

% corr -p refine.pdb -m refine.mtz -F 2FOFCWT -P PH2FOFCWT

which will produce overall and per-residue correlation coefficients on standard output as well as some PLOTMTV-formatted files of main-chain and side-chain correlation plots (e.g. named refine_CC-mc_Chain-A.mtv).

-p <PDB file> PDB file with standard CRYST1 card  
-m <MTZ|MAP file> MTZ or MAP file MTZ file with columns for F, PHI and (optionally) WEIGHT, or MAP file in CCP4 format
-F <F> amplitude  
-P <PHI> phase  
-Fc <Fcalc> (optional) amplitude of model default is to calculate structure factors of model from input PDB file (which will then not contain bulk-solvent correction or anisotropic scaling)
-Pc <PHIcalc> phase of model  
-a <atom name> rename atoms to this name done before the CC calculation
-d <subdir> directory name results are expected in this sub-directory and all files will be created there too
-R <resl> <resh> low- and high-resolution limits MTZ file: default is to use full resolution range from this file
-W <WEIGHT> (optional) weight usual coefficients (2FOFCWT, PH2FOFCWT) are already correct map coefficients, so this doesn't need to be given
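To illustrate what a real-space correlation coefficient measures, here is a minimal sketch of the standard linear (Pearson) correlation between two sets of density values sampled at the same grid points. This documentation does not state the exact formula or the per-residue grid-point selection used by corr, so both the formula and the function name `real_space_cc` are assumptions for illustration only.

```python
import math


def real_space_cc(rho_obs, rho_calc):
    """Linear correlation between two equally long lists of density values."""
    n = len(rho_obs)
    if n != len(rho_calc) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    if var_o == 0.0 or var_c == 0.0:
        raise ValueError("correlation is undefined for a flat map")
    return cov / math.sqrt(var_o * var_c)
```

A value close to 1 means the model density follows the map density closely at the sampled points; values near 0 indicate no linear relationship.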

gelly_refine - interface to GELLY (geometric refinement)

This is a simple interface to the stand-alone version of GELLY, which will do purely geometric refinement (i.e. no X-ray term involved). Therefore, this command can be used to

-f   force overwriting of files default= stop if a file would be overwritten
-p <PDB file> PDB file to be refined  
-o <output file> output PDB file  
-d <subdir> all (temporary) o/p will be written to directory default = current directory
-l <dictionary> additional restraints dictionary files several -l flags can be given; default is to use the standard dictionaries distributed with BUSTER
-s <space-group> space-group name default = pick from CRYST1 card of PDB file
-c <cell parameters> cell parameters a, b, c, alpha, beta, gamma default = pick from CRYST1 card of PDB file
-Seq <TNT sequence file> TNT sequence file default = create on-the-fly from input PDB file
-I <Identifier> automatically generated files will start with the string <Identifier> default = "gelly"
-jiggle_xyz <rms> adds a random perturbation (jiggle) to all input atoms before starting refinement The size of this perturbation is given as a mean rms deviation (default is to not jiggle)

Any command-line options not in the above list will be passed directly to the gelly binary; see GELLY for a list of useful options, and a couple of usage examples.

Additionally, the following parameters are defined (which can be overwritten on the command line, using the parameter=value syntax):

weight_bond 2.0 bond distances  
weight_angle 2.0 bond angles  
weight_improper 0.0 improper angles  
  2.0   if all residues in input PDB file are described by user-supplied dictionary files (via the -l flag)
weight_torsion 0.0 torsion angles  
  2.0   if all residues in input PDB file are described by user-supplied dictionary files (via the -l flag)
weight_pseudo 0.0    
  2.0   if all residues in input PDB file are described by user-supplied dictionary files (via the -l flag)
weight_trigonal 2.0    
weight_plane 5.0 planarity  
weight_contact 5.0 contact distances  
weight_bcorrel 0.0 B-factor correlation of bonded atoms  
weight_chiral 5.0 chirality  
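The parameter=value syntax can be assembled programmatically. The sketch below builds an argument list for gelly_refine from the defaults listed above; the function name `gelly_refine_argv` is hypothetical, and only the -p and -o flags and the weight parameters documented in this section are used.

```python
# Defaults as listed in the table above (the 0.0 values for improper, torsion
# and pseudo become 2.0 when all residues are covered by user dictionaries).
DEFAULT_WEIGHTS = {
    "weight_bond": 2.0,
    "weight_angle": 2.0,
    "weight_improper": 0.0,
    "weight_torsion": 0.0,
    "weight_pseudo": 0.0,
    "weight_trigonal": 2.0,
    "weight_plane": 5.0,
    "weight_contact": 5.0,
    "weight_bcorrel": 0.0,
    "weight_chiral": 5.0,
}


def gelly_refine_argv(pdb_in, pdb_out, overrides=None):
    """Build an argument list with parameter=value overrides appended."""
    overrides = dict(overrides or {})
    unknown = sorted(set(overrides) - set(DEFAULT_WEIGHTS))
    if unknown:
        raise ValueError("unknown weight parameter(s): " + ", ".join(unknown))
    argv = ["gelly_refine", "-p", pdb_in, "-o", pdb_out]
    for name in sorted(overrides):
        argv.append(f"{name}={overrides[name]}")
    return argv
```

For example, `gelly_refine_argv("in.pdb", "out.pdb", {"weight_torsion": 2.0})` yields a list that can be passed to `subprocess.run`.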

graph_autobuster_recipCC - view the reciprocal-space correlation coefficient plot

This utility locates the last reciprocal-space correlation coefficient plot produced by BUSTER during a refinement and launches plotmtv to view it. For help on its use see the BUSTER Output Interpretation page on the BUSTER wiki. For help with the command options use:
-h   Print brief help message  
-man   Print man page for full description  

graph_autobuster_R - produce a graph that shows how Rwork and Rfree change during a refinement

This utility produces a graph that shows how Rwork and Rfree change during a refinement. For help on its use see the BUSTER Output Interpretation page on the BUSTER wiki. For help with the command options use:
-h   Print brief help message  
-man   Print man page for full description  

graph_autobuster_QM - produce a graph that shows how the QM energy for a ligand changes during a refinement

This utility is to be used with the -qm option of BUSTER. For help on its use see the "Direct use of weighted Quantum Chemical Energy for ligands" page on the BUSTER wiki. For help with the command options use:
-h   Print brief help message  
-man   Print man page for full description  

aB_hydrogenate & hydrogenate - add hydrogen atoms to protein and/or ligands (using MolProbity 'reduce')

This is a tool for adding hydrogen atoms to proteins and/or ligands; it requires the 'reduce' program (distributed as part of CCP4 or the MolProbity suite) to be on the PATH or to be defined using the $BDG_TOOL_MOLPROBITY_ROOT environment variable.

It might be easiest to run "aB_hydrogenate" (which is a wrapper around "hydrogenate"), since this will (1) automatically add missing compound dictionaries (from the CCP4 monomer library) to ensure all residues are hydrogenated and (2) perform various fixes on the initial result (there are some limitations in 'reduce' when it comes to alternate conformations and/or terminal residues).
Parameters for hydrogenate (Option, Explanation, Remark):
-checkdeps Check that all the dependencies are present Special option that checks that the external tools required (reduce) have been setup properly. This option is one of the tests run by the checkdeps script.
-p <input filename> Protein to hydrogenate  
-o <output filename> Name for the output file  
-l <dictionary1.cif> <dictionary2.cif> ... List of CIF-format dictionaries for the ligands hydrogenate writes out a list of the residue IDs it was unable to hydrogenate; you will want to provide dictionaries for most of them (though obviously not metals); grade_PDB_ligand will be helpful for this.
-ligonly Only hydrogenate the ligands  
-full Add hydrogens with full occupancy (i.e. the same as the carrying atom) instead of zero occupancy  
-f Overwrite the output if it already exists  
-ecloud place hydrogens at electron-cloud (instead of nuclear) position  
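The supported way to verify the setup is the -checkdeps option described above. Purely as an illustration of the stated requirement (either 'reduce' on the PATH or $BDG_TOOL_MOLPROBITY_ROOT defined), a script could perform a quick pre-check along these lines. The function name `reduce_is_configured` is hypothetical, and the sketch only tests that the variable is set, not what the directory contains.

```python
import os
import shutil


def reduce_is_configured(environ=None):
    """True if 'reduce' is on PATH or BDG_TOOL_MOLPROBITY_ROOT is set."""
    environ = os.environ if environ is None else environ
    if environ.get("BDG_TOOL_MOLPROBITY_ROOT"):
        return True
    # shutil.which searches the given PATH string for an executable file.
    return shutil.which("reduce", path=environ.get("PATH", "")) is not None
```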

mk_coot_macros.sh - generate macros to use with Coot

This is a simple script to be run in the BUSTER output directory (i.e. where the refine.pdb file is). It will create a file Coot.scr that can be used in Coot:

% mk_coot_macros.sh
% coot --script Coot.scr
See also visualise-geometry-coot - launch coot to see BUSTER refinement result

mk_pymol_macros.sh - generate macros to use with Pymol

For Pymol, this script will generate a file pymol.pml to be used like this:

% mk_pymol_macros.sh
% pymol pymol.pml

pdb2seq - generate TNT sequence from PDB

If a TNT sequence file is needed (e.g. when running gelly_refine), this command will generate it for you.

Please note that you can't use standard output (captured in a file) directly as a TNT sequence file. If you want to create a file please use the -o command line argument.

-p <PDB file> PDB file following the recommendations  
-o <output file> (optional) output file for TNT sequence default is standard output

By default chain breaks in the input PDB file will be converted into BREAK statements in the resulting sequence file. If the parameter UseGapAsBreakInSeq is set to yes (on the command line: UseGapAsBreakInSeq=yes), then a so-called GAP-residue is used instead. The effect is that a range-definition (e.g. for defining a rigid-body) can 'step over' a GAP-residue but not over a BREAK.

pdbchk - check (and optionally fix) PDB files

This tool can be used to make sure a PDB file conforms to most of the PDB format standards as well as some slightly more stringent requirements for BUSTER.

-p input file PDB formatted coordinate file  
-o output file (optional) PDB formatted coordinate file the presence of this optional argument triggers functionality within "pdbchk" that will try and fix any encountered problems of the input file

The list of tests performed (in this order) is:

Test (name)   Explanation   Fixing
NoCryst1 checking if we're missing CRYST1 record  - 
Cell checking for cell parameters on CRYST1 record  - 
NoSpacegroup checking if CRYST1 doesn't contain a spacegroup  - 
Spgr checking for spacegroup name on CRYST1 record  - 
EmptyLines checking for empty records  - 
HaveCoordinateRecords checking if we have any coordinate records  - 
RecordsStartingWithSpace checking if we have any records starting with a space  - 
SeveralModels checking if PDB file contains several models  - 
WeirdCellParameters checking if cell parameters on CRYST1 are weird  - 
WeirdCellVolume checking if cell volume (from CRYST1 record) is weird  - 
BarSpacegroup checking if spacegroup symbol has 'bar' (e.g. P -1/P 1-) change spacegroup symbol (e.g. from "P 1-" to "P -1")
R3H3 checking if R3/R32/R3m/R3c is meant to be H3/H32/H3m/H3c change spacegroup symbol (e.g. from "R 3" to "H 3")
UnknownSpacegroup checking if spacegroup name is unknown  - 
CellSpacegroupInconsistency check if cell and spacegroup are consistent  - 
UnknownTntSpacegroup checking if for given spacegroup we have a TNT equivalent  - 
RecordsStandardOrder checking if records are in standard order records will be reordered according to PDB Format (up to CRYST1 record)
RecordFormat checking if some crucial records have correct format  - 
SsbondIsCys checking if SSBOND records contain only CYS residues  - 
ResidueNumbersOnRecordsAreInteger check if residue numbers on records are Integer re-write residue numbers as integers on records SEQADV, MODRES, HET, SSBOND, CISPEP, LINK, SLTBRG, HYDBND, SITE, ATOM and HETATM
ResidueNumberInsertionCodeFive checking if residue number > 999 and insertion code present (TNT limitation)  - 
EmptyAtomNameOnLinkRecord check if LINK records contain empty atom names (in both positions) remove those LINK records
WrongReferenceToCoordinateRecord checking for wrong references to coordinate records  - 
NoChainId checking for ATOM/HETATM records without chain identifier add new chain ID to records without one (this includes the following records: DBREF, SEQADV, SEQRES, MODRES, HET, SSBOND, LINK, HYDBND, SLTBRG, CISPEP, SITE, ATOM, SIGATM, ANISOU, SIGUIJ, TER and HETATM)
OxyResidueName checking if there are residues called "OXY" (special treatment in TNT) residues will be renamed from "OXY" to " O2" (if the "OXY" residue contains atoms " O1 " and " O2 ")
DuplicateChainRes checking for ATOM/HETATM records where the same chainID+resSeq+iCode is used for different resName if possible, adding chain ID "W" to water residues (residue name "HOH")
StandardResiduesHetatm checking if standard residues have (wrong) HETATM record change record from HETATM to ATOM
NonStandardResiduesAtom checking if non-standard residues have (wrong) ATOM record change record from ATOM to HETATM
BfactorNegative checking if ATOM/HETATM records have negative B-factors set B-factor to zero
OccRange checking if ATOM/HETATM records have occupancy in range 0.0 ... 1.0 limit occupancy to range zero to one
AlternateConformationsOccSum checking if alternate conformations of ATOM/HETATM records have an occupancy sum in range 0.0 ... 1.0  - 
AtomNamesWithSpaces checking if atom names have space in them replaces spaces by underscore "_"
ElementType checking if element type is present and consistent with atom name guesstimate element from atom name
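To make two of the fixes in the list above concrete (BfactorNegative and OccRange), here is a minimal sketch that applies them to a single coordinate record. It is not the pdbchk implementation: the function name `fix_occ_and_bfactor` is hypothetical, and it relies only on the standard PDB fixed-column layout (occupancy in columns 55-60, B-factor in columns 61-66).

```python
def fix_occ_and_bfactor(line):
    """Clamp occupancy to 0.0 ... 1.0 and set negative B-factors to zero."""
    # Only ATOM/HETATM records that are long enough carry these fields.
    if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 66:
        return line
    try:
        occ = float(line[54:60])    # columns 55-60
        bfac = float(line[60:66])   # columns 61-66
    except ValueError:
        return line                 # leave malformed records untouched
    occ = min(max(occ, 0.0), 1.0)
    bfac = max(bfac, 0.0)
    # Re-write both fields in their fixed-width slots, keep everything else.
    return f"{line[:54]}{occ:6.2f}{bfac:6.2f}{line[66:]}"
```

Records other than ATOM/HETATM are returned unchanged, so the function can be mapped over every line of a file.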

seq2seq - generate TNT sequence from ASCII file

This tool converts simple ASCII files with sequence information (FASTA, PIR etc). It recognises all 20 standard amino acids (so Se-MET containing proteins need editing of the resulting TNT sequence file).

-s ASCII sequence file file with (upper-case) protein sequence  
-i ResNumStart starting residue number default = 1
-c ChainId 1-character chain identifier default = " "

pdb2dpi - calculate various versions of the "diffraction-component precision index"

Using the information recorded in the REMARK section of a PDB file, this tool will calculate various versions (based on R or Rfree) of the diffraction-component precision index as defined by Cruickshank and Blow.
-p PDB file  
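As background, the sketch below evaluates the two commonly quoted Cruickshank forms of the index, one based on R and one based on Rfree. Which variants pdb2dpi actually reports, and how it extracts the inputs from the REMARK records, is not stated here; the function names and argument names are hypothetical, and completeness is taken as a fraction between 0 and 1.

```python
import math


def dpi_r(n_atoms, n_obs, n_params, completeness, r_factor, d_min):
    """sigma(x) = sqrt(N_atoms / (N_obs - N_params)) * C^(-1/3) * R * d_min."""
    p = n_obs - n_params
    if p <= 0:
        raise ValueError("R-based DPI needs more observations than parameters")
    return math.sqrt(n_atoms / p) * completeness ** (-1.0 / 3.0) * r_factor * d_min


def dpi_rfree(n_atoms, n_obs, completeness, r_free, d_min):
    """sigma(x) = sqrt(N_atoms / N_obs) * C^(-1/3) * Rfree * d_min."""
    return math.sqrt(n_atoms / n_obs) * completeness ** (-1.0 / 3.0) * r_free * d_min
```

The Rfree-based form stays defined at low resolution, where the number of parameters can exceed the number of observations and the R-based form breaks down.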

pdb2occ - generate template for refining occupancy from PDB file

Simple script to generate some Gelly-syntax statements for occupancy refinement from a given PDB file. It analyses residues with alternate conformation indicators (column 17) as well as residues with occupancies lower than one. Some assumptions about a sensible PDB format are made.

Consecutive residues with alternate conformations and same occupancy will be grouped together. If only two alternate conformations are given for a residue, then their summed occupancy will be restrained to 1.0.

If specific residues are given with the -r flag, only those will be considered irrespective of their occupancy value.

For further details on how to use pdb2occ and how to perform occupancy refinement see the occupancy refinement tutorials on the BUSTER wiki.

-p PDB file input PDB file
-o output file optional output file containing Gelly-syntax
-r res optional residue specification given as residue name (e.g. "ADP") or residue number including chain identifier (e.g. "A|207"); multiple -r flags can be given
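The selection step described above (residues with an alternate conformation indicator in column 17, or with an occupancy below one) can be sketched as follows. This is an illustration only, with hypothetical function names; it does not generate Gelly-syntax statements and does not reproduce the grouping of consecutive residues.

```python
from collections import defaultdict


def occupancy_candidates(pdb_lines):
    """Map (chain, residue number, residue name) to its sorted altLoc labels."""
    found = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 60:
            continue
        altloc = line[16]              # column 17
        occ = float(line[54:60])       # columns 55-60
        if altloc != " " or occ < 1.0:
            key = (line[21], line[22:27].strip(), line[17:20].strip())
            found[key].add(altloc)
    return {key: sorted(alts) for key, alts in found.items()}


def two_conformer_residues(candidates):
    """Residues with exactly two alternate conformations (occupancy sum 1.0)."""
    return [key for key, alts in candidates.items()
            if len([a for a in alts if a != " "]) == 2]
```

Residues returned by `two_conformer_residues` correspond to the case where the summed occupancy of the two conformations would be restrained to 1.0.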

pdb2tls - extract TLS information from PDB file

-p PDB file  
-o output TLS file optional  
-a autotype use automatic definition for BUSTER. The automatic definition type can be one of "EachMacroMolChain" or "OnePerChain". Default is "EachMacroMolChain".  
See TLS refinement section for further information.

refmacdict2tnt - convert CCP4/REFMAC dictionary to TNT format

This program converts a CCP4/REFMAC-style cif restraint dictionary to TNT format, preserving atom-type information which is used by the Gelly ideal contact term.

The typical usage would be:
% refmacdict2tnt <restraint file> <TNT output file> [<PDB output file>]

Note that BUSTER can usually handle cif restraint dictionaries directly if you pass them using the -l flag; if you find yourself routinely converting them manually, please contact buster-develop@globalphasing.com and we will try to make your work-flow easier.

Note that the flags for refmacdict2tnt must go before the filenames.

-nopdb Don't extract atom-position information from the input .cif file If you don't use this option, you need to specify a filename for the PDB output
-believetorsions Preserve sigma values when translating torsion cards in the input  
-notorsions Ignore all torsion cards in the input  
-oneplane Do not output an extra, dehydrogenated version of any plane containing hydrogens If any atom in a plane is missing then BUSTER will not apply that plane restraint at all - so if your input dictionary has large planes containing hydrogens, and you are refining a model lacking hydrogens, you must use -oneplane
-fixplanesigma Tweak sigma values for planes so that the TNT and CCP4/REFMAC geometry functions give identical values  
-tlc XXX Set three-letter code to use for the single ligand in the CIF file  
-model abc.pdb Convert only ligands which appear in abc.pdb with a HETSYN card containing a synonym of the form +id; use the three-letter code that appears in that HETSYN card. This option (introduced in early 2012) is intended to make it easier to work with compound libraries without having to worry about unique three-letter codes for each ligand
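Two rules from this section are easy to get wrong in scripts: flags must precede the filenames, and a PDB output filename is needed unless -nopdb is given. A minimal sketch of a command builder that enforces both is shown below; the function name `refmacdict2tnt_argv` is hypothetical and only flags listed above should be passed in.

```python
def refmacdict2tnt_argv(restraint_cif, tnt_out, pdb_out=None, flags=()):
    """Build the argument list with all flags placed before the filenames."""
    flags = list(flags)
    if "-nopdb" not in flags and pdb_out is None:
        raise ValueError("without -nopdb a PDB output filename is required")
    argv = ["refmacdict2tnt"] + flags + [restraint_cif, tnt_out]
    if pdb_out is not None:
        argv.append(pdb_out)
    return argv
```

Flags that take a value, such as -tlc, are passed as two consecutive list items, e.g. `flags=["-tlc", "XXX", "-nopdb"]`.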

visualise-geometry-coot - launch coot to see BUSTER refinement result

This is a useful way of quickly launching Coot to view the results of a BUSTER refinement. It should launch Coot (which must be on your path) and load the final refine.pdb structure together with maps from the MTZ file. In addition, a listing of the worst geometry violations is displayed. Click on an entry to jump to the atoms in question.

For help on its use see BUSTER Output Interpretation page on the BUSTER wiki.

The procedure is run with:
%  visualise-geometry-coot <BUSTER refinement directory>

diff_fourier - calculate (and analyse) various types of difference Fourier maps


This section describes a tool to calculate different types of difference Fourier maps. It does not deal with the normal difference ("Fo-Fc") or "2Fo-Fc" map that is used in model refinement and building, but rather with maps that use differences between measured amplitudes.

Running the tool

%  diff_fourier -h
should bring up a help message.

Upon successful running, the script will create several output files - the prefix of which can be set with the -o flag. Other potentially useful flags (for full details see output of -h):

Anomalous difference Fourier map

%  diff_fourier -m truncate.mtz -p refine.mtz -P PH2FOFCWT FOM -o AnoFourier

If a PDB file (consistent with the phases) is also given with
%  diff_fourier -m truncate.mtz -p refine.mtz -P PH2FOFCWT FOM -o AnoFourier -pdb refine.pdb

An example output looks like this:

 mtz ......................................... truncate.mtz
 F ........................................... F
 SIGF ........................................ SIGF
 DANO ........................................ DANO
 SANO ........................................ SIGDANO

 pmtz ........................................ refine.mtz
 PHI ......................................... PH2FOFCWT
 FOM ......................................... FOM

 pdb ......................................... refine.pdb

   7 peaks above 20 sigma
   9 peaks above 15 sigma
  11 peaks above 10 sigma
  11 peaks above  8 sigma
  12 peaks above  6 sigma
  12 peaks above  5 sigma
  37 peaks above  4 sigma

-rw-r--r-- 1 vonrhein vonrhein  2132 Oct 10 15:29 AnoFourier.ANO.compare
-rw-r--r-- 1 vonrhein vonrhein 13940 Oct 10 15:29 AnoFourier.ANO.hatom
-rw-r--r-- 1 vonrhein vonrhein 24715 Oct 10 15:29 AnoFourier.ANO.pdb


Peak         Closest atom in refine.pdb
[rms]                                             Distance (<= 1.0 )
 31.23  <=>  SE   MSE F   7  (  0.84 40.87)  :       0.07
 30.91  <=>  SE   MSE A   7  (  0.84 45.76)  :       0.04
 30.22  <=>  SE   MSE A 126  (  0.66 45.08)  :       0.08
 29.08  <=>  SE   MSE F 126  (  0.66 40.55)  :       0.13
 23.72  <=>  SE   MSE F 137  (  0.73 42.17)  :       0.06
 22.10  <=>  SE   MSE A 137  (  0.73 45.81)  :       0.13
 21.10  <=>  SE   MSE F 293  (  0.88 70.46)  :       0.27
 18.64  <=>  SE   MSE F 139  (  0.58 47.16)  :       0.32
 16.20  <=>  SE   MSE A 293  (  0.88 93.55)  :       0.43
 14.81  <=>  SE   MSE A 139  (  0.58 53.66)  :       0.26
 11.24  <=>  SE   MSE F   1  (  0.56 72.94)  :       0.19
  7.26  <=>  SE   MSE A   1  (  0.56 92.71)  :       0.49
  4.10  <=>   O   THR A 161  (  1.00 43.14)  :       0.92
  3.81  <=>   CB  THR F 261  (  1.00 65.06)  :       0.58


ATOM Se -0.0623 -0.0435  0.3244    31.23
ATOM Se  0.0630 -0.0264 -0.2195    30.91
ATOM Se -0.0761  0.0141 -0.0840    30.22
ATOM Se  0.0776  0.0031  0.1880    29.08
ATOM Se -0.0028 -0.1375  0.2705    23.72
ATOM Se  0.0042 -0.1241 -0.1671    22.10
ATOM Se -0.0712  0.2201  0.1354    21.10
ATOM Se -0.0261 -0.1020  0.3066    18.64
ATOM Se  0.0787  0.2230 -0.0277    16.20
ATOM Se  0.0204 -0.0827 -0.2023    14.81
ATOM Se -0.3329 -0.1845  0.3639    11.24
ATOM Se  0.3373 -0.1699 -0.2602     7.26
ATOM Se  0.0752 -0.1320  0.2399     4.56


CRYST1   62.827   90.075  191.529  90.00  90.00  90.00 P 21 21 21 
SCALE1      0.015917  0.000000  0.000000        0.00000
SCALE2      0.000000  0.011102  0.000000        0.00000
SCALE3      0.000000  0.000000  0.005221        0.00000
ATOM    182  C   DUM     1      -3.916  -3.917  62.136  1.00 31.23   11
ATOM    136  C   DUM     2       3.955  -2.381 -42.043  1.00 30.91   11
ATOM    313  C   DUM     3      -4.783   1.274 -16.088  1.00 30.22   11
ATOM     24  C   DUM     4       4.875   0.282  36.013  1.00 29.08   11
ATOM    172  C   DUM     5      -0.178 -12.385  51.807  1.00 23.72   11
ATOM    170  C   DUM     6       0.264 -11.178 -32.014  1.00 22.10   11
ATOM    319  C   DUM     7      -4.476  19.827  25.928  1.00 21.10   11
ATOM    173  C   DUM     8      -1.639  -9.191  58.728  1.00 18.64   11
ATOM     33  C   DUM     9       4.943  20.085  -5.303  1.00 16.20   11
ATOM    154  C   DUM    10       1.282  -7.447 -38.744  1.00 14.81   11
ATOM    281  C   DUM    11     -20.916 -16.621  69.699  1.00 11.24   11
ATOM     62  C   DUM    12      21.190 -15.308 -49.836  1.00  7.26   11
ATOM    133  C   DUM    13       4.726 -11.886  45.946  1.00  4.56   11

So we have

Fo-Fo Difference map

If two sets of amplitudes are available, a difference Fourier map can be calculated with something like
%  diff_fourier -m apo.mtz -p apo_refine.mtz -P PH2FOFCWT FOM -m2 inhibitor.mtz -o IsoFourier -pdb apo_refine.pdb -noANO -compare_cut 10.0

 mtz ......................................... apo.mtz
 F ........................................... FP
 SIGF ........................................ SIGFP
 DANO ........................................ 
 SANO ........................................ 

 pmtz ........................................ apo_refine.mtz
 PHI ......................................... PH2FOFCWT
 FOM ......................................... FOM

 pdb ......................................... apo_refine.pdb

 mtz2......................................... inhibitor.mtz
 F2 .......................................... FP
 SIGF2 ....................................... SIGFP

   0 peaks above 20 sigma
   0 peaks above 15 sigma
   0 peaks above 10 sigma
   2 peaks above  8 sigma
   3 peaks above  6 sigma
   5 peaks above  5 sigma
  20 peaks above  4 sigma

-rw-r--r-- 1 vonrhein vonrhein  1846 Oct 10 15:56 IsoFourier.ISO.compare
-rw-r--r-- 1 vonrhein vonrhein  6068 Oct 10 15:56 IsoFourier.ISO.hatom
-rw-r--r-- 1 vonrhein vonrhein 10891 Oct 10 15:56 IsoFourier.ISO.pdb

This will show positive peaks where data in inhibitor.mtz predicts density that is absent in apo.mtz, e.g. for an inhibitor:


 Peak         Closest atom in apo_refine.pdb
 [rms]                                             Distance (<= 10.0 )
   9.37  <=>   O   HOH A 501  (  1.00 27.89)  :       1.97
   8.72  <=>   NZ  LYS A  89  (  1.00 43.51)  :       0.87
   6.85  <=>   O   HOH A 505  (  1.00 44.68)  :       2.09
   5.99  <=>   O   HOH A 505  (  1.00 44.68)  :       1.68
   5.48  <=>   O   HOH A 508  (  1.00 41.07)  :       2.34
   4.85  <=>   CB  LYS A  89  (  1.00 30.25)  :       2.54
   4.47  <=>   CG2 ILE A 186  (  1.00 12.12)  :       1.45

If we already had a model of the inhibitor and used that PDB file instead:
%  diff_fourier -m apo.mtz -p apo_refine.mtz -P PH2FOFCWT FOM -m2 inhibitor.mtz -o IsoFourier -pdb inhibitor.pdb -noANO
we would get IsoFourier.ISO.compare:

Peak         Closest atom in inhibitor.pdb
 [rms]                                             Distance (<= 1.0 )
   9.37  <=>   C10 DT4 A1299  (  1.00 38.54)  :       0.32
   8.72  <=>   S1  DT4 A1299  (  1.00 54.82)  :       0.31
   6.85  <=>   N5  DT4 A1299  (  1.00 43.09)  :       0.54
   5.99  <=>   C15 DT4 A1299  (  1.00 47.69)  :       0.81
   5.48  <=>   N7  DT4 A1299  (  1.00 43.13)  :       0.68
   4.85  <=>   CD  LYS A  89  (  1.00 43.87)  :       0.56
   4.32  <=>   NZ  LYS A  33  (  1.00 41.01)  :       0.68
   4.26  <=>   C   PRO A 171  (  1.00 30.06)  :       0.84
   4.02  <=>   C4  DT4 A1299  (  0.75 45.81)  :       0.85
showing us the peaks being very close to the inhibitor.
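The *.compare listings shown above are easy to post-process. The following sketch parses lines of that layout and selects the peaks closest to a given residue name. It is based only on the example output in this section: the function names are hypothetical, and the two numbers in parentheses are kept as uninterpreted values because their meaning is not stated here (they are presumably the occupancy and B-factor of the atom).

```python
import re

_LINE = re.compile(
    r"^\s*(?P<peak>\d+\.\d+)\s+<=>\s+(?P<atom>\S+)\s+(?P<resname>\S+)\s+"
    r"(?P<chain>\S)\s*(?P<resnum>-?\d+)\s+\(\s*(?P<v1>\d+\.\d+)\s+"
    r"(?P<v2>\d+\.\d+)\)\s+:\s+(?P<dist>\d+\.\d+)\s*$"
)


def parse_compare(lines):
    """Return one dict per peak line; header and other lines are skipped."""
    records = []
    for line in lines:
        m = _LINE.match(line)
        if m is None:
            continue
        records.append({
            "peak": float(m.group("peak")),        # peak height in rms units
            "atom": m.group("atom"),
            "resname": m.group("resname"),
            "chain": m.group("chain"),
            "resnum": int(m.group("resnum")),
            "paren_values": (float(m.group("v1")), float(m.group("v2"))),
            "distance": float(m.group("dist")),    # distance to closest atom
        })
    return records


def peaks_near(records, resname):
    """Select the peaks whose closest atom belongs to the given residue name."""
    return [r for r in records if r["resname"] == resname]
```

Applied to the last listing above, `peaks_near(records, "DT4")` would return the peaks attributed to the inhibitor.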

ana_diffmap_residue - analyse difference map around specific residues

This little tool analyses difference maps around residues in a model. The residues can be either given by the user (as residue name or specified through chain and residue number) or the program will use all non-standard residues within the PDB file.

The output could be useful to get a quick and automatic idea about the amount of difference density features around specific residues (like co-factors, active-site residues or ligands).

A typical usage could be (see also help messages with the "-h" flag):
% ana_diffmap_residue -p refine.pdb -m refine.mtz

fetch_PDB - fetch coordinates and reflection data from local or online PDB archive (and convert reflection data to MTZ format)

This script will fetch the deposited atomic coordinates and reflection data from a local or online PDB archive. The reflection data will be converted into MTZ format (using the CCP4 program cif2mtz, http://www.ccp4.ac.uk/dist/html/cif2mtz.html, after appropriate checks and clean-ups on the deposited mmCIF file).

A large number of additional checks and analysis are carried out - eg to inform the user about inconsistencies between

If a local copy of the PDB archive is available, the environment variable BDG_TOOL_LOCALPDBDIR can be set to the full path of this directory (it expects to then find $BDG_TOOL_LOCALPDBDIR/data/structures/all/).

The typical usage for PDB identifier "1ABC" would be:
% fetch_PDB 1ABC
which will create an output directory (1ABC) and report basic statistics for the deposited structure and the resulting MTZ reflection file.
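Before relying on a local archive, a script can check that BDG_TOOL_LOCALPDBDIR points at a directory with the expected layout. This is a minimal sketch with a hypothetical function name; it only tests for the data/structures/all/ sub-directory mentioned above and says nothing about how fetch_PDB itself falls back to the online archive.

```python
import os


def local_pdb_dir(environ=None):
    """Return the expected local archive directory, or None if unavailable."""
    environ = os.environ if environ is None else environ
    root = environ.get("BDG_TOOL_LOCALPDBDIR")
    if not root:
        return None
    path = os.path.join(root, "data", "structures", "all")
    return path if os.path.isdir(path) else None
```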

ana_ssbond - check and (optionally) fix for missing SSBOND records in PDB file

In case SSBOND records have been "lost" (e.g. stripped automatically by a program), this tool allows analysis of potentially missing SSBOND entries. If an output file is given, they will also be re-instated.

We are using bond distance (between the two Cys-SG atoms), angle (between Cys1-CB, Cys1-SG and Cys2-SG) and torsion angle (Cys1-CB, Cys1-SG, Cys2-SG and Cys2-CB) as criteria. Based on some data harvesting of deposited PDB structures (refined with BUSTER or REFMAC - since Phenix-refined structures have a different torsion angle distribution with distinct "spikes") we have

distance [Å]       1.920     2.186
angle [°]         91.743   117.406
torsion-1 [°]   -146.076   -31.394
torsion-2 [°]     28.665   160.277
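The three criteria can be evaluated directly from the coordinates of the four atoms involved. The sketch below is not the ana_ssbond implementation: it assumes that the two numbers in each row of the table above are lower and upper limits, and that a torsion angle falling in either the "torsion-1" or the "torsion-2" range is acceptable. All function names are hypothetical.

```python
import math

DISTANCE_RANGE = (1.920, 2.186)            # SG-SG distance in Angstrom
ANGLE_RANGE = (91.743, 117.406)            # CB1-SG1-SG2 angle in degrees
TORSION_RANGES = ((-146.076, -31.394),     # "torsion-1"
                  (28.665, 160.277))       # "torsion-2"


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(a, b):
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def angle(a, b, c):
    """Angle a-b-c at the middle atom b, in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def torsion(a, b, c, d):
    """Signed dihedral angle a-b-c-d in degrees (range -180 to 180)."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def looks_like_ssbond(cb1, sg1, sg2, cb2):
    """Apply the distance, angle and torsion criteria to one Cys pair."""
    dist_ok = DISTANCE_RANGE[0] <= distance(sg1, sg2) <= DISTANCE_RANGE[1]
    angle_ok = ANGLE_RANGE[0] <= angle(cb1, sg1, sg2) <= ANGLE_RANGE[1]
    tor = torsion(cb1, sg1, sg2, cb2)
    torsion_ok = any(lo <= tor <= hi for lo, hi in TORSION_RANGES)
    return dist_ok and angle_ok and torsion_ok
```

Each argument is an (x, y, z) tuple in Ångström, taken for example from the CB and SG records of the two cysteine residues.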

aB_covalent_ligand - utility to help setting up restraints for covalent linkages (e.g. between protein side-chain and a ligand)

Three components are required to provide BUSTER with all the information to refine e.g. covalently bound ligands using the correct set of restraints:
  1. The correct LINK record in the PDB file.

  2. A linkage restraints dictionary, that provides all necessary restraints surrounding the particular covalent bond (i.e. not only bond, but also angle, torsion and planar restraints).

  3. A description for MakeLINK so that it knows about the linkage type for the two residues described on the LINK card. (We are aware of the way e.g. REFMAC uses the so-called "LINKR" record for a very similar purpose, but since this is a non-standard PDB extension it will not be used within BUSTER.)
It is the user's responsibility to provide the correct LINK record in the input PDB file. This can be done by manual editing (but be careful about the fixed formatting requirements) or e.g. in Coot via "Extensions" → "Modelling ..." → "Make Link (click 2 atoms) ...": remember that the directionality (i.e. order in which the two atoms are clicked) matters.
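Because LINK records are column-sensitive, manual editing is error-prone. As an illustration of the fixed layout defined by the PDB format (atom names in columns 13-16 and 43-46, residue names in 18-20 and 48-50, chain identifiers in 22 and 52, residue numbers in 23-26 and 53-56, symmetry operators in 60-65 and 67-72), here is a minimal sketch that writes and reads such a record. The function names are hypothetical, alternate-location and insertion-code fields are left blank, and atom names must be supplied as ready-made 4-character fields.

```python
def format_link(name1, res1, chain1, seq1, name2, res2, chain2, seq2,
                sym1="1555", sym2="1555"):
    """Format a LINK record; the order of the two atoms is preserved."""
    return (
        f"LINK  {'':6}{name1:<4} {res1:>3} {chain1:1}{seq1:>4} {'':15}"
        f"{name2:<4} {res2:>3} {chain2:1}{seq2:>4} {'':2}{sym1:>6} {sym2:>6}"
    )


def parse_link(line):
    """Return the two atom descriptions of a LINK record as dicts."""
    if not line.startswith("LINK  "):
        raise ValueError("not a LINK record")
    line = line.ljust(72)

    def partner(start):
        # 'start' is the 0-based index of the 4-character atom name field.
        return {
            "atom": line[start:start + 4].strip(),
            "resname": line[start + 5:start + 8].strip(),
            "chain": line[start + 9],
            "resseq": int(line[start + 10:start + 14]),
        }

    return partner(12), partner(42)
```

For instance, `format_link(" C  ", "CQ3", "A", 4000, " SG ", "CYS", "A", 166)` produces a record linking atom C of residue CQ3 A 4000 to atom SG of CYS A 166, in that order.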

The generation of the remaining files is automated via the aB_covalent_ligand tool:
% aB_covalent_ligand <PDB file>

Example 1:

We look at 4ZZO as an example - starting as if we only had (a) the APO structure, (b) a dataset of the ligand-bound form and (c) the SMILES string for the bound ligand. The required files can be generated via the following commands:

% fetch_PDB 4ZZO | tee 4ZZO.lis
% grep -v "CQ3" 4ZZO/4zzo.pdb > apo.pdb
% ln -s 4ZZO/4zzo-unique.mtz compound.mtz

We now want to refine the "APO" structure against the compound dataset, e.g. using:

% refine -p apo.pdb -m compound.mtz -RB -d buster.01 | tee buster.01.lis

Now we can generate a CIF dictionary for the compound and fit it into the difference density of the above refinement:

% grade 'CCC(=O)Nc1ccccc1Nc1nc(NC2CCOCC2)ncc1Cl' -resname CQ3 | tee grade_CQ3.lis
% rhofit -l grade-CQ3.cif -m buster.01/refine.mtz -p buster.01/refine.pdb -d rhofit | tee rhofit.lis

If there is no access to a local installation of the Cambridge Structural Database (CSD), the dictionary can also be generated on the Grade Web Server.

The results from the Rhofit run can be visualised by:

% cd rhofit
% visualise-rhofit-coot

Here one can

The final step is to create the correct LINK record via "Extensions" → "Modelling ..." → "Make Link (click 2 atoms) ...": remember that the directionality (i.e. order in which the two atoms are clicked) matters. This should result in a line like the following:
LINK         C   CQ3 A4000                 SG  CYS A 166     1555   1555        
Finally, we save the model ("File" → "Save Coordinates ...") as "merged_link.pdb". An alternative is to use:

% cd rhofit
% coot --pdb merged.pdb --auto refine.mtz --dictionary best.cif

and work on the already combined model (top hit and APO part).

Everything up to this point is just preparation to obtain the correct PDB file as described above: ligand fitted in an appropriate pose and correct LINK record. There is no reason why this couldn't be done in any other way: other than the final PDB file, nothing else is required for the actual step of creating a linkage restraints dictionary and the auxiliary files for refinement in BUSTER.

We can now run:

% aB_covalent_ligand merged_link.pdb | tee aB_covalent_ligand.lis

which will report at the end:

  For MakeLINK = CQ3-C_CYS-SG.dat
  For BUSTER   = CQ3-C_CYS-SG.dic

  Run with e.g.

    refine -p merged_link.pdb \
           -m some.mtz \
           -l grade-LIG.cif \
           MakeLINK_LinkagesFile=CQ3-C_CYS-SG.dat \
           -l CQ3-C_CYS-SG.dic \
            RunBusterDuplicatesOverride=CQC \

      - or -

    refine -p merged_link.pdb \
           -m some.mtz \
           -l grade-LIG.cif \
           -M /some/where/rhofit/CQ3-C_CYS-SG.macro \

This means we can now run a BUSTER refinement of the covalently bound ligand using:

% refine -p rhofit/merged_link.pdb -m compound.mtz \
   -l grade-CQ3.cif -M /some/where/rhofit/CQ3-C_CYS-SG.macro \
   -d buster.02 | tee buster.02.lis

(of course adapting the actual command-line according to the model parametrisation - like NCS, TLS, occupancy refinement, additional restraints etc). In this particular example, the R/Rfree values go from 0.1966/0.2524 (initial APO refinement) to 0.1869/0.2411 after final refinement of the protein-ligand complex with the ligand covalently bound.


Last modification: 04.02.2020