BUSTER Documentation
File formats

Copyright © 2003-2018 by Global Phasing Limited

All rights reserved.

This software is proprietary to and embodies the confidential
technology of Global Phasing Limited (GPhL). Possession,
use, duplication or dissemination of the software is authorised
only pursuant to a valid written licence from GPhL.

Contact buster-develop@GlobalPhasing.com

PDB
MTZ
Rigid-body description
NCS
TLS description
TNT sequence file
Geometry restraints and standard libraries

PDB

The program pdbchk, which runs as part of each BUSTER job, checks the starting PDB file extensively and can also correct some of the problems and inconsistencies it finds. Points to watch include:

each atom should have a chain identifier (e.g. A, B and C for protein chains and W for water).
a correct CRYST1 card is required (see the PDB format guide); the space group symbol in particular must be correct.
although not enforced by the PDB standard, it seems sensible to use letters (A, B, C etc.) in column 17 of ATOM/HETATM records to denote alternate conformations and numbers (1, 2, 3 etc.) in column 27 of ATOM/HETATM records to denote insertion codes.
standard SSBOND and LINK records are supported and often required (to describe the molecular connectivity correctly).

Internally, BUSTER uses PDB v3 atom/residue nomenclature.
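
The column conventions listed above can be checked before a job is started. The following is a minimal sketch in Python, not part of BUSTER and not a substitute for pdbchk: the function name check_atom_records is hypothetical, and the only rules it applies are the three stated in this section (a chain identifier in column 22, a letter in column 17, a number in column 27).

```python
def check_atom_records(lines):
    """Report ATOM/HETATM records that break the column conventions.

    Returns a list of (line_number, message) tuples. Column numbers are
    1-based as in the PDB format guide, so column 17 is index 16.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        record = line.rstrip("\n").ljust(27)  # pad short records before indexing
        alt_loc = record[16]   # column 17: alternate conformation
        chain_id = record[21]  # column 22: chain identifier
        ins_code = record[26]  # column 27: insertion code
        if chain_id == " ":
            problems.append((number, "missing chain identifier"))
        if alt_loc != " " and not alt_loc.isalpha():
            problems.append((number, "alternate conformation is not a letter"))
        if ins_code != " " and not ins_code.isdigit():
            problems.append((number, "insertion code is not a number"))
    return problems
```

Calling check_atom_records(open("some.pdb")) returns an empty list when every ATOM/HETATM record follows the conventions.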

MTZ

Reflection data is given in CCP4 MTZ format (binary file format):

normal MTZ file with F/SIGF columns (any column name is possible, but the column types have to be F/Q - which they nearly always are anyway, unless something went really wrong)

the cell parameters for the refinement are taken from the MTZ file header. Please note that BUSTER does not yet handle different cell entries for different datasets: the assumption is that the MTZ file usually contains only a single dataset.

if the MTZ file contains a set of columns with Hendrickson-Lattman coefficients (usually named HLA, HLB, HLC and HLD), these can be used as additional, external phase information. This should not be done if the MTZ file is itself the output of a previous BUSTER run. The user needs to set the parameter autoBUSTER_hls to the four column names, e.g. with 'refine autoBUSTER_hls="HLA HLB HLC HLD" ... '.
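
As an illustration of the column requirements above, the sketch below inspects a list of (label, type) pairs such as one might obtain from any MTZ header reader. It is a hedged example rather than BUSTER code: the function names are hypothetical, the rule that a sigma column directly follows its amplitude column is an assumption, and the only MTZ column types relied on are F (amplitude), Q (standard deviation) and A (Hendrickson-Lattman coefficient).

```python
def amplitude_pairs(columns):
    """Return (F label, SIGF label) pairs from (label, type) tuples in file order.

    Assumption: the sigma column (type Q) directly follows its amplitude (type F).
    """
    pairs = []
    for (label, ctype), (next_label, next_type) in zip(columns, columns[1:]):
        if ctype == "F" and next_type == "Q":
            pairs.append((label, next_label))
    return pairs


def hls_argument(columns):
    """Build the autoBUSTER_hls setting if exactly four type-A columns exist."""
    labels = [label for label, ctype in columns if ctype == "A"]
    if len(labels) != 4:
        return None
    return 'autoBUSTER_hls="' + " ".join(labels) + '"'


columns = [
    ("H", "H"), ("K", "H"), ("L", "H"), ("FreeR_flag", "I"),
    ("F", "F"), ("SIGF", "Q"),
    ("HLA", "A"), ("HLB", "A"), ("HLC", "A"), ("HLD", "A"),
]
print(amplitude_pairs(columns))
print(hls_argument(columns))
```

The second function only formats the parameter shown in the text; whether the coefficients are suitable as external phase information remains a decision for the user.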

Rigid-body description

The rigid-body description file is used with the -RB command line argument, e.g.

% refine -p some.pdb -m other.mtz -RB rigid.dat
These files describe the rigid bodies to be used for the initial big cycle(s) of rigid-body refinement that is done if -RB rigid.dat is specified. After this first big cycle of rigid-body refinement, normal (xyz and B) refinement is done for all subsequent cycles.
Note that -RB without a file being specified will define a single rigid body for every chain in the input PDB file. This is often a sensible initial approach.

The rigid-body file uses gelly combine syntax. E.g.:

NOTE BUSTER_COMBINE XYZ { A|5 - A|73 A|150 - A|170 }
NOTE BUSTER_COMBINE XYZ { A|74 }
NOTE BUSTER_COMBINE XYZ { A|75 }
NOTE BUSTER_COMBINE XYZ { A|76 }
NOTE BUSTER_COMBINE XYZ { A|77 - A|120 }

This sets up two large rigid bodies for two domains. The first domain contains residues 5 to 73 and 150 to 170. The second domain goes from residue 77 to 120. The three residues in between (the linker) are each treated as an individual rigid body. This can be sensible because bonded interactions remain fully active throughout rigid-body refinement in BUSTER: only non-bonded contacts are given zero weight in rigid-body refinement cycles. Making each linker residue its own rigid body therefore allows the domains to move more freely. A good alternative would be to simply delete a single residue in the linker to remove any bonded connection between the domains.
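
For models with many domains it can be convenient to generate such a file from residue ranges. The following is a minimal sketch under the assumption that only the selection forms shown above (a single residue, or a range within one chain) are needed; the function names are hypothetical and not part of BUSTER.

```python
def selection(chain, first, last):
    """Format one residue or one residue range in the style used above."""
    if first == last:
        return f"{chain}|{first}"
    return f"{chain}|{first} - {chain}|{last}"


def combine_card(chain, ranges):
    """Build one card that joins several ranges into a single rigid body."""
    body = " ".join(selection(chain, first, last) for first, last in ranges)
    return "NOTE BUSTER_COMBINE XYZ { " + body + " }"


def rigid_body_cards(chain, bodies):
    """One card per rigid body; each body is a list of (first, last) ranges."""
    return [combine_card(chain, ranges) for ranges in bodies]


# Two domains plus three linker residues, each linker residue its own body.
bodies = [[(5, 73), (150, 170)], [(74, 74)], [(75, 75)], [(76, 76)], [(77, 120)]]
print("\n".join(rigid_body_cards("A", bodies)))
```

With these ranges the printed cards are the same five lines as in the example above, and the output can be written to a file such as rigid.dat.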

You can use several -RB arguments as in

% refine -RB rigid1.dat -RB rigid2.dat ...

Here the first two big cycles will be rigid-body refinement cycles - with the rigid-body parameters rigid1.dat for the first big cycle and rigid2.dat for the next big cycle. From big cycle 3 onwards, no rigid-body restraints will be used.
If you want to restrict the resolution range in a particular rigid-body refinement cycle, then this can be done by adding a special RESOLUTION card to a rigid-body definition file. Just add a line (starting with a hash) to the beginning of the file:

# RESOLUTION <low res> <high res>

In this case, only reflections within the specified resolution range will be used during that particular rigid-body refinement cycle. As an example, to use only data to 4 Å in a two-chain rigid-body refinement step:

# RESOLUTION 50.0 4.0
NOTE BUSTER_COMBINE XYZ { A|* }
NOTE BUSTER_COMBINE XYZ { B|* }

Using only low-resolution data during a rigid-body refinement cycle can help to increase the radius of convergence.
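
To make the meaning of the RESOLUTION card concrete, the sketch below reads the two limits from a rigid-body definition file and tests whether a reflection with a given d-spacing (in Å) lies inside them. This is an illustration only: the function names are hypothetical, the card is searched for on any line although the text above places it at the beginning of the file, and treating both limits as inclusive is an assumption.

```python
def read_resolution_card(text):
    """Return (low_res, high_res) from a '# RESOLUTION' line, or None."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "#" and fields[1] == "RESOLUTION":
            return float(fields[2]), float(fields[3])
    return None


def in_resolution_range(d_spacing, limits):
    """True if d_spacing lies between the high and low resolution limits.

    Assumption: both limits are inclusive.
    """
    low_res, high_res = limits
    return high_res <= d_spacing <= low_res


rigid = """# RESOLUTION 50.0 4.0
NOTE BUSTER_COMBINE XYZ { A|* }
NOTE BUSTER_COMBINE XYZ { B|* }
"""
limits = read_resolution_card(rigid)
print(limits, in_resolution_range(6.0, limits), in_resolution_range(2.5, limits))
```

A reflection at 6.0 Å falls inside the 50.0 to 4.0 Å range, while one at 2.5 Å would be left out of that rigid-body cycle.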
For further discussion as to the use of rigid-body refinement, see the Rigid-body usage section.

NCS

There are several ways of specifying NCS:

By far the easiest option is to just use the -autoncs command-line flag (for LSSR-type NCS restraints). This will set up all NCS relations and also perform some initial pruning (of residues that are significantly different). If that last feature is not wanted, the -autoncs_noprune flag can be used.
It might be useful to also add the -sim_swap_equiv flag: this will try and improve the NCS relations by rotating symmetrical amino-acid sidechains (ASP, GLU, TYR, PHE) - since inconsistent atom-naming might otherwise impact on the NCS relation for LSSR.
Please note that the correct NCS relation might be disrupted if some automatic nomenclature correction is performed on the final refinement output file, e.g. if loading the model into Coot (where it might be useful to use "(set-nomenclature-errors-on-read ignore)" in your Coot preferences).
The second way is to use GELLY syntax for NCS-specification.

The old option for superposition-based NCS restraints:

uses normal TNT style syntax for describing NCS restraints.

a simple example would look like this:

CLUSTER N1 RESIDUE 1 - 20 \
           RESIDUE 22 - 79 CHAINS A B
CLUSTER N2 RESIDUE 80 - 101 CHAINS A B

This describes a two-domain protein (N1 and N2) which crystallises with 2 molecules (chains A and B) in the asymmetric unit. Residue 21 in the first domain (N1) has been taken out of the NCS relation (maybe due to a different crystal contact).

TLS description

The default behaviour of the TLS refinement options in BUSTER is to read existing TLS group definitions from the PDB file header, if present. Failing that, a single TLS group will be defined per macromolecular chain. Please refer to TLS refinement for more information on how to set up simple TLS refinement.

For more complex TLS parameterisation, it is possible to specify custom TLS group definitions in a GELLY syntax file given as an argument to the -TLS command.

There are several cards that describe a TLS group. They fall into three groups listed below. All of them use a unique tag to specify a particular TLS group.

Specification of the content of a group:

NOTE BUSTER_TLS_SET <tag> <spec>

This card is mandatory for TLS-refinement.

The specification <spec> can be either a single selection in curly braces, e.g.

NOTE BUSTER_TLS_SET tls1 { A|1 - A|150 A|201 - A|360 }

or a single set specified using the NOTE BUSTER_SET syntax, e.g.

NOTE BUSTER_SET group1 = { A|1 - A|150 }
NOTE BUSTER_SET group2 = { A|201 - A|360 }
NOTE BUSTER_SET set1 = group1 + group2
NOTE BUSTER_TLS_SET tls1 set1

The values of any known parameters of a given TLS group, either the origin or the unique values of the T-, L- and S-tensors, are specified as follows:

NOTE BUSTER_TLS_O <tag> <X> <Y> <Z>
NOTE BUSTER_TLS_T <tag> <T11> <T22> <T33> <T12> <T13> <T23>
NOTE BUSTER_TLS_L <tag> <L11> <L22> <L33> <L12> <L13> <L23>
NOTE BUSTER_TLS_S <tag> <S2211> <S1133> <S12> <S13> <S23> <S21> <S31> <S32>

NOTE: tag must be the same as in the NOTE BUSTER_TLS_SET card

These cards are not mandatory. If no origin has been specified, the centroid of the atoms in the group is used. Similarly, if the T, L and S parameters are unspecified, the values are set to zero. The element <S2211> is <S22> - <S11>, while <S1133> is <S11> - <S33>.

The values must be given in the TNT-Cartesian system and the units are Å, Å², °² and Å°, respectively.
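
The ordering of the eight S-tensor values is easy to get wrong when starting from a full 3x3 matrix. The sketch below, a hedged illustration rather than BUSTER code, builds the card from a nested list s with s[i][j] holding the element S(i+1)(j+1); the function name and the two-decimal output format are assumptions, while the element order and the two difference terms follow the definitions above.

```python
def tls_s_card(tag, s):
    """Format a BUSTER_TLS_S card from a 3x3 S tensor given as nested lists."""
    values = [
        s[1][1] - s[0][0],  # S2211 = S22 - S11
        s[0][0] - s[2][2],  # S1133 = S11 - S33
        s[0][1],            # S12
        s[0][2],            # S13
        s[1][2],            # S23
        s[1][0],            # S21
        s[2][0],            # S31
        s[2][1],            # S32
    ]
    return f"NOTE BUSTER_TLS_S {tag} " + " ".join(f"{v:.2f}" for v in values)


s_tensor = [
    [0.10, 0.01, 0.02],
    [0.03, 0.05, 0.04],
    [0.06, 0.07, -0.15],
]
print(tls_s_card("tls1", s_tensor))
```

Only the two differences of diagonal elements appear on the card, so the three diagonal values themselves cannot be recovered from it.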

The following card will determine whether to keep the TLS parameters fixed or to refine them:

NOTE BUSTER_TLS_FIX <tag> (RB|ALL)

A value of RB specifies that the parameters associated with the Rigid-Body part of a TLS group are kept fixed, i.e. the location and the relative orientation (this is the default). A value of ALL completely fixes the TLS group.
Switching the refinement of TLS parameters on or off at different big cycles of a BUSTER run is controlled by the variables TLSfixcycRB and TLSfixcycALL.

Example: these cards would specify two TLS groups that are to be refined with fixed translational/rotational parts:

NOTE BUSTER_TLS_SET tlsA { A|* }
NOTE BUSTER_TLS_T tlsA -0.05 -0.11 -0.15 -0.01 0.03 0.02
NOTE BUSTER_TLS_L tlsA 2.88 1.70 1.17 -0.41 0.32 -0.35
NOTE BUSTER_TLS_S tlsA -0.11 0.02 -0.10 -0.09 0.04 0.01 0.01 -0.01
NOTE BUSTER_TLS_O tlsA 6.42 3.54 15.71
NOTE BUSTER_TLS_FIX tlsA RB
NOTE BUSTER_TLS_SET tlsB { B|* }
NOTE BUSTER_TLS_T tlsB -0.01 -0.03 -0.03 0.00 0.00 0.02
NOTE BUSTER_TLS_L tlsB 0.38 0.45 0.58 0.04 -0.04 -0.02
NOTE BUSTER_TLS_S tlsB -0.02 -0.02 -0.01 0.01 -0.02 -0.02 0.02 -0.01
NOTE BUSTER_TLS_O tlsB -4.40 28.29 43.24
NOTE BUSTER_TLS_FIX tlsB RB
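
Because every parameter card must carry the same tag as its BUSTER_TLS_SET card, a hand-edited file is worth a quick consistency check. The sketch below is one possible check, not a BUSTER tool: the function name is hypothetical, it expects one card per line, it compares tags case-sensitively (an assumption), and the value counts are taken from the card templates given earlier in this section.

```python
# Number of values expected after the tag, from the card templates above.
VALUE_COUNTS = {"BUSTER_TLS_O": 3, "BUSTER_TLS_T": 6, "BUSTER_TLS_L": 6, "BUSTER_TLS_S": 8}


def check_tls_cards(text):
    """Return messages for parameter cards with an unknown tag or wrong value count."""
    defined = set()
    parameter_cards = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "NOTE":
            continue
        keyword, tag, values = fields[1], fields[2], fields[3:]
        if keyword == "BUSTER_TLS_SET":
            defined.add(tag)
        elif keyword in VALUE_COUNTS or keyword == "BUSTER_TLS_FIX":
            parameter_cards.append((keyword, tag, values))
    messages = []
    for keyword, tag, values in parameter_cards:
        if tag not in defined:
            messages.append(f"{keyword} {tag}: no matching BUSTER_TLS_SET card")
        expected = VALUE_COUNTS.get(keyword)
        if expected is not None and len(values) != expected:
            messages.append(f"{keyword} {tag}: expected {expected} values, found {len(values)}")
    return messages
```

An empty list means that every O, T, L, S and FIX card refers to a tag defined by a BUSTER_TLS_SET card and carries the expected number of values.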

The pdb2tls tool provided can be used as an easy way of generating a TLS definition file, especially when applied to a PDB file already refined with TLS (which should then contain a REMARK 3 section with TLS details). The resulting file should be a good example for understanding the format used within BUSTER.

TNT sequence file

The TNT sequence file describes the connectivity between residues and atoms in the PDB file. Every residue in the PDB file must be described in the TNT sequence file, though it is permitted for the TNT sequence file to describe residues or atoms which are missing in the coordinate file – you can keep the residue type the same even if a sidechain is truncated. If you have very large missing sections in your input model, you can generate a sequence file from a FASTA or PIR ASCII sequence using seq2seq.

By default, the sequence file is generated automatically from the input model using the MakeLINK utility. MakeLINK is aware of a number of common covalently-bound cofactors and glycosylation patterns; if you have more complicated linkages in your protein, you have two choices.

You can produce the sequence file manually, edit it by hand, and submit it with the -Seq option.
You can incorporate MakeLINK directives in a TNT-format dictionary passed to refine with the -l option, and BUSTER will arrange to pass these to MakeLINK. See GradeCovalentTutorial on the BUSTER wiki for an example.

If your input model contains accidental contacts between protein regions from different parts of the sequence (this is something we have seen for output from Buccaneer or from mediocre molecular-replacement output) then MakeLINK may introduce incorrect cross-links, which will tend to be reported as sanity-check failures from BUSTER. In these cases you can run with SequenceFileGeneration=pdb2seq to use a different sequence file generation method; note that this method is unaware of covalent linkages other than those present in protein (peptide) or DNA/RNA chains.

Geometry restraints and standard libraries

BUSTER needs to be given information about the geometry of the ligands in your file. This should be made available as a refmac-compatible .cif file, as produced by many dictionary-generation programs, including grade which is part of the BUSTER distribution.

If you do not give a dictionary and BUSTER does not have one available internally, you will get an error message from refine telling you for which three-letter codes dictionaries are needed.

Dictionaries for ligands which are known to the PDB can be made very easily using the grade_PDB_ligand tool; you need to have obabel on your path, and you will get very much better results if you have the CSD tool mogul on your path. You must use the -nomogul option to grade_PDB_ligand if you don't have mogul. Alternatively, the Grade webserver can be used, which removes the need to install the required third-party software.

BUSTER contains a library of restraint dictionaries for fifty or so of the most common residues, mostly generated with the grade_PDB_ligand tool mentioned above, but with some tweaks applied by hand. Giving a dictionary for the residue using the -l option will override the one in the library, though we would appreciate reports if you have ever had to do this because the dictionary in the library does not work correctly.

Most tools and libraries will also support hydrogens placed at explicit electron-cloud distance (instead of nuclear position). grade, grade_PDB_ligand and aB_hydrogenate provide the -ecloud flag to switch to this behaviour. For BUSTER itself, the -M Ecloud macro will provide "refine" with all necessary information.

It is at present still possible to use the legacy TNT format for dictionaries, and indeed the protein and sugar restraint libraries are currently distributed in this format. We would not recommend that this format be used for any new work, though it is still necessary for accessing certain features.

Last modification: 04.02.2020
