BUSTER Documentation : File formats

Copyright    © 2003-2018 by Global Phasing Limited
 
  All rights reserved.
 
  This software is proprietary to and embodies the confidential
technology of Global Phasing Limited (GPhL). Possession,
use, duplication or dissemination of the software is authorised
only pursuant to a valid written licence from GPhL.
 
Contact buster-develop@GlobalPhasing.com



PDB

A great deal of checking is done by the program pdbchk, which is run as part of each BUSTER job. Some of the problems and inconsistencies found in the starting PDB file can also be corrected at that step. The points checked include, for example:
  1. each atom should have a chain identifier (e.g. A, B and C for protein chains and W for water)

  2. a correct CRYST1 card is required (see PDB format guide), especially the space group symbol.

  3. although not enforced by the PDB standard, it seems sensible to use letters (A, B, C etc) in column 17 of ATOM/HETATM records to denote alternate conformations and numbers (1, 2, 3 etc) in column 27 of ATOM/HETATM records to denote insertion codes.
  4. standard SSBOND and LINK records are supported and often required (to describe correctly the molecular connectivity).
Internally, BUSTER uses the PDB v3 atom/residue nomenclature.
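
The points above can be pre-checked before a job is submitted. The following is a minimal Python sketch, not a reimplementation of pdbchk: the function name check_pdb_lines is hypothetical, and the column positions are those of the fixed-column PDB format (alternate location in column 17, chain identifier in column 22, insertion code in column 27, space group symbol in columns 56-66 of CRYST1). The letter/number test only reflects the convention suggested in item 3.

```python
def check_pdb_lines(lines):
    """Return a list of warnings for a list of PDB records (one string per line)."""
    warnings = []
    has_cryst1 = False
    for number, line in enumerate(lines, start=1):
        record = line[:6].strip()
        if record == "CRYST1":
            has_cryst1 = True
            # The space group symbol occupies columns 56-66 (1-based).
            if not line[55:66].strip():
                warnings.append(f"line {number}: CRYST1 has no space group symbol")
        elif record in ("ATOM", "HETATM"):
            padded = line.ljust(27)   # guard against truncated records
            alt_loc = padded[16]      # column 17
            chain_id = padded[21]     # column 22
            ins_code = padded[26]     # column 27
            if chain_id == " ":
                warnings.append(f"line {number}: missing chain identifier")
            # Convention from the text (not a PDB requirement):
            # letters for alternate conformations, digits for insertion codes.
            if alt_loc != " " and not alt_loc.isalpha():
                warnings.append(f"line {number}: alternate location '{alt_loc}' is not a letter")
            if ins_code != " " and not ins_code.isdigit():
                warnings.append(f"line {number}: insertion code '{ins_code}' is not a digit")
    if not has_cryst1:
        warnings.append("no CRYST1 record found")
    return warnings
```

A call such as check_pdb_lines(open("model.pdb").read().splitlines()) returns an empty list when none of these points is flagged; the file name here is only a placeholder.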


MTZ

Reflection data is given in CCP4 MTZ format (binary file format):
  1. normal MTZ file with F/SIGF columns (any column name is possible, but the column types have to be F/Q - which they nearly always are anyway, unless something went really wrong)
  2. the cell parameters for the refinement are taken from the MTZ file header (please note that BUSTER does not yet handle different cell entries for different datasets). The assumption is that the MTZ file usually contains only a single dataset.
  3. if the MTZ file contains a set of columns with Hendrickson-Lattman coefficients (usually named HLA, HLB, HLC and HLD) these can be used as additional, external phase information (unless the MTZ file is actually the output of a previous BUSTER run, in which case using them would not be a good idea). The user needs to set the parameter autoBUSTER_hls to the four column names, e.g. with 'refine autoBUSTER_hls="HLA HLB HLC HLD" ... '.
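
The column requirements above can be illustrated with a short Python sketch. It assumes that the column labels and types have already been read from the MTZ header by some MTZ reader (not shown, and not part of BUSTER); the function names pick_columns and hls_argument are hypothetical. The one-letter type codes are those of the CCP4 MTZ format: F for amplitudes, Q for standard deviations and A for Hendrickson-Lattman coefficients.

```python
def pick_columns(columns):
    """Sort MTZ columns by type.

    columns: list of (label, type) pairs in file order, e.g. ("SIGF", "Q").
    Returns three lists of labels: amplitudes (F), sigmas (Q), HL coefficients (A).
    """
    amplitudes = [label for label, ctype in columns if ctype == "F"]
    sigmas = [label for label, ctype in columns if ctype == "Q"]
    hl = [label for label, ctype in columns if ctype == "A"]
    return amplitudes, sigmas, hl


def hls_argument(hl_labels):
    """Build the autoBUSTER_hls setting from exactly four column labels."""
    if len(hl_labels) != 4:
        raise ValueError("expected exactly four Hendrickson-Lattman columns")
    return 'autoBUSTER_hls="' + " ".join(hl_labels) + '"'
```

For a file with columns H, K, L, F, SIGF, HLA, HLB, HLC and HLD, the second function yields the setting shown in item 3. Whether the coefficients should be used at all (they should not come from a previous BUSTER run) remains a decision for the user.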


Rigid-body description


NCS

There are several ways of specifying NCS:
  1. By far the easiest option is to use the -autoncs command-line flag (for LSSR-type NCS restraints). This will set up all NCS relations and also perform some initial pruning (of residues that are significantly different). If that last feature is not wanted, -autoncs_noprune can be used instead.

    It might be useful to also add the -sim_swap_equiv flag: this will try to improve the NCS relations by rotating symmetrical amino-acid sidechains (ASP, GLU, TYR, PHE), since inconsistent atom naming might otherwise affect the NCS relation for LSSR.

    Please note that the correct NCS relation might be disrupted if some automatic nomenclature correction is performed on the final refinement output file, e.g. if loading the model into Coot (where it might be useful to use "(set-nomenclature-errors-on-read ignore)" in your Coot preferences).

  2. The second way is to use GELLY syntax for NCS-specification.

  3. The old option for superposition-based NCS restraints:


TLS description

The default behaviour of the TLS refinement options in BUSTER is to read existing TLS group definitions from the PDB file header, if present. Failing that, a single TLS group will be defined per macro-molecular chain. Please refer to TLS refinement for more information on how to set up simple TLS refinement.

For more complex TLS parameterisation, it is possible to specify custom TLS group definitions in a GELLY syntax file given as an argument to the -TLS command.

There are several cards that describe a TLS group. They fall into three groups listed below. All of them use a unique tag to specify a particular TLS group.

  1. Specification of the content of a group:

    NOTE BUSTER_TLS_SET <tag> <spec>

    This card is mandatory for TLS-refinement.

    The specification <spec> can be either a single selection using 'curly-braces', e.g.

     
    NOTE BUSTER_TLS_SET tls1 { A|1 - A|150 A|201 - A|360 }

    or a single set specified using the NOTE BUSTER_SET syntax, e.g.

     
    NOTE BUSTER_SET group1 = { A|1 - A|150 }
    NOTE BUSTER_SET group2 = { A|201 - A|360 }
    NOTE BUSTER_SET set1 = group1 + group2 
    NOTE BUSTER_TLS_SET tls1 set1

  2. The values of any known parameters of a given TLS group, either the origin or the unique values of the T, L and S tensors, are specified as follows:

    NOTE BUSTER_TLS_O <tag> <X> <Y> <Z>
    NOTE BUSTER_TLS_T <tag> <T11> <T22> <T33> <T12> <T13> <T23>
    NOTE BUSTER_TLS_L <tag> <L11> <L22> <L33> <L12> <L13> <L23>
    NOTE BUSTER_TLS_S <tag> <S2211> <S1133> <S12> <S13> <S23> <S21> <S31> <S32>

    NOTE: <tag> must be the same as in the corresponding NOTE BUSTER_TLS_SET card.

    These cards are not mandatory. If no origin has been specified, the centroid of the atoms in the group is used. Similarly, if the T, L and S parameters are unspecified, their values are set to zero. The element <S2211> is <S22> - <S11>, while <S1133> is <S11> - <S33>.

    The values must be given in the TNT-Cartesian system and the units are Å, Å², °², and Å°, respectively.

  3. The following card will determine whether to keep the TLS parameters fixed or to refine them:

    NOTE BUSTER_TLS_FIX <tag> (RB|ALL)

    A value of RB specifies that the parameters associated with the Rigid-Body part of a TLS group are kept fixed, i.e. the location and the relative orientation (this is the default). A value of ALL completely fixes the TLS group.

    Switching the refinement of TLS parameters on or off at different big cycles of a BUSTER run is controlled by the variables TLSfixcycRB and TLSfixcycALL.
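
The ordering of the tensor elements on the T, L and S cards is easy to get wrong when converting from full 3x3 matrices. The Python sketch below writes the cards in the order given in item 2, including the two diagonal differences <S2211> and <S1133>. The function names and the two-decimal output format are illustrative assumptions, and the matrices are assumed to be already expressed in the TNT-Cartesian system with the units stated above.

```python
def tls_symmetric_card(kind, tag, m):
    """Format a BUSTER_TLS_T or BUSTER_TLS_L card from a symmetric 3x3 matrix.

    kind is "T" or "L"; m is a nested list with m[i][j] = element (i+1)(j+1).
    Element order on the card: 11 22 33 12 13 23.
    """
    values = [m[0][0], m[1][1], m[2][2], m[0][1], m[0][2], m[1][2]]
    return f"NOTE BUSTER_TLS_{kind} {tag} " + " ".join(f"{v:.2f}" for v in values)


def tls_s_card(tag, s):
    """Format a BUSTER_TLS_S card from a full 3x3 S matrix.

    Element order on the card: S2211 S1133 S12 S13 S23 S21 S31 S32,
    with S2211 = S22 - S11 and S1133 = S11 - S33.
    """
    s2211 = s[1][1] - s[0][0]
    s1133 = s[0][0] - s[2][2]
    values = [s2211, s1133, s[0][1], s[0][2], s[1][2], s[1][0], s[2][0], s[2][1]]
    return f"NOTE BUSTER_TLS_S {tag} " + " ".join(f"{v:.2f}" for v in values)
```

Note that S is not symmetric, so S12 and S21 (and likewise the other off-diagonal pairs) are written separately, whereas the T and L cards carry only the six unique elements.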

Example: these cards would specify two TLS groups that are to be refined with fixed translational/rotational parts:

NOTE BUSTER_TLS_SET tlsA  { A|* }
NOTE BUSTER_TLS_T   tlsA  -0.05 -0.11 -0.15 -0.01  0.03  0.02
NOTE BUSTER_TLS_L   tlsA   2.88  1.70  1.17 -0.41  0.32 -0.35
NOTE BUSTER_TLS_S   tlsA  -0.11  0.02 -0.10 -0.09  0.04  0.01  0.01 -0.01
NOTE BUSTER_TLS_O   tlsA   6.42  3.54 15.71
NOTE BUSTER_TLS_FIX tlsA  RB

NOTE BUSTER_TLS_SET tlsB  { B|* }
NOTE BUSTER_TLS_T   tlsB  -0.01 -0.03 -0.03  0.00  0.00  0.02
NOTE BUSTER_TLS_L   tlsB   0.38  0.45  0.58  0.04 -0.04 -0.02
NOTE BUSTER_TLS_S   tlsB  -0.02 -0.02 -0.01  0.01 -0.02 -0.02  0.02 -0.01
NOTE BUSTER_TLS_O   tlsB  -4.40 28.29 43.24
NOTE BUSTER_TLS_FIX tlsB  RB
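
Since every card must carry the same tag as the corresponding NOTE BUSTER_TLS_SET card, a hand-edited definition file can be checked for mismatched tags before use. The following Python sketch does only that; the function name check_tls_tags is hypothetical, and comparing tags case-sensitively is an assumption (the text does not say whether BUSTER treats tags as case-sensitive).

```python
def check_tls_tags(lines):
    """Return (line number, card, tag) for TLS cards whose tag has no TLS_SET card.

    lines: the lines of a TLS definition file.
    Assumption: tags are compared case-sensitively.
    """
    defined = set()
    used = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Only look at cards of the form: NOTE BUSTER_TLS_<something> <tag> ...
        if len(fields) < 3 or fields[0] != "NOTE":
            continue
        card, tag = fields[1], fields[2]
        if not card.startswith("BUSTER_TLS_"):
            continue
        if card == "BUSTER_TLS_SET":
            defined.add(tag)
        else:
            used.append((number, card, tag))
    # Report after reading everything, so the order of cards does not matter.
    return [entry for entry in used if entry[2] not in defined]
```

An empty result means that every T, L, S, O and FIX card refers to a group that has been defined; the check says nothing about whether the numerical values are sensible.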

The pdb2tls tool provided can be used as an easy way of generating a TLS definition file, especially when applied to a PDB file already refined with TLS (which should then contain a REMARK 3 section with TLS details). The resulting file is a good example for understanding the format used within BUSTER.


TNT sequence file

The TNT sequence file describes the connectivity between residues and atoms in the PDB file. Every residue in the PDB file must be described in the TNT sequence file, though it is permitted for the TNT sequence file to describe residues or atoms which are missing in the coordinate file – you can keep the residue type the same even if a sidechain is truncated. If you have very large missing sections in your input model, you can generate a sequence file from a FASTA or PIR ASCII sequence using seq2seq.

By default, the sequence file is generated automatically from the input model using the MakeLINK utility. MakeLINK is aware of a number of common covalently-bound cofactors and glycosylation patterns; if you have more complicated linkages in your protein, you have two choices:

  1. You can produce the sequence file manually, edit it by hand, and submit it with the -Seq option.
  2. You can incorporate MakeLINK directives in a TNT-format dictionary passed to refine with the -l option, and BUSTER will arrange to pass these to MakeLINK. See GradeCovalentTutorial on the BUSTER wiki for an example.

If your input model contains accidental contacts between protein regions from different parts of the sequence (something we have seen in output from Buccaneer and in mediocre molecular-replacement solutions) then MakeLINK may introduce incorrect cross-links, which will tend to be reported as sanity-check failures from BUSTER. In these cases you can run with SequenceFileGeneration=pdb2seq to use a different sequence file generation method; note that this method is unaware of covalent linkages other than those present in proteins (peptides) or DNA/RNA.


Geometry restraints and standard libraries

BUSTER needs to be given information about the geometry of the ligands in your file. This should be made available as a refmac-compatible .cif file, as produced by many dictionary-generation programs, including grade which is part of the BUSTER distribution.

If you do not give a dictionary and BUSTER does not have one available internally, you will get an error message from refine telling you for which three-letter codes dictionaries are needed.
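
To see in advance which three-letter codes might need a dictionary, the residue names of the HETATM records in the input model can be listed. The Python sketch below is only an illustration with a hypothetical function name; it cannot tell which of these codes BUSTER already covers internally, so the error message from refine remains the authoritative list.

```python
def hetero_residue_names(lines):
    """Return the sorted, distinct residue names found on HETATM records.

    lines: the lines of a PDB file; the residue name is in columns 18-20.
    """
    names = set()
    for line in lines:
        if line.startswith("HETATM"):
            name = line[17:20].strip()
            if name:
                names.add(name)
    return sorted(names)
```

Common entries such as waters will normally appear in this list as well, so it is an upper bound on the dictionaries that have to be supplied.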

Dictionaries for ligands which are known to the PDB can be made very easily using the grade_PDB_ligand tool; you need to have obabel on your path, and you will get much better results if you also have the CSD tool mogul on your path. If you don't have mogul, you must use the -nomogul option to grade_PDB_ligand. Alternatively, the Grade webserver can be used, which avoids the need to install the required third-party software.

BUSTER contains a library of restraint dictionaries for fifty or so of the most common residues, mostly generated with the grade_PDB_ligand tool mentioned above, but with some tweaks applied by hand. Giving a dictionary for the residue using the -l option will override the one in the library, though we would appreciate reports if you have ever had to do this because the dictionary in the library does not work correctly.

Most tools and libraries will also support hydrogens placed at explicit electron-cloud distance (instead of nuclear position). grade, grade_PDB_ligand and aB_hydrogenate provide the -ecloud flag to switch to this behaviour. For BUSTER itself, the -M Ecloud macro will provide "refine" with all necessary information.

It is at present still possible to use the legacy TNT format for dictionaries, and indeed the protein and sugar restraint libraries are currently distributed in this format. We would not recommend that this format be used for any new work, though it is still necessary for accessing certain features.


Last modification: 04.02.2020