BUSTER User Manual
Chapter 4

Preparing the BUSTER input

Copyright © 1995-2002 by Gérard Bricogne and the Buster Development Group. All rights reserved.  

0. General information about preparing input for BUSTER

If it is the first time you are using BUSTER on these data, filling in this form will require some time and concentration. We will now have a closer look at the ingredients of a typical calculation in order to consolidate the terminology and to describe in more practical terms what goes on in the programs.

This information will be presented in the guise of instructions on how to prepare the input form to BUSTER through the Graphical User Interface BUFFET. The graphical user interface is needed in order to impose the right hierarchical order and guide the user with on-line help in the form of hyperlinks. In this way, you will avoid most of the logical inconsistencies and typing errors usually associated with free-format ASCII submission.

Recall the main ingredients which specify the current "model":

  1. a PDB file describing the current partial structure, for which we will use the symbol PDBFRG;
  2. the volume fraction, mean electron density and temperature factor for the solvent region;
  3. [OPTIONAL, the definition of NCS operators in TNT format.]
  4. If you decide to declare the presence of some missing atoms, you need to specify their distribution and chemical composition:
    • [OPTIONAL, either of:
      • a PDB file specifying the envelope for the whole molecule (fragment + missing atoms), for which we will use the symbol PDBNUP;
      • a CCP4 MAP file to compute the envelope for the whole molecule (fragment + missing atoms), for which we will use the symbol MAPNUP;]
    • [OPTIONAL, the chemical composition of the pool of missing atoms, which may in some cases be conveniently specified by a PDB file for which we will use the symbol REST, containing a list of atoms whose number and chemical identities give the correct chemical composition for the pool of missing atoms but whose coordinates are too unreliable to be used as such]

Besides these ingredients specifying the model, we have an MTZ file containing observed amplitudes, their standard deviations, optionally some Hendrickson-Lattman coefficients conveying experimental phase information, and finally an integer flag value for use in cross-validation. If you compute the envelope for the missing atoms from a map, it is also useful to have a figure-of-merit (FOM) column in the MTZ file.

Beware! All MTZ and PDB files used for BUSTER need to be present in the $BDG_datafiles directory before the input form script is called!


1. Field-by-field description of the input page

Job ID

The Job ID string will be used in generating a name for the BUSTER job and a name for the subdirectory of the logfiles directory in which all relevant files and output will be placed.

A given crystallographic problem will need several runs of refinement/completion, possibly some trial-and-error model updates, etc. Our convention is to give the same name (the stem) to all these runs, and increment an ordinal number as an extension. The stem can be modified in the field following "JobID". For example, if you choose MissingLoop as your project ID, your successive runs of the MissingLoop project will be named MissingLoop.1 , MissingLoop.2 etc.  
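The naming convention can be illustrated with a minimal sketch. This is not the code used by the interface; the function name next_run_name and the idea of passing the existing run names as a list are assumptions made for the example.

```python
def next_run_name(stem, existing_names):
    """Return '<stem>.<N>', with N one more than the highest ordinal in use."""
    prefix = stem + "."
    highest = 0
    for name in existing_names:
        if name.startswith(prefix):
            suffix = name[len(prefix):]
            # Only purely numeric extensions count as run numbers.
            if suffix.isdigit():
                highest = max(highest, int(suffix))
    return f"{stem}.{highest + 1}"


print(next_run_name("MissingLoop", ["MissingLoop.1", "MissingLoop.2"]))
# prints MissingLoop.3
```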


Title and Comments

Title

The title (limited to one line only) is there to remind you what is specific to the calculation you are currently undertaking. Keep in mind that the graphical interface makes it very easy to generate many different jobs. You will need the title to sort out which is which.

For these reasons, the first few words of the title will appear in the popups alongside the JobID. The title will also appear in the LIST.html logfile, and beside the <jobID>.<run number>.crd field in the list of previous runs from which new runs can be prepared through the Prepare run button.

Comments

A box of free-format text is provided for detailed annotations pertaining to this job. It is recommended to make use of this feature to document the various BUSTER jobs, as the details may quickly fade out of memory with time.


Diffraction data

Mtz Datafile

Select the mtz file containing your data. Remember (see Preparing MTZ files for BUSTER) that all the measurement info has to be included in a multi-column MTZ file. If the file is present in the datafiles directory, if it is readable by the 'buster' account, and if it has the right extension (.mtz), it will be listed in this menu.
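If a file does not show up in the menu, the three conditions above can be checked by hand. The sketch below is only an illustration under stated assumptions: the function name list_eligible_files is hypothetical, and readability is tested for the user running the script, which may not be the 'buster' account.

```python
import os


def list_eligible_files(datafiles_dir, extension=".mtz"):
    """Return the names of readable regular files with the given extension."""
    eligible = []
    for name in sorted(os.listdir(datafiles_dir)):
        path = os.path.join(datafiles_dir, name)
        if (name.endswith(extension)
                and os.path.isfile(path)
                and os.access(path, os.R_OK)):  # checked for the current user only
            eligible.append(name)
    return eligible
```

A typical call would pass the value of the BDG_datafiles environment variable as datafiles_dir.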

Compulsory columns are: structure factor amplitudes, their sigmas and FreeR_flag. Optional columns are: Hendrickson-Lattman coefficients. The labels present in the MTZ file can be chosen from the input form.

Resolution limits

These are used for accepting reflexions from the input mtzfile. Any reflexion outside the resolution limits input at this stage will not be used in any of the subsequent calculations.
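The effect of the resolution limits can be sketched as a simple filter. This is an illustration only: the representation of a reflexion as a dictionary with a "d" entry (d-spacing in Angstrom), the function name, and the choice of inclusive bounds are all assumptions.

```python
def filter_by_resolution(reflexions, d_max, d_min):
    """Keep reflexions whose d-spacing lies within [d_min, d_max] (inclusive)."""
    return [r for r in reflexions if d_min <= r["d"] <= d_max]


data = [
    {"hkl": (1, 0, 0), "d": 45.0},
    {"hkl": (3, 2, 1), "d": 8.2},
    {"hkl": (20, 14, 9), "d": 1.7},
]
kept = filter_by_resolution(data, d_max=20.0, d_min=2.0)
print([r["hkl"] for r in kept])
# prints [(3, 2, 1)]
```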

MTZ Label for F: Select the label for the column containing the observed F's. Make sure that it is indeed present in the MTZ file. Default value: FP
MTZ Label for SIGF: Select the label for the column containing the standard deviations of the observed F's. Has to be present in the MTZ file. Default value: SIGFP
MTZ Label for FreeR_flag: Select the label for the column containing the integer flag used for cross-validation. Has to be present in the MTZ file. Default value: FreeR_flag
MTZ Label for HL A coefficient: Select the label for the column containing the Hendrickson-Lattman A coefficient for external phase information (if any). Optional. Default value: HLA
MTZ Label for HL B coefficient: Select the label for the column containing the Hendrickson-Lattman B coefficient for external phase information (if any). Optional. Default value: HLB
MTZ Label for HL C coefficient: Select the label for the column containing the Hendrickson-Lattman C coefficient for external phase information (if any). Optional. Default value: HLC
MTZ Label for HL D coefficient: Select the label for the column containing the Hendrickson-Lattman D coefficient for external phase information (if any). Optional. Default value: HLD
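Before filling in the labels, it can help to compare the column labels actually present in the file with the ones you intend to select. The sketch below assumes you have already obtained the list of labels by some other means (for example from the header printed by an MTZ utility); the function name check_labels is hypothetical, and the defaults are the ones quoted above.

```python
REQUIRED_DEFAULTS = ["FP", "SIGFP", "FreeR_flag"]
OPTIONAL_DEFAULTS = ["HLA", "HLB", "HLC", "HLD"]


def check_labels(present, required=REQUIRED_DEFAULTS, optional=OPTIONAL_DEFAULTS):
    """Return (missing compulsory labels, optional labels that were found)."""
    present = set(present)
    missing = [label for label in required if label not in present]
    found_optional = [label for label in optional if label in present]
    return missing, found_optional


missing, found = check_labels(["H", "K", "L", "FP", "SIGFP", "HLA", "HLB"])
print(missing)  # prints ['FreeR_flag']
print(found)    # prints ['HLA', 'HLB']
```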

FreeR_flag for the test set

Set the FreeR_flag of the subset of reflexions to be used as a test set for cross validation. Only the integer values present in the FreeR_flag column of the mtzfile are valid options. The default value is FreeR_flag=0.

If your starting point is a structure refined with CNS, you are likely to have set FreeR_flag=1. The sftools call issued at the outset will perform the change of FreeR_flag in the working MTZ file, so that the CCP4 standard FreeR_flag=0 is attributed to your free set. Just watch out for the numbers of free-/working-set reflexions in the LIST.html file, to confirm that it is correct.
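To know what numbers to expect in LIST.html, the free and working sets can be counted independently. The sketch below is only a cross-check, not a description of what sftools does; it assumes the flag column has already been extracted as a list of integers, and the function name is hypothetical.

```python
def count_free_and_working(flags, test_flag=0):
    """Return (number of test-set reflexions, number of working-set reflexions)."""
    n_free = sum(1 for flag in flags if flag == test_flag)
    return n_free, len(flags) - n_free


flags = [0, 1, 1, 1, 0, 1, 1, 1, 1, 1]
print(count_free_and_working(flags, test_flag=0))  # prints (2, 8)
print(count_free_and_working(flags, test_flag=1))  # prints (8, 2)
```

A test set that turns out larger than the working set, as in the second call, is a strong hint that the wrong flag value was selected.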


Structural model

This section of the input defines the structure in terms of three components:

  1. The partial structure, or "fragment": this is the part of the structure for which there is a PDB model and that can be subjected to refinement with TNT;
  2. The missing structure: this is the part of the structure for which an atomic model does not exist yet. The scattering from this part is modelled in BUSTER with real-space low-resolution distributions.
  3. The bulk solvent.

  1. Partial structure

    Fragment file

    Select the PDB file specifying the partial structure which is to be subjected to ML refinement (see Chapter 2,  Task 1), or is to be kept fixed but used as a source of phase information in a ME completion calculation (see Chapter 2,  Task 2).

    This file must reside in your datafiles directory and its name must end with .pdbfrg or simply .pdb in order to be listed in the scrolling menu for selection.

    Sequence file

    Sequence of the refined fragment in TNT format. This file can be produced during data preparation by the script $BDG_home/bin/buster/pirToSeq.pl. The script relies on the presence of Perl version 5 in /bin/perl.

    The script is only a preliminary attempt at easing the task of setting up the TNT sequence file. It reads an ASCII sequence file in pir sequence format (pir format in a nutshell: the first line begins with >, the second line can contain anything; the one-letter code sequence follows, terminated by an asterisk).
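The pir layout just described can be checked with a few lines of code before running the script. The following is a minimal sketch of a reader for that layout only; it is not a reimplementation of pirToSeq.pl, and the function name parse_pir is hypothetical.

```python
def parse_pir(text):
    """Return (header, description, sequence) from a single-entry pir file."""
    lines = text.splitlines()
    # Ignore any blank lines before the entry.
    while lines and not lines[0].strip():
        lines.pop(0)
    if len(lines) < 2 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' header line followed by a free-text line")
    header = lines[0][1:].strip()
    description = lines[1].strip()
    residues = []
    for line in lines[2:]:
        for char in line:
            if char == "*":  # the asterisk terminates the sequence
                return header, description, "".join(residues)
            if not char.isspace():
                residues.append(char.upper())
    raise ValueError("sequence is not terminated by an asterisk")


example = ">P1;EXAMPLE\nan example entry\nMKV LA\nGG*\n"
print(parse_pir(example))
# prints ('P1;EXAMPLE', 'an example entry', 'MKVLAGG')
```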

    The script can be executed from the unix shell:

    $BDG_home/bin/buster/pirToSeq.pl <file.pir> <file.seq>
    The script will ask you to specify the number of copies of the molecule in the a.u., and the chain labels you want to give to each molecule.

    The output file file.seq is a basic TNT sequence file; you will need to inspect it visually and possibly edit it, for example when:

    • ligands are bonded to your macromolecule: you will need to define the link and add the RESIDUE card for the ligand;
    • there is more than one molecule of different sequence in the a.u., for example heterodimers or AB complexes with A and B different: you will need to run the script for each molecule, and paste the sequence files together;
    • you have disulphide bonds: in which case you will need to add the SS link to the appropriate Cysteine RESIDUE cards.

    Don't forget to move the file.seq to the datafiles directory.

    Running pirToSeq.pl with the -s flag allows you to give the starting residue a number different from 1. This is very useful if your PDB file and the one-letter sequence are numbered differently; e.g. if the N-terminal residue in your PDB file has sequence number -2, run:

    pirToSeq.pl -s -2 protein.pir protein.seq
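The intended effect of the starting number can be pictured as follows. This sketch assumes that residues are numbered with consecutive integers, including 0; whether pirToSeq.pl follows exactly this rule should be confirmed by inspecting the resulting file. The function name is hypothetical.

```python
def number_residues(sequence, start=1):
    """Pair each one-letter residue code with a consecutive sequence number."""
    return [(start + index, code) for index, code in enumerate(sequence)]


print(number_residues("MKVL", start=-2))
# prints [(-2, 'M'), (-1, 'K'), (0, 'V'), (1, 'L')]
```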

    If you want to be able to add many molecules, e.g. solvent ones, without the need for updating the sequence file every time you add one, you can add the description of your molecule in ASSUME cards in the file $BDG_tntdata/assume.dat.

  2. Missing atoms structure

    BUSTER implements two alternative techniques to calculate the prior distribution for the missing atoms; choose one and only one of the following:

    • Uniform prior: the missing atoms are assumed to be uniformly distributed throughout the volume not occupied by the partial structure.
      This is the preferred regime when refining an almost complete structure.

    • Prior based on the current phases: a first BUSTER job is run to perform scaling with a uniform prior (as in the first option above); a second calculation then follows, with a missing-atoms envelope computed from the BUSTER electron density map just obtained.
      This is usually selected when a significant amount of scattering matter is still absent from the fragment.

    Missing atoms

    The number of residues is used to define the chemical composition of the pool of "missing atoms" in the asymmetric unit. The average residue contains 4.869 carbons, 1.351 nitrogens, 1.492 oxygens and 0.051 sulfurs (taken from Susan Harrell's observed frequencies table).
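The resulting composition is a simple multiplication of the per-residue averages by the number of missing residues. The sketch below only illustrates that arithmetic with the figures quoted above; the function name is hypothetical and the program's internal bookkeeping may differ.

```python
# Average number of atoms of each element per residue, as quoted above.
AVERAGE_RESIDUE = {"C": 4.869, "N": 1.351, "O": 1.492, "S": 0.051}


def missing_atom_composition(n_residues):
    """Return the expected number of atoms of each element for n_residues."""
    return {element: n_residues * count
            for element, count in AVERAGE_RESIDUE.items()}


composition = missing_atom_composition(40)
for element, count in composition.items():
    print(f"{element}: {count:.1f}")
```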

    The average temperature factor of the missing atoms must also be supplied by the user. This value should be between 1 and 1.5 times the average B-factor of the fragment.
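The recommended range follows directly from the mean B-factor of the fragment. The sketch below assumes the B-factors have already been read from the fragment PDB file into a list, and takes an unweighted mean; the function name is hypothetical.

```python
def suggested_b_range(fragment_b_factors):
    """Return (lower, upper) = (1.0, 1.5) times the mean fragment B-factor."""
    mean_b = sum(fragment_b_factors) / len(fragment_b_factors)
    return mean_b, 1.5 * mean_b


print(suggested_b_range([20.0, 30.0, 40.0]))
# prints (30.0, 45.0)
```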

  3. Bulk solvent model

    The bulk solvent contribution is specified by its solvent content. The value obtained after optimisation of the solvent content in SHARP is a good initial guess. Otherwise, it can be estimated from Bernhard Rupp's applet server.


TNT Maximum Likelihood Refinement

During refinement, BUSTER interacts with TNT (Tronrud, 1997), which provides the stereochemical restraints and the minimiser, and optionally handles Non-Crystallographic Symmetry (NCS). Some parameters are required to drive the TNT modules; they are defined in this last section of the input. Additional information can be found in the TNT manual and TNT guide, and in Chapter 3 of this manual.

Refinement parameters are grouped in several sections: the minimisation, the weights, the structure definition files, the optional NCS and the databases for stereochemical restraints.

Number of cycles

Set the number of cycles of ML structure refinement to <value>. Default is value=51, usually enough for a test of good behaviour but rarely sufficient for reaching convergence.

B factor refinement

Three choices are given to the user. Individual B factors can either be refined, or fixed, or controlled by the additional commands to the TNT minimiser module SHIFT. This latter option is provided to give a finer control of the refinement of B factors to the user: parts of the structure can be refined with individual B factors, whereas other parts maintain their B factors fixed.

Additional commands for the minimiser

The user can input lines to set parameters for the TNT minimiser SHIFT. The user is referred to the relevant pages of the TNT manual. Some useful commands that can be inserted in this window are:

Weights

Sets the weight of the X-ray residual (here, the log-likelihood gain). For the X-ray residual term, the default value is 3.0. Note that this weight is not at all on the same scale as that for least-squares refinement. As a guide, a weight of around a few units (say, 2 to 10) proves to be optimal, but this depends a lot on resolution, on the incompleteness of the partial structure, and on whether external phase information is available or not. The automatic adjustment of this weight has been planned but is not yet available. N.B. A lower value of the X-ray weight corresponds to tighter geometrical restraints.

NCS

NCS can be imposed as a constraint (hard NCS) or as a restraint (soft NCS); when restrained, a weight must be applied, roughly equal to the inverse of the square of the expected rms deviation between NCS-related copies. These two complementary approaches to handling NCS are not mutually exclusive: they can be used simultaneously in the refinement (for example, applying constraints between reasonably identical domains while leaving more flexibility to the hinge region between domains by applying only restrained NCS to that region).
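The rule of thumb for the soft NCS weight is easily turned into a number. The sketch below simply evaluates the inverse square; the function name is hypothetical, the rms deviation is assumed to be expressed in Angstrom, and the result is only a starting value since the rule is approximate.

```python
def ncs_restraint_weight(expected_rms):
    """Return 1 / rms**2 for the expected rms deviation between NCS copies."""
    if expected_rms <= 0:
        raise ValueError("the expected rms deviation must be positive")
    return 1.0 / expected_rms ** 2


print(ncs_restraint_weight(0.5))
# prints 4.0
```

A smaller expected deviation thus gives a larger weight, i.e. a tighter restraint.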

For soft NCS, only the part of the structure to be restrained by NCS must be included, in the form of a TNT file bearing the extension .ncssoft.

Constrained (hard) NCS files (of extension .ncshard) must contain the definition of the region to be constrained, but also the definition of the NCS operators in TNT format, in the TNT reference frame. Note that the TNT program ncs will produce those operators.

Stereochemical restraints libraries

By default, the Engh and Huber database (Engh and Huber, 1991; csdx_protgeo.dat), the TNT database for non-bonded contacts (contact.dat) and the restraints on B-factors (bcorrel.dat) are listed. Restraints for nucleic acids (nuclgeo.dat) or co-factors can be added when needed. Ligand stereochemical restraints can be created using the preliminary script $BDG_home/bin/buster/createTNTDict.sh.

Restraints databases can be fetched from the BDG distribution of TNT in $BDG_tntdata, or from the user's data files $BDG_datafiles.


Maximum Entropy Completion

Maximum Entropy Completion creates detail within the broad envelope defined by the prior. It is useful for completing a partial model, and should only be selected when the residual maps on and around the partial structure are reasonably devoid of features, i.e. when the partial structure model is reasonably good.

Submitting the calculation

Once all the required information has been entered, you can either Submit the calculation or just save the contents of the form (card files); in the latter case, you can submit the calculation later using the Restart option from the main control panel. Upon submission, a hyperlink will be displayed, leading you straight to the newly created subdirectory of your logfiles directory, which will contain the output from this BUSTER job.

The user can select a cleanup script to be run after a successful run of BUSTER. This will remove many output files of lesser importance. The selection of files to be deleted can be done under user's control, from the preferences panel.


References

     
  • Engh, R. A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Cryst., A47, 392-400.
  • Main, P. (1979). A Theoretical Comparison of the Beta, Gamma' and 2Fo-Fc syntheses. Acta Cryst., A35, 779-785.
  • Read, R. (1986). Improved Fourier Coefficients for Maps Using Phases from Partial Structures with Errors. Acta Cryst., A42, 140-149.
  • Roversi, P., Blanc, E., Vonrhein, C., Evans, G. & Bricogne, G. (2000). Modelling prior distributions of atoms for Macromolecular Refinement and Completion. Acta Cryst., D56, 1313-1323.
  • Tronrud, D. E., Ten Eyck, L. F. & Matthews, B. W. (1987). An Efficient General-Purpose Least-Squares Refinement Program for Macromolecular Structures. Acta Cryst., A43, 489-501.
  • Tronrud, D. E. (1992). Conjugate-Direction Minimization - An Improved Method for the Refinement of Macromolecules. Acta Cryst., A48, 912-916.
  • Tronrud, D. E. (1996). Knowledge-Based B-Factor Restraints for the Refinement of Proteins. J. Appl. Cryst., 29, 100-104.
  • Tronrud, D. E. (1997). The TNT Refinement Package. In Macromolecular Crystallography, Part B, Eds. Charlie Carter and Robert Sweet, Methods in Enzymology, Vol. 277, pp. 306-319.
  • Tronrud, D. E. (1999). The Efficient Calculation of the Normal Matrix in Least-Squares Refinement of Macromolecular Structures. Acta Cryst., A55, 700-703.

Last modified: Thu Sep 27 16:30:32 BST 2001