BUSTER User Manual
Chapter 5

Browsing the BUSTER output

Copyright © 1995-2004 by   Eric Blanc, Pietro Roversi, Clemens Vonrhein,
Gérard Bricogne and the Buster Development Group.
All rights reserved.



General information about the output of BUSTER

This document describes the output of the BUSTER program and how to interpret it. By convention, wherever some explanation may be necessary, a hyperlink called 'explanation' is provided in front of the section title.

The BUSTER output is divided into several files:

  • LIST.html. This file is output for all types of calculations performed by BUSTER. The information about the crystallographic data, links to statistical information and plots, estimation of scaling parameters and details of the refinement are presented in this file.

    All necessary information is visible from this file and there should be no need for browsing the various subdirectories by hand.

  • Fourier amplitudes and phases in the shell.01/$BDG_job.final.mtz file. This file is only output at the end of the calculation.

    Note: the same Fourier coefficients are written to the file shell.01/mlphas.mtz at each refinement cycle (this file is overwritten during each refinement cycle).

    If Maximum Entropy completion is run after a refinement, the amplitudes and phases at the end of the refinement and before the MaxEnt completion are saved in the MTZ file shell.01/mlphas_beforeME.mtz.

  • HTML files in the shell.01 directory.

  • ML Structure Refinement TNT output files. When scaling parameters, positional and displacement parameters, and the imperfection B factor are refined by maximising the log-likelihood function, TNT outputs a number of files concerning the current model.


Browsing the LIST.html file

From the BUSTER Control Panel, go to the logfiles directory, and from there to the directory <ProjectID>.<run number>.

The main item of output is the hypertext logfile called LIST.html. All necessary information about the operations performed by BUSTER, the results obtained and how to view these results is presented in this file. We will now go through the various sections of the file.
 

Parsing BUSTER and TNT cardfiles

 
    During preparation of the job using the interface (see Chapter 4 for details), the supplied information is stored mainly in ASCII files, which are now parsed. A link to examine these files directly is provided. If you used the graphical user interface to prepare them, no problems should appear here, since all the checking has already been done by the interface.
 

Cell information

 
    Real and reciprocal space cell parameters, and orthogonalisation matrix in the Brookhaven convention are shown.
 

Symmetry information

 
    The presented information consists of
  • a list of the symmetry operations of the space group, each one specified by the rotational part (as a 3x3 matrix) and the translational part (a vector in 3D space);
  • the multiplication table of the group, and its inverse;
  • a list of the isotropy subgroups of the group, each one described in terms of its elements, and of the corresponding coset representatives (see section 1.3.3.1.2.2.2. in Bricogne (1993));
  • a logical flag (True/False) for a check of all the pairs of inverse symmetry matrices in the group.
 

Missing structure chemical composition

 
    A list of the scatterers present in the missing part of the structure; the following quantities are listed for each of the random scatterers:
  • The number of the atoms in the asymmetric unit, as declared via the Atomic Composition field in the input form;
  • The average value of B for these scatterers and the standard deviation for the same quantity, again as declared in the Atomic Composition field;
  • The coefficients for the 5-Gaussian expansion of the scattering factor, as read from the relevant table; for proteins, this is the CCP4 library.
 

Reading observations file

 
    A section follows with a report of the number of reflexions accepted from the input MTZ file after the resolution limits given by the user have been applied. Some links for more details are given:

 

Shannon sampling

 
   
 

TNT Refinement Output

This is a link to the main TNT log file shell.01/tntlongll.html.
 
   
 

TNT Geometry Output

This is a link to the detailed TNT log file geometry.html for various statistics on geometric restraints.
 
   
 

Summary of model volumes

 
    The summary reported here is a short list of the approximate volumes taken up by the whole macromolecule and its subsets, the partial structure and the missing atoms:
  • the volume for the whole macromolecule is estimated on the basis of the electron density declared in input and of the solvent fraction;
  • the volume of the partial structure is computed from the volume of the mask around it;
  • the volume for the missing atoms is the difference between the first two volumes.
 

Generation of smoothly varying distributions

 
    The output from the generation of binary masks and the blurring of these masks into smoothly varying distributions follows. The end products of these operations are a blurred mask called babslv for the whole molecule, which is also the Babinet opposite of the bulk solvent mask, and a blurred mask called prior, defining the prior prejudice for the non-uniform distribution of random atoms.

Note: the binary objects are removed once they are converted into smoothly varying distributions.

The information about the mean electron density within the fragment mask is given. This mask is an intermediate in the calculation of the prior and is not itself used any further, but the mean density within it is calculated as a check of the adequacy of the value of the radius FRGRAD specified as input. This density should be about 0.425 electrons per cubic Ångström for protein.

If the value displayed is very different from the expected value of 0.425, the contents of the fragment should be checked directly from the pdb file *.pdbfrg. Discrepancies may also arise through the value of the radius around the fragment: if the mean density value displayed is smaller than expected, FRGRAD should be decreased; if it is greater, FRGRAD should be increased.
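The check described above can be sketched as follows. This is a minimal illustration and not BUSTER's own code: the function name, the 10% tolerance and the example numbers are assumptions made for this sketch. Only the expected density of 0.425 electrons per cubic Ångström and the direction of the FRGRAD adjustment are taken from the text.

```python
EXPECTED_PROTEIN_DENSITY = 0.425  # electrons per cubic Angstrom, value quoted in the text


def frgrad_advice(n_electrons, mask_volume, tolerance=0.10):
    """Compare the mean density in the fragment mask with the expected value.

    n_electrons : electrons contributed by the atoms in the fragment PDB file
    mask_volume : volume of the fragment mask, in cubic Angstroms
    tolerance   : hypothetical relative tolerance (not a BUSTER parameter)
    """
    density = n_electrons / mask_volume
    relative_error = (density - EXPECTED_PROTEIN_DENSITY) / EXPECTED_PROTEIN_DENSITY
    if relative_error < -tolerance:
        # Density too low: the mask is too large for its contents.
        advice = "decrease FRGRAD"
    elif relative_error > tolerance:
        # Density too high: the mask is too tight around the atoms.
        advice = "increase FRGRAD"
    else:
        advice = "FRGRAD looks adequate"
    return density, advice


# Example with made-up numbers: 3000 electrons in a 10000 cubic Angstrom mask.
print(frgrad_advice(3000, 10000))
```

If the advice keeps pointing the same way after FRGRAD has been changed, the contents of the fragment file itself are the more likely cause.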

 

Generation of babslv by theta-filter

 
    The file babslv.html contains information about:
  1. the generation of the binary mask around the whole molecule; the mask is used to compute the scattering from the bulk solvent; if so requested via the Babinet opposite of Bulk Solvent menu, a binary mask is first traced around the PDB model for the whole molecule. The radius used to trace this mask can be altered in input by means of the BLKRAD keyword.

    Two quantities are output:

    • Volume fraction occupied by mask: the number of pixels in the binary mask divided by the total number of pixels in the cell;
    • Value of uniform prior probability within mask: set equal to the reciprocal of the volume fraction;

    The binary mask is deleted after blurring (see next point);

  2. the blurring of the binary mask mentioned above to give an envelope around the whole molecule; this is done by:

    1. convolution with a Gaussian and a sphere; some parameters relevant to this convolution are output:

      • Blurring (temperature) factor: the parameter entering the exponent of the convoluting Gaussian;
      • Blurring sphere radius in Angstroms: the radius for the convoluting sphere;

    2. linear combination of the original mask and the blurred mask resulting from the convolution above; the coefficients are echoed:

      • Coefficient of original Fourier transform: set to 0: the original binary mask won't contribute any component to the blurred mask;
      • Coefficient of blurred Fourier transform: set to 1: the blurred mask is equal to the result of the convolution.
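The two mask quantities and the final linear combination listed above can be illustrated with a short sketch. The function names and the flat-list representation of the mask and of its Fourier coefficients are assumptions made for illustration. The convolution with the Gaussian and the sphere is not reproduced here.

```python
def mask_statistics(mask_pixels):
    """Return (volume fraction, uniform prior value) for a binary mask.

    mask_pixels : iterable of 0/1 values, one per pixel in the cell.
    An empty mask (no pixel set) would give a division by zero.
    """
    pixels = list(mask_pixels)
    inside = sum(1 for p in pixels if p)
    volume_fraction = inside / len(pixels)
    uniform_prior = 1.0 / volume_fraction  # reciprocal of the volume fraction
    return volume_fraction, uniform_prior


def combine_transforms(original_ft, blurred_ft, c_original=0.0, c_blurred=1.0):
    """Linear combination of the original and blurred Fourier coefficients.

    The default coefficients 0 and 1 are the values echoed in babslv.html.
    """
    return [c_original * o + c_blurred * b for o, b in zip(original_ft, blurred_ft)]


# A toy mask covering 2 of 8 pixels.
print(mask_statistics([1, 1, 0, 0, 0, 0, 0, 0]))
```

With the default coefficients the combination simply returns the blurred transform, which is the behaviour described in the list.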
 

Electron density in frg mask

 
    The file contains the value of the electron density in the fragment mask, as computed from the atoms declared in the fragment PDB file, and from the fragment mask volume.

If the resulting value for the fragment electron density is larger (smaller) than what was declared in the protein electron density field at input preparation time, you might want to increase (decrease) the value of the radius of the fragment mask with the keyword FRGRAD.

The same file also contains a summary of the volumes taken up by the various components of the model.


At this point, the content of the output file will differ, depending on whether a ML refinement or a structure completion job has been requested.

 

Maximum Likelihood Refinement

 
   
 

MaxLik Scaling Cycle

 
    At each MaxLik scaling cycle, the Log-Likelihood, the overall scale and B factors and the imperfection B factors [2,3] are reported. The parameter values for each scaling cycle are reported in the file shell.01/MLoutputs/mlnorm.<cycle#>.html. If you asked for detailed output during the scaling using the MLNORM keyword, additional information about the gradient and Hessian of the Log-Likelihood gain at each scaling cycle is given in the same file.

The various scale parameters are defined as follows:


K and Baniso = scale K and the anisotropic scaling tensor Baniso are the ones needed to put the observed data on absolute scale by:
 
Fabs(h)  =  Fo(h) × K × Tiso(h) × Taniso(h)
 
where:
 
Tiso(h)  =  exp(-¼ × B × dstar(h)**2)
Taniso(h)  =  exp[-¼ × (  h × h × astar × astar × B11aniso
                        + k × k × bstar × bstar × B22aniso
                        + l × l × cstar × cstar × B33aniso
                        + 2 × h × k × astar × bstar × cosgstar × B12aniso
                        + 2 × h × l × astar × cstar × cosbstar × B13aniso
                        + 2 × k × l × bstar × cstar × cosastar × B23aniso )]
 

This particular functional form for Taniso(h) is the one adopted in TNT. The values of the elements of the orthogonal B tensor are reported in the B factors plot.

Only the symmetry-allowed elements of Baniso are refined, under the constraints imposed by the point-group symmetry. An additional constraint is imposed on B33aniso so that Baniso is traceless:

    B11aniso+B22aniso+B33aniso=0

If K is significantly different from unity, you might want to multiply your experimental amplitudes by its value to bring the data onto absolute scale.
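The scaling of an observed amplitude onto absolute scale, as defined by the formulas above, can be written out as a small sketch. The function names and the tuple layout of the arguments are assumptions made for illustration. The arithmetic follows the expressions for Tiso(h) and Taniso(h) term by term.

```python
import math


def t_iso(b_iso, dstar):
    """Tiso(h) = exp(-1/4 * B * dstar(h)**2)."""
    return math.exp(-0.25 * b_iso * dstar ** 2)


def t_aniso(hkl, recip_cell, b_aniso):
    """Taniso(h) in the TNT functional form quoted in the text.

    recip_cell : (astar, bstar, cstar, cosastar, cosbstar, cosgstar)
    b_aniso    : (B11, B22, B33, B12, B13, B23)
    """
    h, k, l = hkl
    astar, bstar, cstar, cos_as, cos_bs, cos_gs = recip_cell
    b11, b22, b33, b12, b13, b23 = b_aniso
    q = (h * h * astar * astar * b11
         + k * k * bstar * bstar * b22
         + l * l * cstar * cstar * b33
         + 2 * h * k * astar * bstar * cos_gs * b12
         + 2 * h * l * astar * cstar * cos_bs * b13
         + 2 * k * l * bstar * cstar * cos_as * b23)
    return math.exp(-0.25 * q)


def f_abs(f_obs, k_scale, b_iso, dstar, hkl, recip_cell, b_aniso):
    """Fabs(h) = Fo(h) * K * Tiso(h) * Taniso(h)."""
    return f_obs * k_scale * t_iso(b_iso, dstar) * t_aniso(hkl, recip_cell, b_aniso)


def is_traceless(b_aniso, tol=1e-6):
    """Check the constraint B11aniso + B22aniso + B33aniso = 0."""
    return abs(b_aniso[0] + b_aniso[1] + b_aniso[2]) < tol


# Made-up orthorhombic reciprocal cell and traceless tensor.
cell = (0.02, 0.025, 0.01, 0.0, 0.0, 0.0)
print(f_abs(100.0, 1.0, 0.0, 0.02, (1, 0, 0), cell, (2.0, -1.0, -1.0, 0.0, 0.0, 0.0)))
```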


Kmiss and Bmiss = scale Kmiss and temperature factor Bmiss are the ones needed to scale the missing atoms' contribution to the structure factor:
 
Fmissscaled(h)  =  Fmiss(h) × 1/Kmiss × exp[-¼ Bmiss × (dstar(h))**2]
 
They alter the number and B factors of the missing atoms as declared in input. Values of Kmiss smaller (larger) than one mean that the number of missing atoms needs to be increased (decreased).

Ksolvent and Bsolvent = scale Ksolvent and temperature factor Bsolvent are the ones needed to scale the bulk solvent contribution to the structure factor:
 
Fsolventscaled(h)  =  Fsolvent(h) × 1/Ksolvent × exp[-¼ Bsolvent × (dstar(h))**2]
 
They alter the value of the electron density and B factor of the solvent, as declared in input. Values of Ksolvent smaller (larger) than one mean that the electron density of the solvent needs to be increased (decreased).

Bfragimpf = the imperfection B factor Bfragimpf attenuates the expectation value for the structure factor from the fragment according to:
 
<Ffragimpf(h)>  =  Dfrag(h) × Ffrag(h)
 
and increases/decreases the attached variance Sigma2frag(h):
 
<Sigma2frag(h)>  =  [1-Dfrag(h)**2] ×  <|Ffrag(h)|**2>
 
where:
 
Dfrag(h)  =  exp[-¼ Bfragimpf × (dstar(h))**2]

Kmissimpf and Bmissimpf = these imperfection parameters increase/decrease the missing atoms variance Sigma2miss(h):
 
<Sigma2miss(h)>  =  (1/Kmiss)**2 ×  exp[-2 × ¼ Bmiss × (dstar(h))**2] ×  (1/Kmissimpf)**2 × [1-Dmiss(h)**2] × <|Fmiss(h)|**2>
 
where:
 
Dmiss(h)  =  exp[-¼ Bmissimpf × (dstar(h))**2]

Ksolvimpf and Bsolvimpf = these imperfection parameters increase/decrease the solvent variance Sigma2solv(h):
 
<Sigma2solv(h)>  =  (1/Ksolv)**2 ×  exp[-2 × ¼ Bsolv × (dstar(h))**2] ×  (1/Ksolvimpf)**2 × [1-Dsolv(h)**2] × <|Fsolv(h)|**2>
 
where:
 
Dsolv(h)  =  exp[-¼ Bsolvimpf × (dstar(h))**2]
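The effect of the imperfection parameters can be made concrete with a short sketch of the formulas above for the fragment and for the missing atoms. The function names are assumptions made for illustration, and the inputs (mean-square amplitudes, dstar) are taken as given. The solvent variance has the same form as the missing-atoms one.

```python
import math


def d_factor(b_impf, dstar):
    """D(h) = exp(-1/4 * Bimpf * dstar(h)**2)."""
    return math.exp(-0.25 * b_impf * dstar ** 2)


def fragment_expectation_and_variance(f_frag, b_frag_impf, dstar, mean_sq_f_frag):
    """Return (<Ffragimpf(h)>, <Sigma2frag(h)>) for one reflexion.

    f_frag may be a complex structure factor; mean_sq_f_frag is <|Ffrag(h)|**2>.
    """
    d = d_factor(b_frag_impf, dstar)
    return d * f_frag, (1.0 - d ** 2) * mean_sq_f_frag


def missing_variance(k_miss, b_miss, k_miss_impf, b_miss_impf, dstar, mean_sq_f_miss):
    """<Sigma2miss(h)> as given in the text."""
    d = d_factor(b_miss_impf, dstar)
    scale = (1.0 / k_miss) ** 2 * math.exp(-2 * 0.25 * b_miss * dstar ** 2)
    return scale * (1.0 / k_miss_impf) ** 2 * (1.0 - d ** 2) * mean_sq_f_miss


# A zero imperfection B leaves the fragment untouched and gives zero variance.
print(fragment_expectation_and_variance(50.0, 0.0, 0.3, 2500.0))
```

A larger imperfection B lowers D(h), so the expected fragment contribution shrinks and its variance grows, most strongly at high resolution.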

 

Statistics

 
    Various kinds of statistics are produced throughout the calculation and reported at each individual cycle. There are two kinds of plots:
  • Statistics in resolution bins

    These are plots of binned average statistics in resolution bins at each cycle. The number of reflexions in each resolution bin is tabulated in the two files:

    • Data.bins.html contains the fine binning used for plots of all reflexions, irrespective of the FreeR_flag.
    • Data.wf_bins.html contains the coarser binning used for plotting free-set and working-set averages separately.

    The various plots are:

    • R factors plot

      These are averaged in resolution bins, for the working-set and free-set reflexions separately (file shell.01/CCplots/R_Rfree.<cycle#>.mtv):

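      $$R = \frac{\sum_h \left|\, K \, T_{\mathrm{iso}}(h) \, T_{\mathrm{aniso}}(h) \, |F_o(h)| - |F_{\mathrm{xpct}}(h)| \,\right|}{\sum_h K \, T_{\mathrm{iso}}(h) \, T_{\mathrm{aniso}}(h) \, |F_o(h)|}$$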

    • R factors table

    • <Log-Lik Gain>

      These are averaged in resolution bins, for the working-set and free-set reflexions separately (files shell.01/CCplots/LLG.<cycle#>.mtv):

      $$\mathrm{LLG} = \sum_h \log \frac{\mathrm{Likelihood}[\text{Current Model}](h)}{\mathrm{Likelihood}[\text{Starting Model}](h)}$$

    • Ln Sigma A

      These are plots of Log [ΣP/ΣN] averaged in resolution bins, for the working-set and free-set reflexions separately (files shell.01/CCplots/Ln_sigmaa.<cycle#>.mtv):

      $$\log\left[\Sigma_N / \Sigma_P\right] = \log \frac{\langle \varepsilon \, F_o^2 \rangle}{\langle \varepsilon \, F_{\mathrm{xpct}}^2 \rangle}$$

      (see the definition of Fxpct below). Notice that in the notation of Main and Read the quantity plotted here would be called Log [SigmaA/D].

    • Correlation Coefficients

      Four quantities are plotted vs. the resolution d* (files shell.01/CCplots/cos_coef.<cycle#>.mtv):

      • Fo,Ffrag: the correlation coefficient between the observed structure factor amplitude (|Fo(h)|) and the amplitude of the structure factor as calculated from the partial structure in the fragment file (|Ffrag(h)|).

        This curve gives a measure of the degree of completeness of the partial structure model as a function of resolution: correlation coefficients close to unity indicate that the model for the partial structure accounts for most of the observed amplitudes. This curve has troughs where the missing structure and/or the bulk solvent have strong Fourier components.

      • Fo,Fc: the correlation coefficient between observed structure factor amplitude (|Fo(h)|) and the amplitude of the total structure factor as calculated from the current model (|Fc(h)|).

        Correlation coefficients close to unity here indicate that the fragment, prior and bulk solvent models can account for most of the observed amplitudes at that resolution. They can be used to monitor the improvement during the course of the refinement, or at successive stages of model building: you should see them approach unity as the model improves.

      • Fc,Fxpct: the correlation coefficient between the amplitude of the total structure factor as calculated from the current model (|Fc(h)|) and the expectation value for the same amplitude as calculated from the Rice distribution (|Fxpct(h)|). These correlation coefficients depart from one because of the error model associated with the partial structure and the missing atoms.

        After cycle 1 of a refinement the Fc,Fxpct curve should be on the same scale as the Fo,Fc curve: this means that the variance estimates are adequate. If the Fc,Fxpct values are lower or higher, you might want to change the parameters that affect the variances. This is done by selecting the generalised MaxLik scaling via the Refine scaling parameters buttons in the input form.

      • Fo,Fo+delta: the curve informs about the correlation between the observed structure factor amplitudes (|Fo(h)|) and the amplitudes plus the experimental noise (|Fo(h)|+delta(h)). The correlation usually plunges with resolution due to the increased estimated standard deviations on the measured amplitudes. The curve can be useful to confirm the maximum resolution up to which the signal is above the noise, or spot noisier resolution ranges, such as those where ice-ring scattering might peak.
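The resolution-binned statistics described above can be sketched as follows. This is an illustration under stated assumptions: equal-width bins in d* and a Pearson correlation coefficient are choices made for this sketch, since the text does not specify how BUSTER bins the data or computes the correlation, and all function names are hypothetical.

```python
import math


def r_factor(f_obs_abs, f_xpct):
    """R = sum | |Fo| - |Fxpct| | / sum |Fo|, with |Fo| already on absolute scale."""
    numerator = sum(abs(fo - fx) for fo, fx in zip(f_obs_abs, f_xpct))
    return numerator / sum(f_obs_abs)


def correlation(x, y):
    """Pearson correlation coefficient between two lists of amplitudes."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def by_resolution_bin(dstar, f_obs_abs, f_model, n_bins, statistic):
    """Apply `statistic` separately in n_bins equal-width bins of d*.

    Empty bins yield None.
    """
    lo, hi = min(dstar), max(dstar)
    width = (hi - lo) / n_bins
    bins = [([], []) for _ in range(n_bins)]
    for d, fo, fm in zip(dstar, f_obs_abs, f_model):
        i = min(int((d - lo) / width), n_bins - 1)
        bins[i][0].append(fo)
        bins[i][1].append(fm)
    return [statistic(fo, fm) if fo else None for fo, fm in bins]


# Four made-up reflexions split into two resolution bins.
print(by_resolution_bin([0.1, 0.2, 0.3, 0.4], [10, 20, 10, 20], [9, 22, 10, 20], 2, r_factor))
```

Running the same function once on the working set and once on the free set gives the two curves that are plotted separately.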
  • Statistics vs. cycle number

    The evolution of several values vs. cycle number is shown:

    • R factors

      Shows the evolution of the working-set and free-set R values with cycle number. The free R value is computed against the free set in the file tnt_<FreeR_flag>.hkl in the root directory. The working R value is computed against the working set in the file tnt.no<FreeR_flag>.hkl. For the definition of the R factors see above.

      The R values at cycle 0 are the ones pertaining to the starting model, after Maximum Likelihood scaling. The R values at cycle 1 are the ones after the first refinement/maximum entropy cycle, and so on.

    • <Log-Lik Gain>

      Similar to the R factors plot, but for the log-likelihood gain (LLG) per reflexion rather than the R-value. The LLG statistic involves the prediction variances as well as the expectations of model structure factors and is therefore a good measure of the improvement of the current model over the starting one. For the definition of the LLG see above. However, the quantities plotted here are normalised by the number of reflexions, so that free- and working-LLGs are on the same scale.

      The LLG values at cycle n are the ones after the nth refinement/maximum entropy cycle.

    • GEOM Residuals

      Shows the TNT normalised sum of the geometric residuals GEOM, plotted against the refinement cycle number. A horizontal line shows its ideal value of unity (i.e. the value this normalised sum would have if all the geometric restraints were obeyed within their expected uncertainty):

      $$\mathrm{GEOM} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{G_{\mathrm{model}} - G_{\mathrm{ideal}}}{\sigma(G_{\mathrm{ideal}})} \right]_i$$

      where N is the number of restraints.

      If GEOM shows an asymptote below that line, the X-ray weight specified by the user should be increased (the geometry is too tight and can be loosened). If GEOM is above the line of unity, the X-ray weight value should be decreased (the geometry is too loose and needs tightening). If the GEOM value is above unity but still decreasing, more cycles of refinement are needed.

    • Ln SigmaA

      The value of Ln(SigmaA), obtained from the intercept of a SigmaA plot at d* = 0, is output vs. refinement cycle number (see Main and Read). For a complete structure, and if the error model is adequate, this quantity should be close to zero.

    • Scale Factors

      Scale factors are output during refinement. These plots display the values contained in the Wilson and Maximum Likelihood scaling output files.

    • B Factors

      Temperature factors are output during refinement. These plots display the values contained in the Wilson and Maximum Likelihood scaling output files. The Wilson B should converge to values close to zero after a few cycles of refinement, when the individual B factors have reached the right overall value.

      The elements of the anisotropic scaling tensor B displayed here are the orthogonal ones:

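      $$F_c(\mathbf{s}) = F_o(\mathbf{s}) \times \exp\left(-\tfrac{1}{4} \, \mathbf{s}^{T} B \, \mathbf{s}\right)$$

      where $\mathbf{s} = (M^{-1})^{T} \mathbf{h}$ is the reciprocal lattice vector expressed in an orthonormal frame and $(M^{-1})^{T}$ is the orthogonalisation matrix for reciprocal space (see R. Diamond (1993)).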

      Only the elements of B that are allowed by symmetry are displayed.

    • Imperfection Factors

      Imperfection factors are output during refinement. The Maximum-Likelihood refined Bfragimpf should decrease as the refinement progresses. A Luzzati-B estimate for the overall imperfection is also plotted, and labelled "LuzzB". The value of this imperfection B is obtained from the slope of a SigmaA plot (see Main, 1979 and Read, 1986).
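The guidance given above for reading the GEOM plot can be turned into a small decision helper. This is only a sketch of the rules of thumb stated in the text: the function name and the plateau tolerance used to decide whether GEOM is "still decreasing" are assumptions, not BUSTER or TNT parameters.

```python
def geom_advice(geom_by_cycle, plateau_tol=0.01):
    """Suggest an action from the GEOM values recorded at successive cycles.

    geom_by_cycle : list of GEOM values, one per refinement cycle
    plateau_tol   : hypothetical threshold below which GEOM counts as flat
    """
    last = geom_by_cycle[-1]
    previous = geom_by_cycle[-2] if len(geom_by_cycle) > 1 else last
    still_decreasing = (previous - last) > plateau_tol
    if last > 1.0 and still_decreasing:
        return "run more refinement cycles"
    if last > 1.0:
        # Geometry too loose: tighten it by lowering the X-ray weight.
        return "decrease the X-ray weight"
    if last < 1.0 and abs(previous - last) <= plateau_tol:
        # Plateau below unity: geometry too tight, X-ray term can weigh more.
        return "increase the X-ray weight"
    return "no clear indication yet"


print(geom_advice([2.0, 1.6, 1.3]))
```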


 

Maximum Entropy Completion

 
    Once the number of cycles requested through the MXLCYC field has been performed, the program will stop unless MaxEnt completion has also been requested.
 

Unimodal phase information

 
    Here is a list of the reflexions which are accepted into the "trunk", i.e. are assigned Lagrange multipliers in the MaxEnt modulation of the prior prejudice. The acceptance criteria are the resolution limits for basis-set selection specified for the current shell, and the figure-of-merit threshold specified through the FOMTHR field/keyword.

Only reflexions whose phase-probability distribution is unimodal enter the trunk. In the presence of experimental phases some reflexions have phase probability distributions with two maxima, but can sometimes still be considered unimodal if one of the maxima is rejected, according to criteria explained in the file HenLat_<shell#>.html.

 

Constrained Bayesian Score

 
    This contains details of the calculation of the map for the missing structure, effected by maximising the Bayesian score:
  • Entropy Loss and Bayesian Score: a link to the file unimod<shell#>.html. This file contains a summary of the main quantities involved in the calculation;

  • Bayesian Score maximisation ...: a link to the file qadsol<shell#>.html. This file contains the computational details of the Bayesian score maximisation;

  • MaxLik normalisation ... : a link to the file ml_norm_unmd<shell#>.html, containing the output from the MaxLik scaling effected while the prior is being updated.

Final Results

 
    Additional plots are presented once all calculations have been finished:
  • Average Figure of Merit

    This plot shows the average Figure of Merit vs. resolution. It is only output at the beginning and at the end of the calculation. Remember that high figures of merit mean a phase distribution that is very sharp, i.e. precise, but not necessarily accurate. In particular, the F.o.M. depends crucially on the variance estimates, and can be overestimated if the variances are too small.

  • Structure Factor Amplitudes

    The file contains the plot of average structure factor amplitudes on absolute scale vs. resolution. It can be useful in diagnosing scaling problems, e.g. at low resolution where bulk solvent models are often inadequate. It is only output at the beginning and at the end of the job. For the definition of the quantities plotted, see below.

    Five quantities are plotted vs. the resolution d*:

    1. Fobs: the observed structure factor amplitudes brought to absolute scale by the current values of the scale factor and overall B factor;

    2. Ffrg: the structure factor amplitudes for the partial structure in the current fragment file;

    3. Fcalc: the amplitude of the total structure factor as calculated from the current model; it has contributions from the partial structure, as well as from the missing random atoms and from the bulk solvent;

    4. Fxpct: the expectation value for the total structure factor amplitude as calculated from the Rice distribution it follows; it depends on the variance as well as on the offset:

      $$F^{\mathrm{xpct}}_{h} = \int_{0}^{\infty} |F|_h \times \mathrm{Rice}(|F|_h) \, d|F|_h$$

    5. sigmaFobs: the estimated standard deviations for the observed structure factor amplitudes; they can be useful to check the take off of the noise with resolution.
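The dependence of Fxpct on both the offset and the variance can be illustrated numerically. The sketch below assumes the standard acentric form of the Rice distribution, p(F) = (2F/V) exp(-(F² + Fc²)/V) I0(2 F Fc/V), where V stands for the variance. That parameterisation, the function names and the integration settings are assumptions made for this sketch, and the exact distribution used inside BUSTER (including the centric case) may differ.

```python
import math


def scaled_bessel_i0(x, n_points=256):
    """exp(-x) * I0(x) for x >= 0, using I0(x) = (1/pi) * integral_0^pi exp(x cos t) dt.

    The midpoint rule is applied to the scaled integrand to avoid overflow.
    """
    step = math.pi / n_points
    total = sum(math.exp(x * (math.cos((i + 0.5) * step) - 1.0)) for i in range(n_points))
    return total / n_points


def rice_density(f, f_calc, variance):
    """Acentric Rice density, written in an overflow-safe form."""
    x = 2.0 * f * f_calc / variance
    return (2.0 * f / variance) * math.exp(-(f - f_calc) ** 2 / variance) * scaled_bessel_i0(x)


def expected_amplitude(f_calc, variance, n_steps=1000):
    """Trapezoidal estimate of the integral of |F| * Rice(|F|) from 0 to a large cutoff."""
    upper = f_calc + 10.0 * math.sqrt(variance)
    h = upper / n_steps
    total = 0.5 * upper * rice_density(upper, f_calc, variance)  # integrand is 0 at F = 0
    for i in range(1, n_steps):
        f = i * h
        total += f * rice_density(f, f_calc, variance)
    return total * h


# With a large offset the expectation stays close to the offset itself.
print(expected_amplitude(10.0, 1.0))
```

With no offset the expectation is set by the variance alone, while for an offset much larger than the spread it approaches the calculated amplitude.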

The $BDG_job.final.mtz file

 
    The final results in terms of phases and amplitudes for the electron density are in the file $BDG_job.final.mtz in the shell.01 directory. At any stage, the phases are the ones computed from the current BUSTER model. All structure factors are on absolute scale.

The columns in the file are:

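  FCTR, PHICTR : centroid electron density map, $m F_{\mathrm{obs}} \exp(i \varphi_{\mathrm{centroid}})$
  2FOFCWT, PH2FOFCWT : SigmaA-weighted 2Fo-Fc map, $2 m F_{\mathrm{obs}} \exp(i \varphi_{\mathrm{centroid}}) - F_{\mathrm{calc}} \exp(i \varphi_{\mathrm{calc}})$
  FOFCWT, PHFOFCWT : Fo-Fc difference coefficients, $m F_{\mathrm{obs}} \exp(i \varphi_{\mathrm{centroid}}) - F_{\mathrm{calc}} \exp(i \varphi_{\mathrm{calc}})$
  FOFRGSLV, PHFOFRGSLV : a second type of Fo-Fc difference coefficients, in which the Fc part has no contribution from the missing atoms model, $m F_{\mathrm{obs}} \exp(i \varphi_{\mathrm{centroid}}) - \left(F_{\mathrm{frag}} \exp(i \varphi_{\mathrm{frag}}) + F_{\mathrm{solv}} \exp(i \varphi_{\mathrm{solv}})\right)$
  FOM : BUSTER figure of merit
  HLA, HLB, HLC, HLD : Hendrickson-Lattman coefficients encoding the BUSTER model phases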

In addition to these the file contains the two columns with the experimental amplitudes and sigmas used for the refinement - their labels unchanged.

Notice that the same Fourier coefficients are written to the file shell.01/mlphas.mtz at each refinement cycle (the latter file is overwritten at each refinement cycle).

If Maximum Entropy completion is run after a refinement, the amplitudes and phases at the end of the refinement and before the MaxEnt completion are saved in the MTZ file shell.01/mlphas_beforeME.mtz.

Other files containing information which you may wish to consult in order to check the input and the data reside in the "root" directory of the BUSTER tree, the one named <ProjectID>.<run number>. These files are HTML documents, usually accessible from the main output file LIST.html via hyperlinks.


Files in the shell directory

 
    Other files inside the shell.01 directory contain useful information about the calculation.
  • hist_frg_omega.html and hist_whole_omega.html

    These files contain the cumulative distribution of the omega function for the partial structure and the whole macromolecule, respectively.

    The omega function is either the local fluctuation of the partial structure density or its local average, depending on the parameters chosen in input for the envelope tracing.

    Together with the partial structure volume fraction and the protein volume fraction, these cumulative distributions are used to construct a Fermi-Dirac envelope for the missing atoms: first the Fermi-Dirac envelopes are computed for the partial structure (mfrag(x)) and whole structure (mwhole(x)). Both of these envelopes m(x) are normalised so that 0< m(x) < 1.

    Then, the envelope for the missing atoms is obtained by:

    mmiss(x)=mwhole(x) × [1-mfrag(x)]
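The combination of the two envelopes is a pointwise operation on the map grid, as the following sketch shows. The flat-list representation of the grids and the function name are assumptions made for illustration. The construction of the Fermi-Dirac envelopes themselves is not reproduced.

```python
def missing_atoms_envelope(m_whole, m_frag):
    """Pointwise mmiss(x) = mwhole(x) * [1 - mfrag(x)] on flattened grids.

    Both inputs are expected to hold values between 0 and 1.
    """
    if len(m_whole) != len(m_frag):
        raise ValueError("envelopes must be sampled on the same grid")
    return [w * (1.0 - f) for w, f in zip(m_whole, m_frag)]


# Three grid points: inside the fragment, inside the molecule only, in the solvent.
print(missing_atoms_envelope([1.0, 0.8, 0.1], [0.9, 0.0, 0.0]))
```

The result is small wherever the fragment envelope is close to one, so the missing atoms are kept out of the region already occupied by the partial structure.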

  • hist_prior.html

    The file contains the cumulative distribution of the missing atoms envelope mmiss ; together with the specified solvent fraction (see the solvent description fields in the input form) the cumulative distribution is used to assess the contouring level at which the prior distribution itself can be displayed as a mask with the desired solvent fraction.

  • prior.html

    This file contains information about the generation of the prior envelope. First, a binary mask is traced around the PDB model for the whole molecule, as specified in the PDBNUP file. A 4 Å radius is used as default; the value of this radius can be altered in input by means of the MSKRAD keyword.

    Then, a different procedure is followed, depending on the presence or absence of a known "fragment" substructure:

    • if no fragment is present, the binary mask around the whole molecule is blurred by theta filtering, as described above for the (complement of the) bulk solvent distribution. The resulting blurred distribution is used as a prior;
    • if a fragment is present, a binary mask is built around the PDB model for the fragment; a logical "whole but not fragment" binary mask operation is performed; and the resulting prior binary mask is blurred, again as described above for the whole molecule.

  • LogLik0.html

    The file contains the average values of the Likelihood of the starting model (null hypothesis). The total likelihood has been divided by the number of reflexions, so the value should be independent of the number of reflexions.

    Typical values for reasonably good models/data range between -10 and 0. Values smaller than -10 might indicate severely incomplete or imperfect models, or errors in scaling and/or in the internal error model.

  • HenLat_<shell#>.html

    The file contains the output from the analysis of the modes of the Hendrickson-Lattman structure-factor phase probability distribution. The numbers of zero-, uni- and bi-modal reflexions are listed. The phase probability distribution for a subset of the reflexions whose maxima were rejected is output as a plot of P(φ) between 0 and 360 degrees.

  • trnk.<shell#>.html

    The file contains the list of reflexions which are accepted into the "trunk", i.e. are assigned Lagrange multipliers in the MaxEnt modulation of the prior prejudice. The figure of merit for these reflexions is higher than the threshold set with the FOMTHR field/keyword. The resolution limits are those set via the Maximum Entropy resolution limit field at input time.

    In the course of the maximum Bayesian score modulation of the prior prejudice, one or two Lagrange multipliers will be varied respectively for each centric or acentric reflexion in the list.

  • unimod.<shell#>.html

    The file contains a summary of the Bayesian score maximisation that leads to the Maximum Entropy map for the missing structure: Lagrange multipliers for reflexions in the trunk are varied, and structure built into the initial envelope (prior).

    The convergence criterion is echoed first. Iteration is terminated when the shift in Bayesian score from one cycle to the next (normalised to the value of the Bayesian score itself: see second column in this file) falls below this value. The threshold has a default value of 10E-05.

    Then, at each cycle of the iterative maximisation process, the following quantities are reported:

    • Bayesian score fractional change: it is the ratio between the total variation in Bayesian score from the first to the current cycle, and the current value of the Bayesian score itself;
    • Bayesian score fractional shift: it is the ratio between the variation in Bayesian score from the previous to the current cycle, and the current value of the Bayesian score itself. It is the quantity used as a termination criterion for the iteration;
    • BS: Bayesian Score, defined as:

      BS=LogLik + N Sm

      where N is the number of random scatterers, as declared via the Atomic Composition field in the input form;

    • Sm: Entropy Loss, referred to the starting prior m(x);
    • Grad S: components of the gradient of the Entropy Loss with respect to the two Lagrange multipliers associated to the distance restraint and to the Entropy loss;
    • Grad L: components of the gradient of the Log-Likelihood with respect to the two Lagrange multipliers associated to the distance restraint and to the Entropy loss;
    • Hess S: elements of the Hessian of the Entropy Loss with respect to the two Lagrange multipliers associated to the distance restraint and to the Entropy loss;
    • Hess L: elements of the Hessian of the Log-Likelihood with respect to the two Lagrange multipliers associated to the distance restraint and to the Entropy loss.
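The Bayesian score and the two fractional quantities listed above can be sketched as follows. The function names are hypothetical, and comparing the absolute value of the fractional shift with the threshold is an assumption made for this sketch. The threshold itself is passed in explicitly rather than hard-coded.

```python
def bayesian_score(log_lik, n_scatterers, entropy_loss):
    """BS = LogLik + N * Sm."""
    return log_lik + n_scatterers * entropy_loss


def fractional_change(first_bs, current_bs):
    """Total variation from the first cycle, relative to the current score."""
    return (current_bs - first_bs) / current_bs


def fractional_shift(previous_bs, current_bs):
    """Variation from the previous cycle, relative to the current score."""
    return (current_bs - previous_bs) / current_bs


def has_converged(previous_bs, current_bs, threshold):
    """Termination test on the fractional shift (absolute value assumed)."""
    return abs(fractional_shift(previous_bs, current_bs)) < threshold


# Made-up values for two successive cycles.
bs_prev = bayesian_score(-101.0, 50, -0.5)
bs_curr = bayesian_score(-100.0, 50, -0.5)
print(fractional_shift(bs_prev, bs_curr), has_converged(bs_prev, bs_curr, 1e-5))
```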

  • qadsol.<shell#>.html

    The file contains a detailed summary of the Bayesian score maximisation that leads to the Maximum Entropy map for the missing structure.

    The Likelihood-Gain is at each step maximised with a constraint of the Loss of Entropy and a constraint on the shift ('Distance') in Bayesian Score: two Lagrange multipliers Mu and Nu are associated respectively with the Entropy and distance constraints.

    The actual number of dimensions of the subspace in which each cycle takes place is output (SD) together with the total number of dimensions of the space (TD, always equal to 2).

    The shifts along the two directions corresponding to the Entropy Loss and the Log-Likelihood distance Lagrange multipliers are also reported under the headings DirS Shift and DirL Shift.

  • BS_surf.<shell#>.html

    The file contains information about the trajectory along the Bayesian Score surface, as it is waded through during the modulation of the prior-prejudice. The maximisation is really carried out as a minimisation of the negative of the Bayesian Score, hence the minus signs in front of the BS columns headings.

    At each cycle an attainable value of the Bayesian Score is echoed ( -BS Att.), together with the distance from the current point to the attainable BS (D Attainable); the distance constraint will sometimes allow only a closer value of BS to be targeted ( -BS All.); the distance to this target value is reported in the last column ( D to -BS Target).

  • mlnorm_unmd.<cycle#>.html

    At each cycle during the modulation of the prior-prejudice, the Log-Lik Gain, the overall scale and B factors and the fragment imperfection B factor [2,3] are reported. For details, see below the documentation given for the files mlnorm.<cycle#>.html.

  • HLplots directory

    The shell.01/HLplots folder contains the output files from the analysis of the Hendrickson-Lattman phase probability distributions. The analysis gathers unimodal reflexions into the "trunk" used to calculate the Maximum Entropy map during completion jobs. BUSTER rejects maxima of the phase probability for some reflexions, on the grounds that they are too shallow, are merged into a single peak, or are only a minor hump with respect to the main maximum. The phase probability distribution for a subset of the reflexions whose maxima were rejected is output as a plot of P(φ) between 0 and 360 degrees.

  • rmsdrho.mtv

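    This file is output when the input MTZ file contains figures of merit (FOMs). It displays the rmsd of the electron density computed from the current centroid phases, following Blow and Crick (1959):

    \[ \langle \Delta\rho^2 \rangle = \sum_h \left[ \epsilon_h \times \left(1 - \mathrm{FOM}_h^2\right) \times \left(F_h^{\mathrm{obs}}\right)^2 \right] \]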

  • Fragments directory

    The shell.01/Fragments folder contains the TNT coordinate files (shell.01/Fragments/cycle_<cycle#>.cor) written at the various cycles during the course of the refinement. The coordinate files can be deleted when archiving the output with the Save option, depending on what you specify at the end of the preferences.

    If you wish to transform the TNT coordinate file into a PDB file, you can either do it via the map display page, or run $tntbin/convert:

    			$tntbin/convert << eof
    			CELL a b c alpha beta gamma
    			INCLUDE <$tntdata>/connect.dat
    			INCLUDE cycle_00n.cor
    			INCLUDE sequence.file.seq
    			PUNCH cycle_00n.pdb BROOKHAVEN
    			eof
    			
    The partial structure model at the last refinement cycle is output to the PDB file <JobName>/shell.01/<JobName>.final.pdb.
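
The sum given above for rmsdrho.mtv can be evaluated directly. The following is a minimal Python sketch, not BUSTER's own code: the function name and the (epsilon, FOM, Fobs) tuple layout are assumptions made for illustration, and no normalisation beyond the terms printed in the formula is applied.

```python
def mean_square_density_error(reflections):
    """Evaluate sum_h eps_h * (1 - FOM_h**2) * Fobs_h**2.

    `reflections` is an iterable of (eps, fom, f_obs) tuples; this layout
    is hypothetical and is not a BUSTER or TNT file format.
    """
    total = 0.0
    for eps, fom, f_obs in reflections:
        if not 0.0 <= fom <= 1.0:
            raise ValueError("figure of merit must lie between 0 and 1")
        total += eps * (1.0 - fom ** 2) * f_obs ** 2
    return total


example = [
    (1, 1.0, 100.0),  # perfectly phased: contributes nothing
    (2, 0.5, 10.0),   # contributes 2 * 0.75 * 100 = 150
    (1, 0.0, 3.0),    # unphased: contributes the full Fobs**2 = 9
]
print(mean_square_density_error(example))  # 159.0
```

A reflexion with FOM = 1 adds nothing to the sum, while one with FOM = 0 adds its full squared amplitude.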

TNT output files

 
    The main Maximum Likelihood refinement log is shell.01/tntlongll.html; the section pertaining to the latest completed cycle is tntlongll_cycle<cycle#>.html. These files contain verbose output from the three TNT modules used here: rfactor, geometry and shift. Despite its length, this output is useful for diagnosing misbehaviour (caused, for example, by inadequate weighting that produces excessive or minute parameter shifts) and for calling attention to the largest deviations from target values for the various types of restraints.

Three TNT output files contain the output from the script tntlongll:

  1. agarwal.dat

    Output file from the call to the TNT module rfactor that computes the gradients for the working set reflexions.

  2. rfactor.old

    Output file from the call to the TNT module rfactor that computes the curvatures for the working set reflexions.

  3. rfactor.dat

    Output file from the call to the TNT module rfactor that merges the gradients and curvatures for the working set reflexions.

geometry.dat is the output file from the call to the TNT module geometry that computes the gradients and curvatures of the GEOM stereochemical restraint criterion.

geometry.html is the output from the geometry module of TNT, and lists the worst violations of the geometric restraints.

Three ASCII hkl files in TNT format are stored in the <ProjectID>.<run number> directory of the BUSTER job directory tree:

  1. tnt.hkl The file contains the structure factor amplitudes in TNT format, as read from the input MTZ file, and after the rejection of the data listed in the file rejections.html.

  2. tnt_<FreeR_flag>.hkl The file contains the amplitudes with the FreeR_flag chosen on input, which is used as a test set for cross validation. The default test set is the one with FreeR_flag=0.

  3. tnt_no<FreeR_flag>.hkl The file contains all the data except the test set selected by FreeR_flag, and is therefore the working set for structural refinement.
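
The relationship between the three files can be illustrated with a short Python sketch. It is not the code BUSTER uses: the record layout (h, k, l, amplitude, flag) and the function name are hypothetical, and the TNT hkl format itself is not parsed here. The default test flag of 0 is the one quoted above.

```python
def split_by_free_flag(reflections, test_flag=0):
    """Split reflexion records into (working_set, test_set).

    Each record is a hypothetical (h, k, l, amplitude, free_flag) tuple.
    Records whose flag equals `test_flag` form the cross-validation set;
    all others form the working set used for refinement.
    """
    working, test = [], []
    for record in reflections:
        if record[4] == test_flag:
            test.append(record)
        else:
            working.append(record)
    return working, test


data = [
    (1, 0, 0, 120.5, 0),
    (1, 1, 0, 88.1, 3),
    (2, 0, 1, 45.0, 0),
    (2, 1, 1, 61.7, 7),
]
working, test = split_by_free_flag(data)
print(len(working), len(test))  # 2 2
```

Every record lands in exactly one of the two sets, so the working and test sets together reproduce the content of tnt.hkl.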

View Maps

 
    Within each cycle as well as at the end it is possible to view the current maps through a graphical helper window by following the hyperlink View Maps.


Types of maps

 
    BUSTER computes Fourier coefficients for five kinds of maps:
  1. Centroid Electron Density Map

    The centroid electron density map is obtained by Fourier transform of the observed structure factor amplitudes and the centroid phases (as introduced by Blow and Crick, 1959) derived from the phase probability distribution corresponding to the current statistical model. The centroid map therefore contains the electron density for the atoms in the fragment and for the current model of the random atoms and solvent. It is less noisy than the sigmaA-weighted 2Fo-Fc map (see below) but not as sensitive to details in the missing part of the structure.

  2. "SigmaA" weighted 2Fo-Fc Electron Density Map

    The map is a 2Fo-Fc electron density map (see Main, 1979, and Read, 1986), but the BUSTER map uses the current phase probability distribution, which does possess non-Wilsonian contributions from the missing atoms. The map is produced by Fourier transforming the coefficients 2FOFWT and 2PHFOFCWT in the $BDG_job.final.mtz file in the shell.01 directory.

    Keep in mind that the centroid map should be less noisy, but that fragment and missing features are put on approximately the same scale in the "sigmaA" map, so the latter is the map to inspect for weakly scattering missing structure.

  3. "SigmaA" weighted Fo-Fc Difference Density Map

    The map is produced by Fourier transforming the coefficients FOFCWT and PHFOFCWT in the $BDG_job.final.mtz file in the shell.01 directory. Here, Fc=Ffrag+Fmiss+Fsolv

    The negative (positive) contours of this map show regions where the current model of partial structure, missing atoms and solvent has too much (too little) density, either because:

    • a partial structure atom is misplaced or its B factor is too small (high);
    • or the prior prejudice distribution for the missing atoms is too strong (weak);
    • or the solvent envelope is misplaced.
    This map is identical to the Fo-Ffrag-Fsolv described below if there are no missing atoms declared.

  4. "SigmaA" weighted Fo-Ffrag-Fsolv Difference Density Map

    The map is produced by Fourier transforming the coefficients FOFRGSLV and PHFOFRGSLV in the $BDG_job.final.mtz file in the shell.01 directory. Notice that the model Fmiss for the missing structure is not subtracted from the Fo here.

    The positive contours of this map will show regions where the current model lacks density, either because a partial structure atom is misplaced or its B factor is too high; or because there are missing atoms; or because the solvent envelope is misplaced. This map is identical to the Fo-Fc described above if there are no missing atoms declared.

  5. Maximum Entropy Map for the missing part of the structure

    The file you obtain at the end of a structure completion job (Maximum Entropy structure completion button) is stored in the shell directory of your current run, under the name Q_Map.<cycle#>.map (e.g. busterfiles/logfiles/PROTEIN.4/shell.01/Q_Map.022.map).

    Note that this map is a positional probability distribution for the missing structure only, and therefore looks peaky and sometimes not well connected. A more 'traditional' map for the missing part can be inspected by looking at the positive contours of the Fo-Fc map.

    You can display the Maximum Entropy map by clicking on the View Maps hyperlink and selecting the Maximum Entropy map as the object to be displayed. This will trigger scripts that serve it to the display tool of your choice. You can extend your fragment by building or rebuilding into that MaxEnt map at least some of the hitherto missing structure.
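
The centroid phase used for the first map type in the list above can be illustrated numerically. The sketch below assumes that the phase probability distribution of a reflexion is available in the standard Hendrickson-Lattman form, P(phi) proportional to exp(A cos phi + B sin phi + C cos 2phi + D sin 2phi), and integrates it on a regular grid. The function name and the grid step are illustrative choices, and this is not BUSTER's own implementation.

```python
import cmath
import math


def centroid_phase_and_fom(a, b, c, d, step_deg=1):
    """Return (centroid phase in degrees, figure of merit).

    The distribution is taken as exp(a*cos(phi) + b*sin(phi)
    + c*cos(2*phi) + d*sin(2*phi)), sampled every `step_deg` degrees
    (`step_deg` is assumed to divide 360 exactly).
    """
    n = 360 // step_deg
    phis = [math.radians(i * step_deg) for i in range(n)]
    log_p = [
        a * math.cos(p) + b * math.sin(p)
        + c * math.cos(2 * p) + d * math.sin(2 * p)
        for p in phis
    ]
    shift = max(log_p)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(v - shift) for v in log_p]
    total = sum(weights)
    # Centroid of exp(i*phi) under the normalised distribution.
    m = sum(w * cmath.exp(1j * p) for w, p in zip(weights, phis)) / total
    return math.degrees(cmath.phase(m)) % 360.0, abs(m)


# Unimodal distribution peaked at 90 degrees.
phase, fom = centroid_phase_and_fom(0.0, 3.0, 0.0, 0.0)
# Symmetric bimodal distribution with peaks at 0 and 180 degrees.
bimodal_phase, bimodal_fom = centroid_phase_and_fom(0.0, 0.0, 2.0, 0.0)
```

The modulus of the centroid is the figure of merit. A sharply unimodal distribution gives a value close to 1, whereas the symmetric bimodal case gives a value close to 0 and its centroid phase is then meaningless, which is one way to see why such reflexions contribute little to a centroid map.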

The Fourier coefficients for the Centroid and sigmaA weighted maps are all in the file shell.01/$BDG_job.final.mtz.

In addition, the Maximum Entropy map is produced in CCP4 format, and is contained in the shell.01 directory, in the file Q_Map.<cycle#>.map
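
Since the Maximum Entropy map is written in CCP4 format, its grid and cell can be checked without a display tool by decoding the file header. The following Python sketch is an illustration under stated assumptions: it takes the standard CCP4 map header layout (1024 bytes of 4-byte words, with the 'MAP ' stamp in word 53) and little-endian byte order. The function name is hypothetical, and files written on big-endian machines would need '>' in place of '<' in the format strings.

```python
import struct


def parse_ccp4_header(header):
    """Decode selected fields from the first 1024 bytes of a CCP4 map."""
    if len(header) < 1024:
        raise ValueError("a CCP4 map header is 1024 bytes long")
    if header[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' stamp; not a CCP4 map?")
    nc, nr, ns, mode, ncstart, nrstart, nsstart, nx, ny, nz = struct.unpack(
        "<10i", header[0:40]
    )
    cell = struct.unpack("<6f", header[40:64])
    axis_order = struct.unpack("<3i", header[64:76])
    return {
        "grid": (nc, nr, ns),                  # columns, rows, sections stored
        "mode": mode,                          # 2 means 32-bit reals
        "start": (ncstart, nrstart, nsstart),  # first column, row, section
        "sampling": (nx, ny, nz),              # intervals along the cell edges
        "cell": cell,                          # a, b, c, alpha, beta, gamma
        "axis_order": axis_order,              # MAPC, MAPR, MAPS
    }
```

For a real file, pass the first 1024 bytes read in binary mode, for instance `parse_ccp4_header(open("shell.01/Q_Map.022.map", "rb").read(1024))`.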


Displaying maps

 
    From the Plotting page, you can select the maps and model to be displayed. They can be displayed in two ways:
  1. Plot: this option will trigger the generation of maps and models on the server machine; a gzipped tar file is then sent to the client machine, and the application specified in your .mailcap file will display the results on the client;

  2. Script: this option will generate a script that you can save to disk, and execute from the client machine; the script will generate the maps and model and display them on the client; this is especially useful when the maps are so big that the conventional plotting route would lead to time-out of the plotting request on the part of the browser.
The Save Map(s) and PDB file(s) (tar file) option allows you to save to disk a tar file with your map(s) and model(s), in CCP4 and PDB format respectively.

Note :  for the display tools to work, the paths to the helper binaries must be set by the system administrator in the file $BDG_home/bin/helpers.local for all the client machines on which you want to display results. Furthermore, your browser must be configured to handle these tools correctly. Please see the installation instructions for the Helper applications.

You have the choice between different viewpoints: examining the density around the PDB models, the whole cell or just the asymmetric unit. The default can be set by editing the preferences from the BUSTER Control Panel. Each viewpoint has its advantages:

  • The "asymmetric unit" plot contains all the information, takes less time to compute and uses less space in memory and on disk. If your viewer supports this, crystallographic symmetry information can be used to generate density at any point in space using this unique volume of density.

  • "Contour around the model": any PDB file in your datafiles directory whose name starts with "model_" and ends with a ".pdb" suffix will be eligible for display at this time, e.g. "model_protein.pdb". The contouring around the model will allow for inspection of the connectivity of the map when the model sits across the asymmetric unit boundaries.

    If more than one model is present, the last one will be selected for the contouring.

  • The "whole cell" option enables you to see the molecular boundaries and the chain connectivity without bumping into the limits of the asymmetric unit.
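
The eligibility rule for the "Contour around the model" option can be sketched in Python as follows. This is an illustration only: the manual does not say how the models are ordered when it selects "the last one", so alphabetical order is assumed here, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def eligible_models(filenames):
    """Return the names that start with 'model_' and end with '.pdb', sorted."""
    return sorted(name for name in filenames if fnmatchcase(name, "model_*.pdb"))


def model_for_contouring(filenames):
    """Pick the last eligible model (alphabetical order is an assumption)."""
    models = eligible_models(filenames)
    return models[-1] if models else None


names = ["model_protein.pdb", "model_ligand.pdb", "protein.pdb", "model_old.ent"]
print(model_for_contouring(names))  # model_protein.pdb
```

Files such as protein.pdb or model_old.ent are ignored because they fail either the prefix or the suffix requirement.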


References

 
   
  1. Blow, D.M. and Crick, F.H.C. (1959). Acta Cryst. 12, 794-802.
  2. Bricogne, G. (1993). Int. Tab. for Cryst. B, 23-106.
  3. Bricogne, G. & Irwin, J.J. (1996) in Macromolecular Refinement - Proceedings of the CCP4 Study Weekend. SERC Daresbury Laboratory, Warrington, England; 85-92.
  4. Bricogne, G. & Irwin, J.J. (1997). In Crystallographic Computing 7 edited by P.E. Bourne and K.D. Watenpaugh, 1-9.
  5. Diamond, B. (1993). Int. Tab. for Cryst. B, 345-373.
  6. Main, P. (1979). A Theoretical Comparison of the Beta, Gamma' and 2Fo-Fc syntheses. Acta Cryst., A35, 779-785.
  7. Read, R. (1986). Improved Fourier Coefficients for Maps Using Phases from Partial Structures with Errors. Acta Cryst., A42, 140-149.
  8. Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760-763.

Last modification: 28.01.04