BUSTER User Manual
Refinement of a 40% incomplete MR search model tutorial: PP_PI-3

BUSTER tutorial 5


Refinement of a 40% incomplete MR search model after rigid-body refinement

This tutorial illustrates the use of BUSTER for the early stages of refinement of a model after rigid-body refinement of a Molecular Replacement (MR) solution.

The structure of the complex between porcine pepsin ("PP", 326 residues) and its major protein inhibitor from the worm Ascaris suum ("PI-3", 149 residues) was solved at 2.4 Å by molecular replacement by Ng et al. in 2000, using search models from the native structures of pepsin and of the PI-3 inhibitor [1]. The structure was deposited in the PDB with accession code 1F34.

Early attempts to solve the structure of the PP_PI-3 complex by phasing with the Molecular Replacement solution for PP alone were unsuccessful: this incomplete PP model represents only 42% of the complex. As an example of the difficult early stages of refinement of a severely incomplete MR solution, we present here the BUSTER refinement of the same incomplete PP MR search model, starting straight after rigid-body refinement.

What does BUSTER do here?

There are three separate steps in this job:

  1. cycle 0: the first BUSTER run uses the starting structural model (PP + bulk solvent) to phase the structure factor amplitudes; from these phases a 2Fo-Fc electron density map is computed and a prior distribution for the location of the missing atoms is derived. The output of this scaling-and-phasing-only job is LIST.0.html.
  2. cycles 1-10: the second step is the Maximum Likelihood refinement of the PP partial structure model, while the distribution for the missing atoms is kept fixed at its initial value; data up to 2.4 Å are used for refinement.
  3. cycles 11-13: the third step is the Maximum Entropy modification of the density for the missing PI-3 atoms, using data up to 3.0 Å, while the partial structure model is kept fixed as it was at the end of refinement.
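For bookkeeping, the three steps above can be written down as a small table mapping cycle numbers to what is varied and which high-resolution limit is used. This is a hypothetical Python sketch for illustration only: the `Step` class, `PROTOCOL` and `step_for_cycle` are our own names, not BUSTER keywords or output, and since the text gives no resolution limit for cycle 0 it is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One stage of the tutorial's protocol (illustrative, not BUSTER input syntax)."""
    name: str
    first_cycle: int
    last_cycle: int
    varied: str                      # what is allowed to change in this step
    high_res_limit: Optional[float]  # in Å; None where the text gives no limit


# Cycle ranges and resolution limits as described in the tutorial text.
PROTOCOL = (
    Step("scaling and phasing", 0, 0,
         "no atomic refinement; scaling, phasing, initial missing-atom prior", None),
    Step("ML refinement of PP", 1, 10,
         "PP partial-structure model; missing-atom prior kept fixed", 2.4),
    Step("MaxEnt completion of PI-3", 11, 13,
         "missing-atom (PI-3) density; PP model kept fixed", 3.0),
)


def step_for_cycle(cycle: int) -> Step:
    """Return the protocol step that a given cycle number belongs to."""
    for step in PROTOCOL:
        if step.first_cycle <= cycle <= step.last_cycle:
            return step
    raise ValueError(f"cycle {cycle} is outside the protocol (cycles 0-13)")


if __name__ == "__main__":
    for cycle in (0, 5, 12):
        step = step_for_cycle(cycle)
        print(cycle, step.name, step.high_res_limit)
```

Keeping the schedule as data rather than prose makes it easy to label cycle numbers on the R-factor and LLG plots with the step they belong to.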

Input Preparation

A few parameters need some attention: see the quick input guide.

Main items of the output

We list here a few of the key items you might want to check in the LIST.html output file:

  1. Observed and calculated structure factor amplitude plots: with no model traced for PI-3 at all, a good model for the scattering from the bulk solvent is not straightforward to construct. Here we compare the plots of calculated and observed F's with the bulk solvent model (from the BUSTER calculation described in this document) and without it (from a different calculation):

    With bulk solvent: the lowest-resolution model amplitudes are too low compared with the observed ones, because the bulk solvent mask based on PP alone is too large and its masking effect too pronounced. Still, this is the best that can be done with the BUSTER code as it stands. The companion plot shows the effect of omitting the bulk solvent model altogether.

    No bulk solvent (from a different BUSTER calculation): the model F's would be too strong compared with the observed F's at low resolution, because the masking of the partial structure F's would then come only from the flat regions of the missing-atom envelope. Omitting the bulk solvent mask would therefore not be a good choice.

    Notice also that the absence of a bulk solvent model distorts the scaling: the values of <Fobs> on the absolute scale, as plotted here, are quite different from those in the other plot (the scale factor of this calculation comes out as 1.3 instead of 1.1).

  2. R-factor and log-likelihood gain plots: the simplest way of checking whether the model parameters are indeed improving the fit to the working and free sets.

    After the first round of scaling (cycle 0), the R values are around 40%.

    During refinement (cycles 1-10) the R factors decrease; the final working-set and free-set R factors differ by at least one percent.

    The Maximum Entropy completion (cycles 11-13) overfits, but nevertheless manages to reduce Rfree.

    The plot of log-likelihood gain (LLG) vs. cycle number shows the LLG increasing from its initial value of 0.

    The working-set LLG is bound to increase, because the refinement is driven by maximising the likelihood of the model with respect to the working set; it is more important to check that the free-set LLG increases as well.

    Again, as with the R factors, some overfitting is introduced, but a small improvement in the free-set LLG is evident.

  3. Correlation coefficient plots: the correlation coefficient curves are a good tool for monitoring the improvement of the structural model during refinement and MaxEnt completion; they are independent of the overall scale factor but do depend on the relative scale factors between the partial and missing structures.

    The (Fobs,Ffrag) and (Fobs,Fcalc) curves differ only at low resolution, because Fcalc includes contributions from the bulk solvent and from the low-resolution envelope of the missing atoms, while Ffrag includes only the atoms in the PDB model of the partial structure.

    Most importantly, the (Fcalc,Fexpct) curve depends on the imperfection parameters of the BUSTER internal error model: the larger the internal estimate of the error in the calculated F, the further this CC curve departs from unity. Comparing the (Fcalc,Fexpct) and (Fobs,Fcalc) curves therefore shows whether the BUSTER internal error model is adequate: if it is, the two curves should be close to one another after the first cycle.

    The (Fobs,Fobs+d) curve measures the noise in the data as a function of resolution. This correlation coefficient falls below unity when the noise in the data becomes large (typically at high resolution, where I/σ(I) is lowest).
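The quantities behind the R-factor and correlation coefficient plots can be illustrated with a short, generic sketch. The Python below computes a conventional R factor after a single least-squares overall scale factor, and Pearson correlation coefficients in resolution shells; multiplying one set of amplitudes by a positive constant leaves each CC unchanged, which is why the CC plots do not depend on the overall scale. It assumes simplifications that BUSTER does not make (one isotropic scale, no bulk-solvent or anisotropy terms, equal-count shells), it does not reproduce BUSTER's Fexpct or Fobs+d quantities, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def overall_scale(fobs: Sequence[float], fcalc: Sequence[float]) -> float:
    """Least-squares k minimising sum (Fobs - k*Fcalc)^2 (one isotropic scale)."""
    num = sum(fo * fc for fo, fc in zip(fobs, fcalc))
    den = sum(fc * fc for fc in fcalc)
    return num / den


def r_factor(fobs: Sequence[float], fcalc: Sequence[float],
             k: Optional[float] = None) -> float:
    """R = sum|Fobs - k*Fcalc| / sum Fobs.

    If k is None it is fitted to these reflections; for an Rfree-style value,
    pass the free-set amplitudes together with k fitted on the working set."""
    if k is None:
        k = overall_scale(fobs, fcalc)
    return sum(abs(fo - k * fc) for fo, fc in zip(fobs, fcalc)) / sum(fobs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson CC; unchanged if x or y is multiplied by a positive constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cc_by_shell(d_spacing: Sequence[float], x: Sequence[float],
                y: Sequence[float], n_shells: int) -> List[Tuple[float, float]]:
    """Split reflections into equal-count shells, from low to high resolution,
    and return (mean d-spacing in Å, CC) for each shell."""
    order = sorted(range(len(d_spacing)), key=lambda i: -d_spacing[i])
    size = len(order) // n_shells
    shells = []
    for s in range(n_shells):
        # The last shell takes any leftover reflections.
        idx = order[s * size:(s + 1) * size] if s < n_shells - 1 else order[s * size:]
        d_mean = sum(d_spacing[i] for i in idx) / len(idx)
        shells.append((d_mean, pearson([x[i] for i in idx], [y[i] for i in idx])))
    return shells


if __name__ == "__main__":
    # Made-up toy amplitudes, NOT data from this tutorial.
    d = [10.0, 8.0, 6.0, 5.0, 4.0, 3.5, 3.0, 2.8, 2.6, 2.4]
    fobs = [120.0, 95.0, 80.0, 70.0, 52.0, 40.0, 33.0, 25.0, 20.0, 15.0]
    fcalc = [100.0, 90.0, 85.0, 60.0, 55.0, 35.0, 30.0, 27.0, 18.0, 16.0]
    k = overall_scale(fobs, fcalc)
    print(f"k = {k:.3f}  R = {r_factor(fobs, fcalc, k):.3f}")
    for d_mean, cc in cc_by_shell(d, fobs, fcalc, n_shells=2):
        print(f"mean d = {d_mean:.2f}  CC = {cc:.3f}")
```

Passing `k` explicitly to `r_factor` mirrors the usual practice of computing Rfree with the scale determined from the working set, so that the free-set reflections do not influence the fit.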

Final Results

References

[1] K.K.S. Ng, J.F.W. Petersen, M.M. Cherney, C. Garen, J.J. Zalatoris, C. Rao-Naik, B.M. Dunn, M.R. Martzen, R.J. Peanasky, and M.N.G. James. Nature Structural Biology 7(8), 653-657 (2000)
Eric Blanc, <blanc@GlobalPhasing.com>
Pietro Roversi, <pietro@GlobalPhasing.com>

Last modified: Fri Jan 9 11:24:09 GMT 2004