This tutorial illustrates the use of BUSTER for the early stages of refinement of a model after rigid-body refinement of a Molecular Replacement solution.
The structure of the complex between the porcine pepsin ("PP", 326 residues) and its major protein inhibitor ("PI-3", 149 residues) from the Ascaris suum worm was solved by molecular replacement by Ng et al. in 2000 at 2.4 Å using search models from the native structures of the pepsin and the PI-3 inhibitor[1]. The structure was deposited with PDB accession code 1F34.
Early efforts to solve the structure of the PP_PI-3 complex by phasing with the Molecular Replacement solution for PP alone were not successful: this incomplete model for PP represents only 42% of the complex. To give an example of the difficult early stages of refinement of a severely incomplete MR solution, we present here the BUSTER refinement of the same incomplete PP MR search model, straight after rigid-body refinement.
There are three separate steps in this job:
A few parameters need some attention: see the simple quick input guide
| With bulk solvent: the lowest-resolution model amplitudes are too low with respect to the observed ones: the bulk solvent mask based on PP alone is too large, and the masking effect too pronounced. Still, this is the best we can do with the BUSTER code as it is now. See on the side the effects of omitting the bulk solvent model altogether. |
No bulk solvent: the model F's (from a different BUSTER calculation) would be too strong with respect to the observed F's at low resolution: the masking of the partial structure F's would be effected by the flat regions of the missing atoms envelope only. Omitting the bulk solvent mask would not be a good choice.
Notice also that the absence of a bulk solvent model distorts the scaling, and the values of <Fobs> on absolute scale as plotted here are quite different from the ones in the plot aside (the scale factor of this calculation comes out as 1.3 instead of 1.1). |
| The R values after the first round of scaling, at cycle 0, are around 40%.
During refinement (cycles 1-10) the R factors decrease; the final working-set and free-set R factors differ by less than one percent. The Maximum Entropy completion (cycles 11-13) overfits but manages to reduce the Rfree nevertheless. |
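For reference, the working-set and free-set R factors quoted here follow the conventional definition R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, evaluated separately over the two sets of reflections. The Python sketch below is illustrative only: the function names and the list-based inputs are assumptions made for this example and are not part of BUSTER, and the calculated amplitudes are assumed to be already on the scale of the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_free(f_obs, f_calc, free_flags):
    """Return (R_work, R_free).

    free_flags[i] is True when reflection i belongs to the free set.
    Assumption: both the working set and the free set are non-empty.
    """
    work_obs, work_calc, free_obs, free_calc = [], [], [], []
    for fo, fc, is_free in zip(f_obs, f_calc, free_flags):
        if is_free:
            free_obs.append(fo)
            free_calc.append(fc)
        else:
            work_obs.append(fo)
            work_calc.append(fc)
    return r_factor(work_obs, work_calc), r_factor(free_obs, free_calc)


# Toy amplitudes (made up for illustration, not taken from 1F34).
r_work, r_free = r_work_free(
    [10, 20, 30, 40], [12, 18, 33, 36], [False, False, False, True]
)
print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

A gap between the two values that widens from one cycle to the next is the usual sign of overfitting, which is why both are tracked together.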
The plot of Log-Likelihood gain vs. cycle number shows an increase of the LLG, starting from the initial value of 0.
The working-set LLG is bound to increase because the refinement is driven by maximising the likelihood of the model with respect to the working set; it is more important to check that the free-set LLG increases as well. Again, as observed with R factors, some overfitting is introduced but a small improvement in the free LLG is evident. |
The (Fobs,Ffrag) and (Fobs,Fcalc) curves only differ at low resolution because in Fcalc there are contributions from the solvent and the low-resolution envelope for the missing atoms, while in Ffrag only the atoms in the PDB model for the partial structure are taken into account.
Most importantly, the (Fcalc,Fexpct) curve depends on the imperfection parameters that parameterise the BUSTER internal error model. The larger the internal estimate for the error on the calculated F, the more this CC curve departs from unity. A comparison between the (Fcalc,Fexpct) and the (Fobs,Fcalc) correlation coefficient curves indicates how adequate the BUSTER internal error model is: if the error model is correct, after the first cycle the two curves should be close to one another.
The (Fobs,Fobs+d) curve is a measure of the noise on the data vs. resolution. This correlation coefficient is lower than unity when the noise on the data becomes large (typically at high resolution, where the I/σ(I) is lowest).
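Each of the curves described above is a correlation coefficient between two sets of amplitudes, plotted against resolution. As a rough sketch of how such a curve can be produced, the Python example below sorts reflections by d-spacing, splits them into shells of roughly equal count, and computes a plain Pearson correlation in each shell. The binning scheme, the function names and the use of an unweighted Pearson coefficient are assumptions made for illustration; this tutorial does not describe how BUSTER itself bins or weights the data.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cc_by_shell(d_spacings, f_a, f_b, n_shells=10):
    """Return a list of (d_min_of_shell, cc), from low to high resolution.

    Assumption: every shell holds at least two reflections whose
    amplitudes are not all identical, so the correlation is defined.
    """
    n = len(d_spacings)
    # Order reflection indices from large d (low resolution) to small d.
    order = sorted(range(n), key=lambda i: d_spacings[i], reverse=True)
    curve = []
    for k in range(n_shells):
        shell = order[k * n // n_shells:(k + 1) * n // n_shells]
        d_min = min(d_spacings[i] for i in shell)
        cc = pearson_cc([f_a[i] for i in shell], [f_b[i] for i in shell])
        curve.append((d_min, cc))
    return curve


# Toy data (made up): perfect agreement at low resolution,
# perfect anti-correlation at high resolution.
d = [10, 8, 6, 5, 4, 3, 2.8, 2.6]
f_obs = [1, 2, 3, 4, 1, 2, 3, 4]
f_calc = [2, 4, 6, 8, 4, 3, 2, 1]
for d_min, cc in cc_by_shell(d, f_obs, f_calc, n_shells=2):
    print(f"shell down to {d_min} A: CC = {cc:+.2f}")
```

Calling the same function with (Fobs, Fcalc) and then with (Fcalc, Fexpct) gives two curves that can be overlaid to make the comparison discussed above.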
|
Cycle 0: At the outset, you can see that the ``observed correlation'' (Fobs,Fcalc) roughly matches the ``predicted correlation'' (Fcalc,Fexpct). The initial error model is adequate.
|
Cycle 10: At the end of the last cycle of refinement the high-resolution CC values are closer to unity than they were at the beginning: this is due to the improvement of the model for the partial structure. If we were to continue for longer, the CC values would improve only very slightly. This usually indicates that the partial structure contains errors which the refinement is unable to correct. Manual rebuilding is in order.
The imperfection B factor for the partial structure at the end of refinement has a value around 7.9 (see curves below): the ``predicted correlation'' (Fcalc,Fexpct) is in good agreement with the ``observed'' one, (Fobs,Fcalc). Only the very low resolution has worsened, because the solvent model is still very inadequate and the error model associated with it has grown (see below). |
|
Cycle 13: At the end of the Maximum Entropy calculation, the fit to the working-set low-resolution amplitudes is very good, but as seen above there is some overfitting.
No improvement is brought about to the missing atoms model past the 3.0 Å limit because no Lagrange multipliers were used for those data. The (Fobs,Ffrag) CC curve remains unchanged because no changes are made to the partial structure. The imperfection parameters for the missing atoms model and the bulk solvent (see curve on the side) keep the error model so large that the ``predicted correlation'' (Fcalc,Fexpct) is lower than the observed (Fobs,Fcalc); but recalling the values of the free likelihood gain during Maximum Entropy completion, this ``pessimistic'' error model is probably correct. |
B imperfection factors refinement: the plot of the imperfection B factors can be used to confirm that the partial structure is improving during refinement and that the error model becomes smaller.
As refinement progresses, the value of the imperfection B for the partial structure should decrease. In this case Bimpf frag levels off to a value around 7.9. Again, this is a sign that the refinement is unable to correct the errors in the partial structure: manual rebuilding is in order. At the same time, the variance is taken up by Bimpf solv, the measure of the imperfection of the bulk solvent model. |
If you inspect the 2Fo-Fc and Fo-Fc maps, together with the deposited PDB model for the PP_PI-3 complex, you will notice that parts of the missing PI-3 inhibitor are already visible (e.g. the B1-B13 strand; the poly-Pro helix B139-B143; parts of the long helix B97-B120). It is also apparent that PP needs rebuilding in many places (e.g. serines A49, A68, A72, A294, A207; Glu A70).