Basic notions about BUSTER
Copyright © 1995-2004 by Eric Blanc, Pietro Roversi, Clemens Vonrhein,
Gérard Bricogne and the Buster Development Group.
All rights reserved.
BUSTER uses maximum-likelihood (ML) and maximum-entropy (ME) techniques
to overcome two major shortcomings of the classical approach
(least-squares (LS) refinement followed by difference maps) when
dealing with the refinement and completion of partial structures:
- LS refinement is known to produce biased results, in which the
corrections to the initial partial structure are smaller than they
ought to be;
- difference maps after LS refinement tend to show an attenuated
and noisy image of the missing structure.
Recourse to ML instead of LS helps prevent overfitting the observed
amplitudes at phases too close to those of the initial fragment, by
keeping an appropriate distance from the data. Filtering the difference
maps by prior knowledge of the localisation of the missing atoms, and
by the enforcement of a maximum-entropy condition, helps increase the
signal-to-noise ratio of the final reconstruction of the density for
those missing atoms.
Both the ML and ME methods are based on a statistical treatment of
model structure factors by techniques which constitute the core of
BUSTER. Their purpose is to generate and
exploit quantitative descriptions of the statistical behaviour of
structure factors resulting from the two main sources of randomness
present in the typical situation described above:
- errors in the current fragment, i.e. the imperfection of the fragment;
- uncertainty arising from the fact that the atoms which are missing
from the fragment cannot be represented by definite atomic parameters
and must be treated as randomly distributed, i.e. the incompleteness of the fragment.
At any given stage of the refinement or completion process, model
structure factors do not have a "calculated value" as implied by the
usual notation Fcalc: instead, they have a probability distribution.
In practice these distributions are often approximated by Gaussians,
and are therefore described by the expectations of the random
structure factors in any given collection and by the covariance matrix
of the fluctuations around these expectations.
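As an illustration of this Gaussian approximation, the description can be written out explicitly. The notation is an assumption introduced for this sketch and is not taken from BUSTER itself: the real and imaginary parts of n model structure factors are stacked into a real vector x of length 2n, with mean vector mu and covariance matrix C.

```latex
p(\mathbf{x}) \approx
  \frac{1}{\sqrt{(2\pi)^{2n} \det \mathbf{C}}}
  \exp\!\left[ -\tfrac{1}{2}
    (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \mathbf{C}^{-1}
    (\mathbf{x} - \boldsymbol{\mu}) \right],
\qquad
\boldsymbol{\mu} = \mathrm{E}[\mathbf{x}], \quad
\mathbf{C} = \mathrm{E}\!\left[
  (\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}
\right].
```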
This statistical picture allows us to take into account the phase
uncertainty present in these model structure factors to drive the
refinement of the fragment. Instead of treating their phases as
constants when trying to improve the fit between their amplitudes and
the observed amplitudes, we calculate the marginal probability
distribution of the model amplitudes and seek to maximise the value
that this marginal distribution takes at the observed amplitudes. It is
that value which is called the likelihood of the current model, and its
maximisation with respect to all or any of the parameters describing
the current model is called the ML refinement of those parameters.
Unlike in the LS method, the initial probability distribution for the
model structure factors may contain an explicit dependence on
parameters which influence the variance of the distribution, and such
parameters may be refined along with the others. It is through such
refinable variance-modulating parameters that the ML method is able to
keep a safe distance between the observed amplitudes and the amplitudes
of the traditional Fcalc values, and thus avoid overfitting.
Experimental information on the phases attached to the observed
amplitudes can further assist in this bias removal.
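The idea of a phase-marginalised likelihood with a refinable variance can be illustrated with a small toy calculation. This is a minimal sketch under stated assumptions, not BUSTER's implementation: each model structure factor is taken to be an independent, isotropic complex Gaussian centred on its expectation, with a single shared variance parameter `sigma2`; the function names, the grid search, and the numbers are all hypothetical. The phase is integrated out numerically, and the variance is then chosen to maximise the likelihood at the observed amplitudes.

```python
import math


def amplitude_density(f_obs, f_calc, sigma2, n_phase=128):
    """Marginal density of an amplitude f_obs, with the phase integrated out.

    Assumes (for illustration only) a complex structure factor distributed
    as an isotropic Gaussian around the complex expectation f_calc, with
    total variance sigma2.
    """
    total = 0.0
    for k in range(n_phase):
        phi = 2.0 * math.pi * (k + 0.5) / n_phase
        f = complex(f_obs * math.cos(phi), f_obs * math.sin(phi))
        d2 = abs(f - f_calc) ** 2
        total += math.exp(-d2 / sigma2) / (math.pi * sigma2)
    # Polar-coordinate Jacobian (f_obs) times the phase step (2*pi / n_phase).
    return total * f_obs * 2.0 * math.pi / n_phase


def log_likelihood(f_obs_list, f_calc_list, sigma2):
    """Log-likelihood of the model: sum of log marginal densities."""
    return sum(
        math.log(amplitude_density(fo, fc, sigma2))
        for fo, fc in zip(f_obs_list, f_calc_list)
    )


def refine_variance(f_obs_list, f_calc_list, grid):
    """Pick the variance on a grid that maximises the log-likelihood."""
    return max(grid, key=lambda s2: log_likelihood(f_obs_list, f_calc_list, s2))


if __name__ == "__main__":
    # Hypothetical observed amplitudes and complex model expectations.
    f_obs = [10.0, 12.0, 8.0]
    f_calc = [complex(9, 0), complex(0, 14), complex(-3, 4)]
    grid = [0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
    best = refine_variance(f_obs, f_calc, grid)
    print("variance maximising the likelihood on the grid:", best)
```

In this toy setting the likelihood falls off both for very small variances (the model is trusted too much) and for very large ones (the model carries almost no information), so the refined variance settles at an intermediate value reflecting the actual amplitude discrepancies.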
The ML refinement of the fragment (in conjunction with TNT) and its ME completion are naturally associated
in this formalism, in the sense that the probability distribution of
the model structure factors - and hence the likelihood Lambda of the current model -
depends symmetrically on the atomic parameters (xyzB) describing the
current fragment and on other parameters (the "Lagrange multipliers")
describing the extra detail currently being introduced into the
distribution of the missing atoms by the ME method. Since the model
structure factors are sums of contributions from the
- fragment,
- randomly-distributed missing atoms and
- solvent,
we see that the gradient of the log-likelihood L = log(Lambda) with respect
to the expectations of model structure factors can be redirected (by
the chain rule) either towards the atomic parameters on which the
fragment contributions depend, or towards the Lagrange multipliers on
which the random-atom contribution depends, or towards both.
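The chain-rule redirection described above can be written out as follows. The symbols are assumptions made for this sketch: h indexes reflections, p_j stands for any atomic parameter (x, y, z or B) of the fragment, lambda_k for any Lagrange multiplier, and A_h and B_h for the real and imaginary parts of the expected model structure factor. Only the part of the gradient that passes through the expectations is shown, and the solvent term is treated as independent of both parameter sets.

```latex
\begin{aligned}
\langle F_{\mathbf{h}} \rangle
  &= F^{\mathrm{frag}}_{\mathbf{h}}(\mathbf{p})
   + F^{\mathrm{rand}}_{\mathbf{h}}(\boldsymbol{\lambda})
   + F^{\mathrm{solv}}_{\mathbf{h}}
   = A_{\mathbf{h}} + i\,B_{\mathbf{h}}, \\[4pt]
\frac{\partial L}{\partial p_j}
  &= \sum_{\mathbf{h}} \left(
       \frac{\partial L}{\partial A_{\mathbf{h}}}\,
       \frac{\partial \operatorname{Re} F^{\mathrm{frag}}_{\mathbf{h}}}{\partial p_j}
     + \frac{\partial L}{\partial B_{\mathbf{h}}}\,
       \frac{\partial \operatorname{Im} F^{\mathrm{frag}}_{\mathbf{h}}}{\partial p_j}
     \right), \\[4pt]
\frac{\partial L}{\partial \lambda_k}
  &= \sum_{\mathbf{h}} \left(
       \frac{\partial L}{\partial A_{\mathbf{h}}}\,
       \frac{\partial \operatorname{Re} F^{\mathrm{rand}}_{\mathbf{h}}}{\partial \lambda_k}
     + \frac{\partial L}{\partial B_{\mathbf{h}}}\,
       \frac{\partial \operatorname{Im} F^{\mathrm{rand}}_{\mathbf{h}}}{\partial \lambda_k}
     \right).
\end{aligned}
```

The same two factors, the derivatives of L with respect to A_h and B_h, appear in both expressions, which is why a single gradient computation can serve fragment refinement, ME completion, or both at once.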
Last modification: 26.01.04