BUSTER User Manual
Chapter 1

Basic notions about BUSTER

Copyright © 1995-2004 by Eric Blanc, Pietro Roversi, Clemens Vonrhein,
Gérard Bricogne and the Buster Development Group.
All rights reserved.


BUSTER uses maximum-likelihood (ML) and maximum-entropy (ME) techniques to overcome two major shortcomings of classical methods (least-squares (LS) refinement combined with difference maps) in the refinement and completion of partial structures. First, using ML instead of LS helps prevent overfitting the observed amplitudes while the phases stay too close to those of the initial fragment, because the ML model keeps an appropriate distance from the data. Second, filtering difference maps with prior knowledge of where the missing atoms are likely to be located, and enforcing a maximum-entropy condition, increases the signal-to-noise ratio of the final reconstruction of the density for those missing atoms.

Both the ML and ME methods are based on a statistical treatment of model structure factors by techniques which constitute the core of BUSTER. Their purpose is to generate and exploit quantitative descriptions of the statistical behaviour of structure factors resulting from the two main sources of randomness present in the typical situation described above:

At any given stage of the refinement or completion process, model structure factors do not have a single "calculated value", as the usual notation Fcalc implies; instead, they have a probability distribution. In practice these distributions are often approximated by Gaussians, which are then described by the expectations of any collection of random structure factors and by the covariance matrix of the fluctuations around these expectations. This statistical picture lets us take into account the phase uncertainty of the model structure factors when driving the refinement of the fragment. Instead of treating their phases as constants while improving the fit between their amplitudes and the observed amplitudes, we calculate the marginal probability distribution of the model amplitudes (integrating over the unknown phases) and seek to maximise the value this marginal probability takes at the observed amplitudes. That value is called the likelihood of the current model, and its maximisation with respect to all or any of the parameters describing the current model is called the ML refinement of those parameters.

Unlike in the LS method, the initial probability distribution of the model structure factors may depend explicitly on parameters that influence its variance, and such parameters may be refined along with the others. It is through these refinable variance-modulating parameters that the ML method keeps a safe distance between the observed amplitudes and the traditional Fcalc amplitudes, and thus avoids overfitting. Experimental phase information attached to the observed amplitudes can further assist in removing this bias.
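The phase-marginalised likelihood can be sketched for the simplest case. The Python below is not BUSTER code; it assumes a single acentric reflection whose model structure factor is a complex Gaussian with expectation E and total variance Sigma (playing the role of a variance-modulating parameter). Integrating over the unknown phase then gives the Rice density p(|Fobs|) = (2|Fobs|/Sigma) exp(-(|Fobs|^2 + |E|^2)/Sigma) I0(2|Fobs||E|/Sigma). The function names, the synthetic data and the grid search over Sigma are illustrative assumptions. The sketch checks the closed form against explicit numerical integration over the phase, then refines Sigma by maximising the likelihood.

```python
import cmath
import math
import random


def log_i0(x: float) -> float:
    """log I0(x) for x >= 0, from the power series sum_k ((x/2)^k / k!)^2 in log space."""
    if x == 0.0:
        return 0.0
    half = x / 2.0
    # The largest term sits near k = x/2; terms far beyond it are negligible.
    n_terms = int(half + 20.0 * math.sqrt(half) + 50)
    logs = [2.0 * (k * math.log(half) - math.lgamma(k + 1)) for k in range(n_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def log_rice(f_obs: float, e_model: complex, sigma: float) -> float:
    """Log of the phase-marginalised density of an acentric amplitude f_obs,
    for a model structure factor ~ complex Gaussian(mean e_model, total variance sigma)."""
    a = abs(e_model)
    return (math.log(2.0 * f_obs / sigma)
            - (f_obs ** 2 + a ** 2) / sigma
            + log_i0(2.0 * f_obs * a / sigma))


def marginal_by_quadrature(f_obs: float, e_model: complex, sigma: float, n: int = 4096) -> float:
    """Same density, obtained by integrating the 2-D Gaussian over the unknown phase."""
    step = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        z = f_obs * cmath.exp(1j * j * step)  # structure factor with amplitude f_obs, phase j*step
        total += math.exp(-abs(z - e_model) ** 2 / sigma) / (math.pi * sigma)
    return f_obs * total * step  # f_obs is the Jacobian of the polar coordinates


def log_likelihood(f_obs_list, e_list, sigma):
    return sum(log_rice(f, e, sigma) for f, e in zip(f_obs_list, e_list))


def refine_sigma(f_obs_list, e_list, grid):
    """Variance parameter maximising the likelihood (plain grid search, for illustration)."""
    return max(grid, key=lambda s: log_likelihood(f_obs_list, e_list, s))


def simulate(n: int, sigma_true: float, seed: int):
    """Synthetic data: true structure factors scatter around the model expectations."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma_true / 2.0)  # per real/imaginary component
    e_list, f_obs_list = [], []
    for _ in range(n):
        e = complex(rng.uniform(-6.0, 6.0), rng.uniform(-6.0, 6.0))
        e_list.append(e)
        f_obs_list.append(abs(e + complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))))
    return e_list, f_obs_list


if __name__ == "__main__":
    closed = math.exp(log_rice(5.0, 3 + 4j, 4.0))
    numeric = marginal_by_quadrature(5.0, 3 + 4j, 4.0)
    print(f"closed form {closed:.6f}   phase quadrature {numeric:.6f}")

    e_list, f_obs_list = simulate(300, sigma_true=4.0, seed=0)
    grid = [0.5 * k for k in range(1, 25)]
    print("ML estimate of sigma:", refine_sigma(f_obs_list, e_list, grid))
```

Refining Sigma together with the model is what stops the fit from forcing |E| onto |Fobs|: a very small Sigma makes the likelihood collapse wherever |E| and |Fobs| disagree, so the maximum lies at a variance that reflects the actual model error. Centric reflections need a different density, which this sketch omits.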

The ML refinement of the fragment (in conjunction with TNT) and its ME completion are naturally associated in this formalism. The probability distribution of the model structure factors, and hence the likelihood Lambda of the current model, depends symmetrically on the atomic parameters (xyzB) describing the current fragment and on other parameters (the "Lagrange multipliers") describing the extra detail that the ME method is currently introducing into the distribution of the missing atoms. Since the model structure factors are sums of contributions from the

  1. fragment,
  2. randomly-distributed missing atoms and
  3. solvent,
the gradient of the log-likelihood L = log(Lambda) with respect to the expectations of the model structure factors can be redirected (by the chain rule) towards the atomic parameters on which the fragment contribution depends, towards the Lagrange multipliers on which the random-atom contribution depends, or towards both.
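The chain-rule redirection can be illustrated with a deliberately tiny one-dimensional model. In this hypothetical sketch (not BUSTER's implementation) the expected structure factor is E(h) = Ffrag(h; x) + Erand(h; lam) + Fsolv. A single fragment atom at coordinate x gives Ffrag = f exp(2 pi i h x); the missing atoms are distributed with an ME-type density q(u) proportional to exp(lam cos(2 pi u)), governed by one Lagrange multiplier lam; and the solvent term is held fixed. The likelihood is assumed to be the acentric Rice density of a complex Gaussian model. The per-reflection gradient g = dL/dRe(E) + i dL/dIm(E) is computed once (here by central differences) and then redirected with dL/dp = Re(conj(g) dE/dp) to either x or lam. The result is checked against direct finite differences of L. All constants, reflections and amplitudes are made-up illustrative values.

```python
import cmath
import math


# Phase-marginalised (Rice) log-likelihood for one acentric reflection.
def log_i0(x: float) -> float:
    """log I0(x) for x >= 0, from the power series sum_k ((x/2)^k / k!)^2 in log space."""
    if x == 0.0:
        return 0.0
    half = x / 2.0
    n_terms = int(half + 20.0 * math.sqrt(half) + 50)
    logs = [2.0 * (k * math.log(half) - math.lgamma(k + 1)) for k in range(n_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(t - top) for t in logs))


def log_rice(f_obs: float, e_model: complex, sigma: float) -> float:
    a = abs(e_model)
    return math.log(2.0 * f_obs / sigma) - (f_obs ** 2 + a ** 2) / sigma + log_i0(2.0 * f_obs * a / sigma)


# Hypothetical toy model (1-D cell, illustrative constants only).
GRID = [j / 200 for j in range(200)]          # sampling points for the random-atom density
F_FRAG, N_RAND, F_RAND = 6.0, 10, 1.0         # fragment factor, missing-atom count and factor
F_SOLV = 0.5 + 0.0j                           # solvent contribution, held fixed here
SIGMA = 4.0                                   # variance of each model structure factor
REFLECTIONS = [(1, 7.0), (2, 4.0), (3, 5.5)]  # (h, |F_obs|) pairs


def fragment_term(h: int, x: float) -> complex:
    return F_FRAG * cmath.exp(2j * math.pi * h * x)


def random_term(h: int, lam: float):
    """Expected random-atom contribution for q(u) ~ exp(lam*cos(2*pi*u)), and its lam-derivative."""
    c = [math.cos(2 * math.pi * u) for u in GRID]
    w = [math.exp(lam * cj) for cj in c]
    e = [cmath.exp(2j * math.pi * h * u) for u in GRID]
    total = sum(w)
    mean_e = sum(wj * ej for wj, ej in zip(w, e)) / total
    mean_c = sum(wj * cj for wj, cj in zip(w, c)) / total
    mean_ce = sum(wj * cj * ej for wj, cj, ej in zip(w, c, e)) / total
    scale = N_RAND * F_RAND
    # d<e>/dlam = <c e> - <c><e>, because dw_j/dlam = c_j * w_j
    return scale * mean_e, scale * (mean_ce - mean_c * mean_e)


def expectation(h: int, x: float, lam: float) -> complex:
    return fragment_term(h, x) + random_term(h, lam)[0] + F_SOLV


def total_log_likelihood(x: float, lam: float) -> float:
    return sum(log_rice(f, expectation(h, x, lam), SIGMA) for h, f in REFLECTIONS)


def grad_wrt_expectation(f_obs: float, e: complex, step: float = 1e-6) -> complex:
    """g = dL/dRe(E) + i dL/dIm(E), by central differences."""
    d_re = (log_rice(f_obs, e + step, SIGMA) - log_rice(f_obs, e - step, SIGMA)) / (2 * step)
    d_im = (log_rice(f_obs, e + 1j * step, SIGMA) - log_rice(f_obs, e - 1j * step, SIGMA)) / (2 * step)
    return complex(d_re, d_im)


def chain_rule_gradients(x: float, lam: float):
    """Redirect the per-reflection gradient g to the fragment coordinate x and the multiplier lam."""
    g_x = g_lam = 0.0
    for h, f in REFLECTIONS:
        e_frag = fragment_term(h, x)
        e_rand, de_rand_dlam = random_term(h, lam)
        g = grad_wrt_expectation(f, e_frag + e_rand + F_SOLV)
        de_frag_dx = 2j * math.pi * h * e_frag
        # For any real parameter p: dL/dp = Re(conj(g) * dE/dp)
        g_x += (g.conjugate() * de_frag_dx).real
        g_lam += (g.conjugate() * de_rand_dlam).real
    return g_x, g_lam


if __name__ == "__main__":
    x0, lam0, h_fd = 0.13, 0.8, 1e-6
    g_x, g_lam = chain_rule_gradients(x0, lam0)
    fd_x = (total_log_likelihood(x0 + h_fd, lam0) - total_log_likelihood(x0 - h_fd, lam0)) / (2 * h_fd)
    fd_lam = (total_log_likelihood(x0, lam0 + h_fd) - total_log_likelihood(x0, lam0 - h_fd)) / (2 * h_fd)
    print(f"dL/dx:   chain rule {g_x:.6f}   direct {fd_x:.6f}")
    print(f"dL/dlam: chain rule {g_lam:.6f}   direct {fd_lam:.6f}")
```

Because only dE/dp differs between the two targets, the same per-reflection gradient g serves both the fragment parameters and the Lagrange multiplier. A real ME completion would use many multipliers and a prior reflecting the likely location of the missing atoms, which this toy omits.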

 


Last modification: 26.01.04