[sharp-discuss] question on mir phasing
Clemens Vonrhein
vonrhein at globalphasing.com
Fri May 28 10:23:41 CEST 2010
Hi Francis,
On Thu, May 27, 2010 at 01:18:34PM -0600, Francis E Reyes wrote:
> The reason I ask is because a lot of people emphasize low resolution
> data, but I can sympathize with the OP. One screens diffraction images,
> collects as high resolution as possible, and can (usually) solve the
> structure.
I can also sympathize - but (there's always a but):
- "collects as high resolution as possible"
==> Why? What is the (sensible) reason for that? Maybe:
* you need the high resolution to see enough detail for some
important biological question to be answered
* you want more reflections so refinement becomes easier
(parameter/observation ratio)
* a high-resolution limit of 1.94 just sounds nicer than 2.8A (or:
"The high-resolution structure of ..." is a better title for a
paper) ... ;-)
Collecting high resolution data in itself is neither good nor bad -
it's what you need the data for that determines the usefulness of
it.
- "and can (usually) solve the structure."
That is something quite different: how was it solved? With
molecular replacement or experimental phasing (or combination)? Or
maybe direct methods?
Some general and simplified rules:
MR needs complete low resolution data (remember that you try to
place a large 'blob' through rotation/translation - this is a low
resolution problem).
Experimental methods need accurate data (to have good
anomalous/isomorphous/dispersive differences). One of the main
steps here is density modification - and e.g. solvent flattening
or histogram matching like to distinguish the protein region from
the solvent region ... which again is a low resolution problem
(to find the blob of protein versus the fuzzy solvent region).
Direct methods like very high resolution data.
If you use direct methods (SHELXD, HySS, SnB etc) to locate your
heavy atom substructure, you will also use normalised structure
factors (of differences). To have accurately determined E values,
you need complete resolution bins everywhere - or at least fairly
complete with a _random_ set of reflections missing (so that the
average in that bin stays the same).
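As a rough illustration of why a _random_ set of missing reflections is harmless for normalisation while a systematic one is not, here is a minimal Python sketch. It is not taken from SHELXD, HySS or SnB: the function name `simulate_bin`, the bin size, and the use of an exponential (acentric Wilson-like) intensity distribution with a true mean of 1.0 are all assumptions made for the example.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def simulate_bin(n_reflections=2000, missing_fraction=0.3, seed=0):
    """Compare the mean intensity of one resolution bin under two kinds of loss.

    Assumption: intensities in the bin follow an exponential distribution
    with a true mean of 1.0 (toy model, no measurement errors).
    """
    rng = random.Random(seed)
    intensities = [rng.expovariate(1.0) for _ in range(n_reflections)]
    n_keep = n_reflections - int(missing_fraction * n_reflections)

    # Case 1: a random subset is missing, so the bin average is preserved.
    random_subset = rng.sample(intensities, n_keep)

    # Case 2: the strongest reflections are missing (overload-like loss).
    weakest_subset = sorted(intensities)[:n_keep]

    return mean(intensities), mean(random_subset), mean(weakest_subset)


full, random_missing, overloads_missing = simulate_bin()
print(f"mean intensity, complete bin:           {full:.3f}")
print(f"mean intensity, 30% missing at random:  {random_missing:.3f}")
print(f"mean intensity, strongest 30% missing:  {overloads_missing:.3f}")
```

Since E^2 is the intensity divided by the bin average, an underestimated average in the second case would inflate every E value in that bin, on top of the loss of the strongest terms themselves.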
However (and this is the important bit): when we're talking about
low completeness in the low resolution shells we are looking at two
effects.
1) reflections that can't be measured because of beamstop
This will affect all kinds of decisions in the same way that
having very high resolution data will determine if you can use
anisotropic B-factor refinement, model alternative
conformations etc.
With missing low resolution you might not be able to solve a
structure with MR: the information about location and
orientation is mainly included in those low-resolution data.
You won't be able to do density modification efficiently
(solvent flattening, NCS averaging etc).
Bulk solvent correction in refinement will be very unstable
since you're missing the data that would influence the bulk
solvent model parametrisation: so your model (PDB file plus
bulk solvent) will be in error relative to the reality of your
crystal from which you collected your data. This can have all
kinds of funny effects: ripples at the surface of the protein,
difficulties in modeling solvent structure at surface, weird
connectivity problems in otherwise perfect density etc.
2) overloaded reflections
This is a non-random set of reflections that are missing: not
just any (say) 30% of reflections in the lowest resolution
shell, but the 30% strongest ones. So we have a systematic
error here - with the same effects as above (unable to match
model/parametrisation with physical reality).
And for experimental phasing: those strongest reflections
would have been measured/integrated most accurately (strong,
high I/sigI values) and therefore would have given you the
most accurate and largest difference values for substructure
solution and phasing ... so one is missing the most valuable
bits of data.
I'd rather throw away 1000 high-resolution reflections than
miss the 20 strongest low-resolution reflections due to
overloads ;-)
> The structure that I collected on during Rapidata (which was
> solved)
Congratulations!
> has a low resolution completeness (as judged by phenix.xtriage)
> from 28.6 - 10.61 A of about 68.7%.
Which I would classify as very problematic: the beamstop probably
restricts you to 30A (which is ok) and the 30% missing reflections are
all overloads.
> However, 100% in all bins up to the
> resolution limit of 2.0A.
2A is nice for refinement - but for structure solution (either MR or
HA phasing) I'd rather have a 2.5-2.8A dataset with the low resolution
range more or less complete. And by lowering the dose (no overloads)
and moving the detector away (smaller missing region behind beamstop)
you could get 95% complete data in the 40-10A range maybe?
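The effect of detector distance on the region hidden behind the beamstop can be estimated from Bragg's law, d = lambda / (2 sin(theta)), with the scattering angle 2*theta = atan(r / D). The sketch below is only an illustration: the function name, the 1.0 A wavelength, the 5 mm shadow radius and the two distances are made-up values, and it assumes the shadow has a fixed radius r on the detector face. On a setup where the beamstop sits at a fixed distance from the crystal, D would have to be the crystal-to-beamstop distance instead.

```python
import math


def low_resolution_limit(wavelength_A, shadow_radius_mm, distance_mm):
    """Largest d-spacing (in A) that is not hidden by the beamstop shadow.

    Assumption: the shadow is a disc of fixed radius centred on the direct
    beam, measured in the plane at `distance_mm` from the crystal.
    """
    two_theta = math.atan(shadow_radius_mm / distance_mm)
    return wavelength_A / (2.0 * math.sin(two_theta / 2.0))


# Hypothetical numbers, for illustration only.
for distance in (150.0, 400.0):
    d_max = low_resolution_limit(1.0, 5.0, distance)
    print(f"distance {distance:5.0f} mm -> low-resolution limit {d_max:.1f} A")
```

Under these assumptions a longer distance pushes the low-resolution limit further out, at the price of a lower high-resolution limit at the detector edge.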
Yes, it is annoying to maybe have to collect two datasets: one with
complete low-resolution data for solving the structure and another
high-resolution one for refinement ... but that might just be 2 trips
to the synchrotron instead of half a dozen where the first step
(structure solution) failed because of low-resolution issues.
> By default HKL2000 does not output the low resolution bins (the
> first bin in this dataset was 50-5A and in the scale log it's at a
> 100% completion).
I don't know HKL2000, but I'm sure you could run scalepack with a
resolution setting of 30-4.0 to get that statistic? Otherwise, SFTOOLS
is very good at this (you can specify the exact number of bins to
use).
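To show why the choice of bins matters for this statistic, here is a small Python sketch that computes completeness per resolution bin from a list of possible reflections. It does not reproduce the output of scalepack, SFTOOLS or phenix.xtriage: the function `completeness_by_bin`, the `(d_spacing, is_observed)` record layout and the toy data are assumptions for the example.

```python
def completeness_by_bin(reflections, edges):
    """Return (low, high, n_observed, n_possible, fraction) for each bin.

    reflections: list of (d_spacing_in_A, is_observed) for every reflection
                 that could have been measured (hypothetical input format).
    edges: bin boundaries in A, from low resolution (large d) to high
           resolution (small d), e.g. [30, 10, 6, 4].
    """
    results = []
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = [observed for d, observed in reflections if high < d <= low]
        n_possible = len(in_bin)
        n_observed = sum(in_bin)
        fraction = n_observed / n_possible if n_possible else float("nan")
        results.append((low, high, n_observed, n_possible, fraction))
    return results


# Toy data: half of the reflections beyond 10 A are missing.
toy = [(25.0, False), (18.0, True), (12.0, True), (11.0, False),
       (8.0, True), (7.0, True), (5.0, True), (4.5, True)]

for low, high, n_obs, n_pos, frac in completeness_by_bin(toy, [30, 10, 6, 4]):
    print(f"{low:5.1f} - {high:4.1f} A: {n_obs}/{n_pos} = {frac:.0%}")

# One wide bin over the same data averages the missing low-resolution
# reflections together with the complete higher-resolution ones.
print(completeness_by_bin(toy, [30, 4]))
```

In a real dataset the number of reflections grows rapidly towards high resolution, so a single wide first bin can look nearly complete even when the lowest-resolution shell is badly incomplete.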
> Is it only when you have issues that it becomes valuable? (poor
> anomalous signal, etc etc)
Turn that around: if you don't have that (valuable) good low
resolution data it becomes an issue ... every time. Yes, you might get
around it because the structure is 'just' a compound soak in the same
spacegroup (so no structure solution required) and your biological
question (did it bind?) can ignore all those messy features like
breaks in main-chain density or noisy solvent boundary.
> P.S. the Reply-To: field for sharp-discuss defaults to the sender, it
> would be useful for the replies to head back to the list so those of us
> who are simply 'watching' can listen in and hopefully learn something.
Yes(ish): the logic here is that we don't want people to accidentally
reply to the whole group if they only want to reply to a single
person. I think that is the same default as e.g. for CCP4bb? One might
always use the group-reply feature of the email client to get the
reply to list and sender ...
Cheers
Clemens
--
***************************************************************
* Clemens Vonrhein, Ph.D. vonrhein AT GlobalPhasing DOT com
*
* Global Phasing Ltd.
* Sheraton House, Castle Park
* Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group (http://www.globalphasing.com)
***************************************************************