Telling autoSHARP what to do
Copyright       © 2001-2015 by Global Phasing Limited
                All rights reserved.

                This software is proprietary to and embodies the confidential
                technology of Global Phasing Limited (GPhL). Possession, use,
                duplication or dissemination of the software is authorised only
                pursuant to a valid written licence from GPhL.

Documentation   (2001-2015) Clemens Vonrhein

Contact         sharp-develop@GlobalPhasing.com
This manual will help you tell autoSHARP what you want to do. Remember
that the files (reflexions, sequence, heavy atom positions) have to be
in your sharpfiles/datafiles directory before you start autoSHARP -
otherwise they won't be visible.
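For illustration only (the file names are hypothetical), the directory might
look something like this before you start:

    sharpfiles/datafiles/
        peak.mtz           reflexion data (here in MTZ format)
        lysozyme.pir       sequence file
        lysozyme.hatom     known heavy atom positions (optional)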
Note : This is intended for users running autoSHARP through the
Sushi interface - but might still be useful when running through the
CCP4i interface.
This is a (short) description of the various input fields in the main
autoSHARP control panel. Please
note that this part of the interface does not perform many syntax
checks, so a few notes:
- Fields expect either text or numbers (integer or float). Please do
  not use special characters such as _, &, %, $, [, < or similar.
- Make sure there is no white space (spaces or tabs) in your fields.
- You can leave a text field empty - apart from the identifiers for
  the job and the datasets.
- Fields that expect numeric input take a zero ("0") as the
  default. Do not leave these empty.
The first question you are asked when entering the autoSHARP Control panel is about the
type of experiment. Currently, autoSHARP can handle SAD/MAD and
SIR(AS)/MIR(AS) cases. For more complicated scenarios (e.g. MAD +
native) you will have to use SHARP alone.
Note that in a MIR case - because of the possibility of origin
shifts and the uncertainty of handedness - not all derivatives can be
used right from the start in refinement. This might mean that an
autoSHARP run with several
derivatives does not perform optimally. It is therefore recommended
to start with SIR(AS) first. But you can still use autoSHARP to collect, scale and analyse
all your derivatives together. This case also expects a native dataset
(to be entered first).
autoSHARP can handle merged data - i.e. files
which contain only unique reflections - in MTZ, SCALEPACK
or SHELX HKLF 4
format. It can also work from unmerged data in SCALEPACK
"NOMERGE original index" format. In any case it is assumed that
each dataset has been scaled internally (using e.g. SCALA, SCALEPACK,
XSCALE etc). If you are running a scenario that requires several
datasets (like MAD, SIR(AS) or MIR(AS)), the different datasets need
to be scaled relative to each other. This is done automatically in
autoSHARP. However, if this relative scaling
of several datasets has already been done, it needs to be
specified explicitly by the user.
To define a dynamic set of defaults, the user can specify whether this
autoSHARP run should be fast or
accurate (but slower) - or a compromise in between. For very
good data, the fastest option should probably give a correct
result. In difficult cases (or if a fast option didn't give the
results hoped for) it is recommended to use the most accurate settings.
This is just a short string to identify this run of autoSHARP. It should not contain special
characters, spaces, underscores, dots etc. Keep it short, unique and
descriptive (e.g. "MAD" is short but not very imaginative -
"Lysozyme-MAD-X31" might be a bit better).
This is a short description of what you're doing. It will be put in
all SHARP input files generated throughout the autoSHARP procedure as well as the main
LISTautoSHARP.html files. It helps you keep track of what you've
done.
You need to tell autoSHARP what is in your
asymmetric unit. You can do this by either giving the molecular weight
(in Da) or the number of amino acid residues - only one of the two is
required. The given values are assumed to be for a single monomer
(even if you have several copies in the asymmetric unit). If you have
no idea what is in your asymmetric unit (strange) keep the values at
zero (autoSHARP will use a solvent content of
50% to calculate these values for you). So far we have concentrated on
proteins alone - so the number of residues refers to amino acid
residues.
Obviously the best way of specifying these values is by using a
sequence file (see below).
To specify the contents of the asymmetric unit, the best way is to
give autoSHARP a file with the
sequence. This file has to be in your sharpfiles/datafiles directory with the
extension .pir. The format is
described here.
If you have more than one molecule in the asymmetric unit and you
already know the exact number of copies, just cut-and-paste the same
sequence several times into the file. If you are unsure about the
number of molecules per asymmetric unit, just give a single copy of
the monomer sequence - and autoSHARP will
determine the most likely number of molecules itself.
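As a purely illustrative sketch (hypothetical entry name, sequence
shortened - the exact requirements are given in the format description
mentioned above), a single-copy PIR file might look like:

    >P1;mysamp
    my protein, one monomer
    MKVLAAGIVQTSRELLDFVETYGQAWKNNPELIRRATESGKVDAAHLV
    TPQGEAFKDLMAKSGR*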
This is usually not needed since the reflection files should contain
this information in the header. However, you might want to set it
manually - perhaps your input data does not take screw axes into
account (because scaling/merging was done in P222 instead of
P212121). If left at the default, all
symmetry information will be taken from the input reflection file(s) -
provided it is available there.
Also note that autoSHARP and SHARP use the
space group symbol H32 for the trigonal space
group R32 in the hexagonal setting. Please use
unique-axis-b settings for monoclinic space groups. The scripts should
detect any discrepancy automatically for you - but you might want to
check.
You have to specify how many heavy atoms of what type you expect to
find per monomer in your asymmetric unit. In principle it
shouldn't hurt to slightly
overestimate the number of heavy atoms you want autoSHARP to find (it adjusts this internally
anyway). However, SHELXD (used for heavy-atom detection) works best if this
number is close to the correct value - and if you know the number of
sites exactly you should use this information here.
The program will adjust the number of sites to search for whenever it
adjusts the most likely number of molecules in the asymmetric
unit. It is therefore important to give the number of sites per
monomer here. If you have a Se-MET protein, this number should be
known (also in cases of sulphur phasing). If this is a halide soak (I
or Br) you could expect to find maybe one site per 15-20 residues. For
more traditional soaks (Pb, Pt, Hg etc) it is probably much fewer. If
you want to stop autoSHARP from updating this
number when updating the number of monomers, give a negative number
(e.g. "-4" will always search for 4 sites, no matter how
many molecules autoSHARP thinks there are).
In the original version of autoSHARP one
could choose between different methods of finding the heavy atom
positions: the direct methods program RANTAN seemed to have an upper
limit of maybe 20 sites to search for. SHELXD, on the other hand, can
easily handle a much larger number of heavy atoms. Again: if you want to
avoid the automatic update of the number of heavy atoms to search for
(which happens during the automatic determination of the most probable
solvent content, i.e. when the number of molecules per asymmetric unit is
updated), please give the number of sites as a negative number.
For ways to work with externally found heavy atom positions see below.
Currently, autoSHARP usually handles only a
single type of atom per dataset. But in most cases one of the heavy
atoms is most "visible" (in terms of signal) anyway. And
even if a found heavy atom position is occupied by the
"wrong" heavy atom, the refinement of the heavy atom occupancy
should account for this easily.
If you already know some or all heavy atom positions you can skip the
heavy atom detection step within autoSHARP
and supply your own set of positions. This is done by creating a
little file in your sharpfiles/datafiles
directory with the extension .hatom. The
possible formats are described here.
You will still need to define the heavy atom type and number (see above). If you define a number greater than the
number of sites in your .hatom file, autoSHARP will try to complement these initial
sites by automatically analysing the various residual maps after the
first SHARP refinement.
Please note that for now only a single type of atom is supported: so
all atoms in this file should be of the same type. If you expect
various types of heavy atoms in your crystal, use a sensible
compromise or the predominant atom type.
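The exact .hatom layouts accepted are given in the format description
referred to above; purely as a hypothetical illustration, a simple list
of fractional coordinates (one site per line) could look like:

    0.1234  0.5678  0.9012
    0.4321  0.8765  0.2109
    0.0150  0.2500  0.7300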
You can perform only a subset of the steps of automated structure
solution if you want to. autoSHARP
can do the following steps:
- Data analysis
This will just scale different datasets together (if not already
defined as "scaled" in the step before)
and perform various analysis steps on these datasets.
- step 1 + Heavy atom detection
Tries to find an initial set of heavy atom positions. This step
is skipped if you prefer to supply your own set of heavy atom
positions (see above).
- step 2 + Heavy atom refinement & phasing
Performs several steps of heavy atom refinement and generation of
phases (for both hands). autoSHARP will analyse the
residual maps
after the first refinement cycle to delete wrong sites and add
additional sites. Also, refinement of f''-values might be
switched on.
- step 3 + Density modification
Based on the two sets of phases (for the two hands) autoSHARP will try
to find the correct hand, to optimise the solvent content and to get
the best solvent flattened map.
- step 4 + Automatic model building
If the data extends to high enough resolution (better than 2.8
Å for ARP/wARP version 6,
otherwise 2.3 Å) and the
phases after solvent flattening seem to be of good
enough quality, the external ARP/wARP
program suite is called to do
some automatic map interpretation. If you supplied a sequence file
(see above) side chain docking is attempted.
If the ARP/wARP helix-building
module is installed, automatic building of helices will be done
even at lower resolution (up to 5 Å).
If a model is available (either an initial molecular-replacement
solution, a partially built model or something similar), it can be
selected here if it was placed into your sharpfiles/datafiles directory. This will
trigger the use of those model phases in heavy-atom detection using
the LLG (residual) maps in SHARP with external phase information - a
feature that has been part of SHARP for a
very long time and is also sometimes referred to as "MR-SAD" (but here
it can be used in a more general way).
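For example (hypothetical file name; we assume here that the model is
supplied as a PDB-format coordinate file), the model simply needs to be
copied into the data directory before starting autoSHARP:

    cp mr-solution.pdb sharpfiles/datafiles/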
Only data within these resolution limits will be used throughout the
procedure. Sometimes it is useful to use only lower-resolution data for heavy
atom detection. This might mean that no automatic model building via
ARP/wARP is possible. However, after an initial lower-resolution autoSHARP run you can always switch to the normal
SHARP procedure to then use all available resolution for further
refinement, phasing, density modification and finally automatic model
building. Or use the set of heavy atom sites found during this initial
run in a further autoSHARP run using all
resolution.
If there is some problematic low-resolution data (e.g. because the
beamstop wasn't handled correctly during data processing), it might be
helpful to exclude this too. However, to get the best electron density
map after solvent flattening, good (and complete!) low resolution data
is very beneficial.
A small and unique string (alphanumeric) to identify a given
dataset. The MTZ files generated by autoSHARP
will have this identifier to distinguish the various columns. It is a
good idea for MAD datasets to use identifiers like "peak",
"infl", "hrem" and "lrem": in case the
automatic assignment (based on f' and f'' values) fails, these
strings are used to tell SHELXC the type of each dataset during the
heavy atom detection step.
For each dataset you have to specify the wavelength it was measured at
- either in Å or as f' and f'' values. Obviously, f' and f''
values from a good fluorescence scan are the best you can get. Far
away from the edge this is usually not that critical and you can either
just give the wavelength or use calculated/tabulated f' and f''
values. However, close to the edge, these calculated (or tabulated)
values can be very wrong.
Currently, autoSHARP can use reflexion files
in MTZ, SCALEPACK or SHELX format (other merged
file formats could easily be implemented). These files have to be in
your sharpfiles/datafiles directory before
you start autoSHARP (with extensions .mtz, .sca or .hkl respectively).
This is only necessary if your data file is in MTZ format. See the
documentation about MTZ
files for the columns autoSHARP expects.
This is only necessary if your data file is one of the unmerged
SCALEPACK formats (which don't carry the cell parameters in their
header - although the space group is recorded) or a SHELX file. Please
make sure to use the correct cell parameters for each dataset (if there
is more than one). They will be used when the input files are converted
into MTZ files.
For merged SCALEPACK files or MTZ files these values should probably
be left at their default (i.e. zeros in a, b and c) - unless the cell
parameters in the reflection file header need to be changed.
Each step of the autoSHARP procedure has its
own set of default values. These are stored in $BDG_home/bin/sharp/autoSHARP/*.def files. In
general you don't need to be concerned with these, but if you want to
override/change them the following procedure is required:
- determine the Project identifier
- copy the relevant ".def" file into
sharpfiles/cardfiles/<project>.<step>.def
- edit sharpfiles/cardfiles/<project>.<step>.def
So if your project is "Lysozyme-MAD-X31" and you want to
change the defaults for the calculation of E values you need a file
sharpfiles/cardfiles/Lysozyme-MAD-X31.ecalc.def.
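For example, assuming the E-value calculation defaults are held in a
file called ecalc.def (as suggested by the target file name above), the
sequence of commands would be something like:

    cp $BDG_home/bin/sharp/autoSHARP/ecalc.def \
       sharpfiles/cardfiles/Lysozyme-MAD-X31.ecalc.def
    # then edit sharpfiles/cardfiles/Lysozyme-MAD-X31.ecalc.def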
Obviously, if you find any values that work better for your
particular project we would be very happy to hear about them - so we can
accommodate this.
Last modified: Tue Sep 30 10:54:46 CEST 2008