autoSHARP User Manual
Chapter 2

Telling autoSHARP what to do

Copyright    © 2001-2015 by Global Phasing Limited
 
  All rights reserved.
 
  This software is proprietary to and embodies the confidential technology of Global Phasing Limited (GPhL). Possession, use, duplication or dissemination of the software is authorised only pursuant to a valid written licence from GPhL.
 
Documentation    (2001-2015)  Clemens Vonrhein
 
Contact sharp-develop@GlobalPhasing.com


This manual will help you tell autoSHARP what you want to do. Remember that the files (reflections, sequence, heavy atom positions) must already be in your sharpfiles/datafiles directory before you start autoSHARP - otherwise they won't be visible.

Note: This is intended for users running autoSHARP through the Sushi interface, but it might still be useful when running through the CCP4i interface.



Main autoSHARP Control Panel

This is a (short) description of the various input fields in the main autoSHARP control panel. Please note that this part of the interface does not perform many syntax checks, so take care to follow the notes below.


Preparation

Type of experiment

The first question you are asked when entering the autoSHARP control panel is about the type of experiment. Currently, autoSHARP can handle SAD/MAD and SIR(AS)/MIR(AS) cases. For more complicated scenarios (e.g. MAD + native) you will have to use SHARP alone.

Note that in a MIR case - because of the possibility of origin shifts and the uncertainty of handedness - not all derivatives can be used right from the start in refinement. This means that an autoSHARP run with several derivatives might not perform optimally. It is therefore recommended to start with SIR(AS) first. You can still use autoSHARP to collect, scale and analyse all your derivatives together. This case also expects a native dataset (to be entered first).

Entry point

autoSHARP can handle merged data - i.e. files which contain only unique reflections - in MTZ, SCALEPACK or SHELX HKLF 4 format. It can also work from unmerged data in SCALEPACK "NOMERGE original index" format. In any case it is assumed that each dataset has been scaled internally (using e.g. SCALA, SCALEPACK, XSCALE etc). If you are running a scenario that requires several datasets (like MAD, SIR(AS) or MIR(AS)), the different datasets need to be scaled relative to each other. This is done automatically in autoSHARP. However, if you have already done this relative scaling yourself, you need to specify this explicitly.

Speed/Accuracy

To define a dynamic set of defaults, the user can specify if this autoSHARP run should be fast or accurate (but slower) - or a compromise in between. For very good data, the fastest option should probably give a correct result. In difficult cases (or if a fast option didn't give the results hoped for) it is recommended to use the most accurate settings.


Project identifier

This is just a short string to identify this run of autoSHARP. It should not contain special characters, spaces, underscores, points etc. Keep it short, unique and descriptive (e.g. "MAD" is short but not very imaginative - "Lysozyme-MAD-X31" might be a bit better).
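Since the interface does little syntax checking, an identifier can be checked before it is typed in. The following is a minimal sketch only: the allowed character set (ASCII letters, digits and hyphens) and the length limit are assumptions inferred from the example "Lysozyme-MAD-X31", not rules taken from autoSHARP, and the function name is hypothetical.

```python
import re

# Assumption: letters, digits and hyphens are acceptable (as in "Lysozyme-MAD-X31");
# spaces, underscores, points and any other special characters are rejected.
_SAFE_ID = re.compile(r"[A-Za-z0-9-]+")


def is_safe_project_id(identifier: str, max_length: int = 32) -> bool:
    """Return True if the identifier looks safe under the assumptions above.

    max_length is an arbitrary illustrative limit, not an autoSHARP setting.
    """
    if not identifier or len(identifier) > max_length:
        return False
    return _SAFE_ID.fullmatch(identifier) is not None


if __name__ == "__main__":
    for candidate in ("Lysozyme-MAD-X31", "my run", "lyso_mad", "v1.2"):
        print(candidate, is_safe_project_id(candidate))
```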


Title

This is a short description of what you're doing. It will be put in all SHARP input files generated throughout the autoSHARP procedure as well as in the main LISTautoSHARP.html files. It helps you keep track of what you've done.


Molecular weight, number of residues

You need to tell autoSHARP what is in your asymmetric unit. You can do this by giving either the molecular weight (in Da) or the number of amino acid residues - only one of the two is required. The given values are assumed to be for a single monomer (even if you have several copies in the asymmetric unit). If you have no idea what is in your asymmetric unit (strange), keep the values at zero (autoSHARP will use a solvent content of 50% to calculate these values for you). So far we have concentrated on proteins alone, so the number of residues refers to amino acid residues.
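To get a feeling for what a 50% solvent content implies, the standard Matthews relation can be used: solvent fraction ≈ 1 - 1.23/V_M, where V_M is the cell volume divided by the total protein mass in the cell. The sketch below is not autoSHARP's own calculation. The function names are hypothetical, and the mean residue mass of 110 Da is the usual rule of thumb for proteins.

```python
def asu_mass_for_solvent(cell_volume_A3: float, n_symops: int,
                         solvent_fraction: float = 0.5) -> float:
    """Protein mass (Da) per asymmetric unit implied by a given solvent fraction.

    Uses the Matthews relation: solvent_fraction = 1 - 1.23 / V_M,
    with V_M (in Å^3/Da) = cell_volume / (n_symops * mass_per_asu).
    """
    if not 0.0 <= solvent_fraction < 1.0:
        raise ValueError("solvent_fraction must be in [0, 1)")
    if n_symops < 1:
        raise ValueError("n_symops must be at least 1")
    matthews_coefficient = 1.23 / (1.0 - solvent_fraction)
    return cell_volume_A3 / (n_symops * matthews_coefficient)


def residues_from_mass(mass_da: float, mean_residue_mass: float = 110.0) -> int:
    """Approximate number of amino acid residues for a given protein mass."""
    return round(mass_da / mean_residue_mass)


if __name__ == "__main__":
    # Illustrative numbers only: a cell of 246000 Å^3 with 4 symmetry operators.
    mass = asu_mass_for_solvent(246000.0, 4)
    print(f"{mass:.0f} Da, about {residues_from_mass(mass)} residues per asymmetric unit")
```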

Obviously the best way of specifying these values is by using a sequence file (see below).


Sequence

To specify the contents of the asymmetric unit, the best way is to give autoSHARP a file with the sequence. This file has to be in your sharpfiles/datafiles directory with the extension .pir. The format is described here.

If you have more than one molecule in the asymmetric unit and you already know the exact number of copies, just cut-and-paste the same sequence several times into the file. If you are unsure about the number of molecules per asymmetric unit, just give a single copy of the monomer sequence - and autoSHARP will determine the most likely number of molecules itself.


Space group

This is usually not needed since the reflection files should contain this information in the header. However, you might want to set it manually - perhaps your input data is not taking screw axes into account (since scaling/merging was done in P222 instead of P212121). If left at the default, all symmetry information will be taken from the input reflection file(s) - if this is available in the relevant files.

Also note that autoSHARP and SHARP assume the space group symbol H32 for the trigonal space group R32 in the hexagonal setting. Please use unique-b settings for monoclinic space groups. The scripts should detect any discrepancy automatically for you - but you might want to check.


Heavy atom types and number

You have to specify how many heavy atoms of what type you expect to find per monomer in your asymmetric unit. In principle it shouldn't hurt to slightly overestimate the number of heavy atoms you want autoSHARP to find (it adjusts this internally anyway). However, SHELXD (for heavy-atom detection) works best if this number is close to the correct value - and if you know the number of sites exactly you should already use this information here.

The program will adjust the number of sites to search for whenever it adjusts the most likely number of molecules in the asymmetric unit. Therefore it is important to give the number of sites per monomer here. If you have a Se-Met protein, this number should be known (also in cases of sulphur phasing). If this is a halide soak (I or Br) you could expect to find maybe one site per 15-20 residues. For more traditional soaks (Pb, Pt, Hg etc) it is probably much fewer. If you want to stop autoSHARP from updating this number when updating the number of monomers, give a negative number (e.g. "-4" will always search for 4 sites, no matter how many molecules autoSHARP thinks there are).

In the original version of autoSHARP one could choose between different methods of finding the heavy atom positions: the direct methods program RANTAN seemed to have an upper limit of maybe 20 sites to search for. SHELXD on the other hand can easily handle much larger numbers of heavy atoms. Again: if you want to avoid the automatic update of the number of heavy atoms to search for (which happens when the most probable solvent content, i.e. the number of molecules per asymmetric unit, is determined automatically), please give the number of sites as a negative number.
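The sign convention can be summarised in a few lines. This is a sketch of the behaviour as described here, not autoSHARP's actual code: the function names are hypothetical, and the simple multiplication of sites per monomer by the number of molecules is an assumption about how the update works.

```python
from typing import Tuple


def sites_to_search(sites_per_monomer: int, n_molecules: int) -> int:
    """Total number of sites to search for, following the sign convention above."""
    if n_molecules < 1:
        raise ValueError("n_molecules must be at least 1")
    if sites_per_monomer < 0:
        # Negative input: a fixed total, independent of the number of molecules.
        return -sites_per_monomer
    # Assumption: the per-monomer value is scaled by the number of molecules.
    return sites_per_monomer * n_molecules


def halide_site_range(n_residues: int) -> Tuple[int, int]:
    """Rough (low, high) expectation for a halide soak: one site per 15-20 residues."""
    return n_residues // 20, n_residues // 15


if __name__ == "__main__":
    print(sites_to_search(4, 3))    # per-monomer value, scaled by 3 molecules
    print(sites_to_search(-4, 3))   # fixed: always 4 sites
    print(halide_site_range(300))
```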

For ways to work with externally found heavy atom positions see below.

Currently, autoSHARP usually handles only a single type of atom per dataset. But in most cases one of the heavy atoms is most "visible" (in terms of signal) anyway. And even if a found heavy atom position is occupied by the "wrong" heavy atom, the refinement of heavy atom occupancy should easily accommodate this.


Known heavy atom positions

If you already know some or all heavy atom positions you can skip the heavy atom detection step within autoSHARP and supply your own set of positions. This is done by creating a little file in your sharpfiles/datafiles directory with the extension .hatom. The possible formats are described here.

You will still need to define the heavy atom types and number (see above). If you define a number greater than the number of sites in your .hatom file, autoSHARP will try to complement these initial sites by automatically analysing the various residual maps after the first SHARP refinement.

Please note that for now only a single type of atom is supported: so all atoms in this file should be of the same type. If you expect various types of heavy atoms in your crystal, try to use a sensible compromise or use the main type of atoms.


What to do?

You can perform only a subset of the steps of automated structure solution if you want to. autoSHARP can do the following steps:
  1. Data analysis

    This will just scale different datasets together (if not already defined as "scaled" in the step before) and perform various analysis steps on these datasets.

  2. step 1 + Heavy atom detection

    Tries to find an initial set of heavy atom positions. This step is skipped if you prefer to supply your own set of heavy atom positions (see above).

  3. step 2 + Heavy atom refinement & phasing

    Performs several steps of heavy atom refinement and generation of phases (for both hands). autoSHARP will analyse the residual maps after the first refinement cycle to delete wrong sites and add additional sites. Also, refinement of f''-values might be switched on.

  4. step 3 + Density modification

    Based on the two sets of phases (for the two hands) autoSHARP will try to find the correct hand, to optimise the solvent content and to get the best solvent flattened map.

  5. step 4 + Automatic model building

    If the data extends to high enough resolution (better than 2.8 Å for ARP/wARP version 6, otherwise 2.3 Å) and the phases after solvent flattening seem to be of good enough quality, the external ARP/wARP program suite is called to do some automatic map interpretation. If you supplied a sequence file (see above) side chain docking is attempted.

    If the ARP/wARP helix-building module is installed, automatic building of helices will be done even at lower resolution (up to 5 Å).
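The cumulative nature of the five choices, and the resolution thresholds for model building, can be sketched as follows. This only restates the description above and is not autoSHARP's implementation. The function names and return strings are hypothetical, the phase-quality criterion is not modelled, and reading "otherwise" as "any ARP/wARP version other than 6" is an assumption.

```python
from typing import List

STAGES = [
    "Data analysis",
    "Heavy atom detection",
    "Heavy atom refinement & phasing",
    "Density modification",
    "Automatic model building",
]


def stages_for_choice(choice: int, have_hatom_file: bool = False) -> List[str]:
    """Stages run for menu choice 1-5; each choice includes all earlier stages."""
    if not 1 <= choice <= len(STAGES):
        raise ValueError("choice must be between 1 and 5")
    stages = STAGES[:choice]
    if have_hatom_file:
        # Detection is skipped when the user supplies heavy atom positions.
        stages = [s for s in stages if s != "Heavy atom detection"]
    return stages


def model_building_mode(high_res_A: float, arpwarp_major_version: int,
                        helix_module: bool) -> str:
    """Return "full", "helices" or "none" based on resolution alone."""
    limit = 2.8 if arpwarp_major_version == 6 else 2.3
    if high_res_A < limit:          # "better than" means a smaller number in Å
        return "full"
    if helix_module and high_res_A <= 5.0:
        return "helices"
    return "none"


if __name__ == "__main__":
    print(stages_for_choice(3, have_hatom_file=True))
    print(model_building_mode(3.5, 6, helix_module=True))
```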


Placed (partial) model

If a model is available (either some initial molecular-replacement solution, a partially built model or something similar), it can be selected here if it was placed into your sharpfiles/datafiles directory. This will trigger the usage of those model phases in heavy-atom detection using the LLG (residual) maps in SHARP with external phase information - a feature that has been part of SHARP for a very long time and is also sometimes referred to as "MR-SAD" (but here it can be used in a more general way).


Resolution limits

Only data within these resolution limits will be used throughout the procedure. Sometimes it is good to use only lower resolution data for heavy atom detection. This might mean that no automatic model building via ARP/wARP is possible. However, after an initial lower resolution autoSHARP run you can always switch to the normal SHARP procedure and then use all available resolution for further refinement, phasing, density modification and finally automatic model building. Alternatively, use the set of heavy atom sites found during this initial run in a further autoSHARP run using the full resolution range.

If there is some problematic low-resolution data (e.g. because the beamstop wasn't handled correctly during data processing), it might be helpful to exclude this too. However, to get the best electron density map after solvent flattening, good (and complete!) low resolution data is very beneficial.


Dataset identifier

A small and unique string (alphanumeric) to identify a given dataset. The MTZ files generated by autoSHARP will use this identifier to distinguish the various columns. It is a good idea for MAD datasets to use identifiers like "peak", "infl", "hrem" and "lrem": in case the automatic assignment (based on f' and f'' values) fails, these strings are used to tell SHELXC the type of each dataset during the heavy atom detection step.


Wavelength, f', f''

For each dataset you have to specify the wavelength it was measured at - either in Å or as f' and f'' values. Obviously, f' and f'' values from a good fluorescence scan are the best you can get. Far away from the edge this is usually not that critical, and you can either just give the wavelength or use calculated/tabulated f' and f'' values. However, close to the edge, these calculated (or tabulated) values can be very wrong.
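Beamline software and fluorescence scans often report photon energy rather than wavelength, while this field expects Å. The conversion is λ(Å) = hc/E, with hc ≈ 12398.42 eV·Å. The helper names below are hypothetical and are not part of autoSHARP.

```python
HC_EV_ANGSTROM = 12398.4198  # Planck constant times speed of light, in eV·Å


def energy_ev_to_wavelength_A(energy_ev: float) -> float:
    """Convert a photon energy in eV to a wavelength in Å."""
    if energy_ev <= 0:
        raise ValueError("energy must be positive")
    return HC_EV_ANGSTROM / energy_ev


def wavelength_A_to_energy_ev(wavelength_A: float) -> float:
    """Convert a wavelength in Å to a photon energy in eV."""
    if wavelength_A <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_ANGSTROM / wavelength_A


if __name__ == "__main__":
    # Illustrative energy only; use the value recorded for your own dataset.
    print(f"{energy_ev_to_wavelength_A(12658.0):.4f} Å")
```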


Datafile

Currently, autoSHARP can use reflection files in MTZ, SCALEPACK or SHELX format (other merged file formats could easily be implemented). These files have to be in your sharpfiles/datafiles directory before you start autoSHARP (with extensions .mtz, .sca or .hkl respectively).
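The mapping from extension to format can be used to check file names before starting a run. This is a small sketch with a hypothetical helper name. It matches extensions exactly as listed above and makes no assumption about how autoSHARP treats other spellings (e.g. upper case).

```python
from pathlib import Path

# Extensions and formats as listed in this section.
FORMAT_BY_EXTENSION = {
    ".mtz": "MTZ",
    ".sca": "SCALEPACK",
    ".hkl": "SHELX",
}


def reflection_format(filename: str) -> str:
    """Return the expected reflection file format for a file name."""
    extension = Path(filename).suffix
    try:
        return FORMAT_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(
            f"{filename!r}: expected one of {sorted(FORMAT_BY_EXTENSION)}"
        ) from None


if __name__ == "__main__":
    for name in ("peak.mtz", "native.sca", "deriv1.hkl"):
        print(name, "->", reflection_format(name))
```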


Column labels

This is only necessary if your data file is in MTZ format. See the documentation about MTZ files for the columns autoSHARP expects.


Cell parameters

This is only necessary if your data file is in one of the unmerged SCALEPACK formats (which don't carry the cell parameters in their header - although the space group is recorded) or is a SHELX file. Please make sure to use the correct cell parameters for each dataset (if there is more than one). They will be used when the input files are converted into MTZ files.

For merged SCALEPACK files or MTZ files these values should probably be left at their default (i.e. zeros in a, b and c) - unless the cell parameters in the reflection file header need to be changed.


Advanced settings

Each step of the autoSHARP procedure has its own set of default values. These are stored in $BDG_home/bin/sharp/autoSHARP/*.def files. In general you don't need to be concerned with these, but if you want to override/change them the following procedure is required:
  1. determine the Project identifier
  2. copy the relevant ".def" file into sharpfiles/cardfiles/<project>.<step>.def
  3. edit sharpfiles/cardfiles/<project>.<step>.def
So if your project is "Lysozyme-MAD-X31" and you want to change the defaults for the calculation of E values you need a file sharpfiles/cardfiles/Lysozyme-MAD-X31.ecalc.def.
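The copy step can also be scripted. The sketch below builds the target name sharpfiles/cardfiles/<project>.<step>.def as described above. The function names are hypothetical, and the path of the default ".def" file has to be supplied by the caller, since the file names inside $BDG_home/bin/sharp/autoSHARP are not listed here.

```python
import shutil
from pathlib import Path


def override_path(sharpfiles: Path, project: str, step: str) -> Path:
    """Location of the per-project override file for one step."""
    return Path(sharpfiles) / "cardfiles" / f"{project}.{step}.def"


def install_override(default_def: Path, sharpfiles: Path,
                     project: str, step: str) -> Path:
    """Copy a default .def file to the per-project location, ready for editing.

    An existing override is left untouched so that earlier edits are not lost.
    """
    target = override_path(sharpfiles, project, step)
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(default_def, target)
    return target


if __name__ == "__main__":
    print(override_path(Path("sharpfiles"), "Lysozyme-MAD-X31", "ecalc"))
```

After the copy, edit the returned file by hand. Only that project's runs are affected, and the installed defaults stay unchanged.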

Obviously, if you find any values that worked better for your particular project, we would be very happy to hear about them so that we can take them into account.


Last modified: Tue Sep 30 10:54:46 CEST 2008