Introduction

Most of the explicit commands given throughout this tutorial assume a sh-like shell (bash, zsh, ksh etc). They are given in the form

% ls -l

If you are using tcsh (or csh), please adapt the examples accordingly wherever the syntax differs.


3ISY - 2-wvl MAD

  • Images are in
~/GlobalPhasing/data/autoproc/3isy/infl
~/GlobalPhasing/data/autoproc/3isy/hrem
  • data was collected at beamline BL9-2 SSRL on 2009-05-14
  • the two wavelengths were collected interleaved (30 images at one wavelength, then the same 30 degrees at the other wavelength, and so on)
    • Wavelength = 0.91162A, f'/f" = -1.8,3.3 (hrem)
            Distance: 275.01
            BeamCenter X: 162.5
            BeamCenter Y: 162.5
            PhiDelta: 1.00
            PhiStart: 322.00
            FrameNumber start: 1
            FrameNumber end: 90
    • Wavelength = 0.97934A, f'/f" = -11,3.3 (infl)
            Distance: 275.01
            BeamCenter X: 162.5
            BeamCenter Y: 162.5
            PhiDelta: 1.00
            PhiStart: 322.00
            FrameNumber start: 1
            FrameNumber end: 90
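
Since both wavelengths share the same PhiStart and PhiDelta, the rotation range covered by any frame follows directly from its number. A minimal sketch of that arithmetic, assuming frame numbers map linearly onto Phi without gaps; the helper name frame_phi is made up for this illustration, and no wrap-around at 360 degrees is applied:

```shell
# frame_phi N: print start and end Phi (in degrees) of frame N,
# using PhiStart = 322.00 and PhiDelta = 1.00 from the list above.
frame_phi() {
  awk -v n="$1" -v start=322.00 -v delta=1.00 'BEGIN {
    s = start + (n - 1) * delta
    printf "%.2f %.2f\n", s, s + delta
  }'
}

frame_phi 1     # first frame of the first 30-image wedge
frame_phi 31    # first frame of the second 30-image wedge
```

The first call gives the same range (322 to 323 degrees) that the image header reports for frame 1 further down.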

1. Running autoPROC

Ideally one should run this tutorial in an empty directory, e.g. with

% mkdir -p ~/3isy/autoPROC
% cd ~/3isy/autoPROC

1.1. Copying files over and some initial checks

Then we want to make the images available in this directory. Instead of copying them, symbolic links are sufficient:

% ln -s ~/GlobalPhasing/data/autoproc/3isy/*/*mccd .

Note: jobs will run faster if the images are located on a local disk. Often, a user's HOME directory is a network filesystem - so this might not be a good location (not only slowing this autoPROC job down, but also everyone else who is using that same disk). If there is a large local disk available (e.g. /scratch or /tmp), we would recommend placing the files in a directory on that local filesystem. One can check the available size e.g. with

% df -h /tmp
% df -h /scratch
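
If several candidate locations exist, the free space can also be compared from a script. A small sketch based on the POSIX output format of df; the helper name avail_kb and the list of directories are assumptions for illustration. It only reports free space and does not tell whether a filesystem is local.

```shell
# avail_kb DIR: print the free space (in kB) of the filesystem holding DIR.
# "df -P" prints one line per filesystem, with the available space in column 4.
avail_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report free space for each candidate directory that exists
for d in /scratch /tmp; do
  if [ -d "$d" ]; then
    echo "$d: $(avail_kb "$d") kB free"
  fi
done
```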

Once we have the images, we can have a closer look at some basic information about them. For that we have the tool imginfo that reads the header information and produces a consistent output format for those items:

% imginfo *001.mccd

which returns

################# File = 120482_1_E1_001.mccd

>>> Image format detected as MARCCD

===== Header information:
date                                = 14 May 2009 02:15:38
exposure time             [seconds] = 2.000
distance                       [mm] = 275.006
wavelength                      [A] = 0.979340
Phi-angle (start, end)     [degree] = 322.000 323.000
Oscillation-angle in Phi   [degree] = 1.000
Omega-angle                [degree] = 0.000
Chi-angle                  [degree] = 0.000
2-Theta angle              [degree] = 0.000
Pixel size in X                [mm] = 0.079346
Pixel size in Y                [mm] = 0.079346
Number of pixels in X               = 4096
Number of pixels in Y               = 4096
Beam centre in X               [mm] = 162.500
Beam centre in X            [pixel] = 2047.992
Beam centre in Y               [mm] = 162.500
Beam centre in Y            [pixel] = 2047.992
Overload value                      = 65535

################# File = 120482_1_E2_001.mccd

>>> Image format detected as MARCCD

===== Header information:
date                                = 14 May 2009 02:21:51
exposure time             [seconds] = 2.000
distance                       [mm] = 275.006
wavelength                      [A] = 0.911620
Phi-angle (start, end)     [degree] = 322.000 323.000
Oscillation-angle in Phi   [degree] = 1.000
Omega-angle                [degree] = 0.000
Chi-angle                  [degree] = 0.000
2-Theta angle              [degree] = 0.000
Pixel size in X                [mm] = 0.079346
Pixel size in Y                [mm] = 0.079346
Number of pixels in X               = 4096
Number of pixels in Y               = 4096
Beam centre in X               [mm] = 162.500
Beam centre in X            [pixel] = 2047.992
Beam centre in Y               [mm] = 162.500
Beam centre in Y            [pixel] = 2047.992
Overload value                      = 65535

What we can see from that:

  • the two wavelengths (0.97934 and 0.91162) were collected at the same distance, starting angle and with the same exposure time
  • the beam centre is recorded as exactly the mid-point of the image: this is slightly unusual, unless the detector is extremely well aligned on the beamline.
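
The mid-point observation can be checked from the header values alone: dividing the beam centre in mm by the pixel size gives the position in pixels, which can be compared with half the number of pixels. A minimal sketch of that check (the helper name mm_to_pixel is made up for this illustration):

```shell
# mm_to_pixel MM PIXEL_SIZE_MM: convert a detector coordinate from mm to pixels.
mm_to_pixel() {
  awk -v mm="$1" -v size="$2" 'BEGIN { printf "%.3f\n", mm / size }'
}

mm_to_pixel 162.5 0.079346    # beam centre from the header, in pixels
echo $((4096 / 2))            # geometric mid-point of the image, in pixels
```

The first value reproduces the 2047.992 pixels listed in the header, which is within a hundredth of a pixel of the mid-point at 2048.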

To get a detailed record of the sequence of data-collection we need to use the timestamp in the image header (the timestamp on the file could easily be messed up through copying):

% imginfo -v *.mccd | awk '/File/{f=$NF}/Epoch/{print f,$NF}' | sort -n -k 2

which returns the list of images sorted by time (epoch, i.e. seconds since 01.01.1970). A slightly shortened output looks like this:

120482_1_E1_001.mccd 1242267338
120482_1_E1_002.mccd 1242267349
120482_1_E1_003.mccd 1242267360
...
120482_1_E1_028.mccd 1242267637
120482_1_E1_029.mccd 1242267648
120482_1_E1_030.mccd 1242267659
120482_1_E2_001.mccd 1242267711
120482_1_E2_002.mccd 1242267723
120482_1_E2_003.mccd 1242267736
...
120482_1_E2_028.mccd 1242268057
120482_1_E2_029.mccd 1242268070
120482_1_E2_030.mccd 1242268083
120482_1_E1_031.mccd 1242268133
...
120482_1_E1_060.mccd 1242268456
120482_1_E2_031.mccd 1242268507
...
120482_1_E2_060.mccd 1242268875
120482_1_E1_061.mccd 1242268926
...
120482_1_E1_090.mccd 1242269249
120482_1_E2_061.mccd 1242269302
...
120482_1_E2_090.mccd 1242269668

This shows nicely the collection pattern for an interleaved wavelength scan.
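
The points at which the wavelength changes, and the time each change took, can be extracted from the same sorted list. A sketch, assuming the file naming seen above (the third underscore-separated field, E1 or E2, identifies the wavelength); the function name wedge_switches is made up for this illustration:

```shell
# wedge_switches: read "filename epoch" lines (sorted by time) on stdin and
# print one line per wavelength change, with the time gap in seconds.
wedge_switches() {
  awk '{
    split($1, part, "_")          # 120482_1_E1_030.mccd -> part[3] = "E1"
    if (NR > 1 && part[3] != prev_id)
      print prev_file, "->", $1, $2 - prev_time, "s"
    prev_id = part[3]; prev_file = $1; prev_time = $2
  }'
}

# Three lines taken from the listing above
wedge_switches <<'EOF'
120482_1_E1_029.mccd 1242267648
120482_1_E1_030.mccd 1242267659
120482_1_E2_001.mccd 1242267711
EOF
```

On real data the function would simply be appended to the imginfo/awk/sort pipeline shown earlier.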

1.2. Running with all defaults

The easiest is to run a command like

% process -d 01 >01.lis 2>&1              # sh/ksh/bash/zsh

or (if one uses csh/tcsh as shell)

% process -d 01 >& 01.lis                 # csh/tcsh

  • The -d flag tells autoPROC to write all output into a subdirectory. This is a good idea in case one wants to run different jobs with slightly adjusted parameters or other options. Using two-digit numbering caters for really difficult cases where more than 9 jobs are needed.
  • It is always a good idea to save the standard output of an autoPROC job: it contains a summary of what was done in the form of notes, tables, results and (possible) warnings. Going through this output explains what is happening and can often give good indications on how to improve the automatic processing results. The two examples above show the different syntax (depending on your shell) for saving standard output as well as standard error in the same file.
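
When several jobs are run one after the other, the next free two-digit directory name can be determined automatically. A small sketch for sh-like shells; the helper name next_run_dir is made up for this illustration, and the process command is only printed here, not executed:

```shell
# next_run_dir: print the first two-digit name (01, 02, ...) that does not
# yet exist in the current directory.
next_run_dir() {
  i=1
  while [ -e "$(printf '%02d' "$i")" ]; do
    i=$((i + 1))
  done
  printf '%02d\n' "$i"
}

run=$(next_run_dir)
echo "next job: process -d $run > $run.lis 2>&1"
```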

1.3. Telling autoPROC where to find images

The above example (with all default settings) assumes that the images are in the same directory the 'process' command is started in. If this is not the case, two mechanisms are provided:

  • giving an explicit directory with a set of images:
      % process -I /dir/where/images/are -d 01 > 01.lis 2>&1
  • giving one (or several) detailed descriptions of datasets to process (see process -h for details):
      % process -Id lowRes,/data/images1,lyso_###.img,1,90 -Id highRes,/data/images2,lyso_high_###.img,1,180 -d 01 > 01.lis 2>&1
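
Judging from the example above, each -Id argument is a comma-separated list of an identifier, the image directory, a file name template (with # marking the digits of the image number) and the first and last image number; process -h remains the authoritative description. A sketch that assembles such arguments for the 3ISY images: the identifiers infl and hrem, the templates (derived from the file names seen in the imginfo output) and the helper name id_arg are choices made for this illustration, and the command is only printed for inspection.

```shell
# id_arg ID DIR TEMPLATE FIRST LAST: assemble one comma-separated dataset description.
id_arg() {
  printf '%s,%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}

infl=$(id_arg infl "$PWD" '120482_1_E1_###.mccd' 1 90)
hrem=$(id_arg hrem "$PWD" '120482_1_E2_###.mccd' 1 90)

# Print the resulting command line instead of running it
echo "process -Id $infl -Id $hrem -d 01 > 01.lis 2>&1"
```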

1.4. Working with multiple scans/datasets

The basic assumption for a single autoPROC run (using the process command) is that all images used in that run have a clearly defined relation in terms of orientation. This means that the following cases work:

  • high and low resolution scan
  • multi-wavelength MAD experiment
  • translated crystal between datasets (to account for radiation damage)
  • re-oriented crystal using a Kappa- or Eulerian-goniostat

whereas the following won't:

  • unmounted the crystal and remounted it again in a different, random position
  • datasets from several crystals

Furthermore, the relation between different scans needs to be clearly defined through

  • image headers (values of goniostat angles, distance, wavelength etc):
    • the information in recent image headers is usually accurate and unique
    • some image header formats allow adding custom items that are either not known to autoPROC or out-of-sync with the standard header item (where only the custom item contains the correct value and the standard item is wrong)
    • some setups produce wrong header information and rely on customized input scripts or site-definition files that are specific to one processing package. Often this is hidden from the user through some automated procedure, making the processing of images at the beamline an easy experience and processing in the home lab a very frustrating one
    • beam centre values are nearly always correct in the image header (apart from cases where the correct information is in a separate, software-specific site-definition file). However, the convention of the (X,Y) coordinate system used to describe the beam centre is never given in the image header. This is one of the main reasons why processing might fail very early on (but there are easy to use tools inside autoPROC to work around this in most cases).
  • a description of the instrument used (single-axis or multi-axis goniostat, 2-theta arm).
    • autoPROC can handle any kind of Kappa- or Eulerian-goniostat, as long as all rotation axes are defined in a consistent frame of reference.

2. Interpreting autoPROC output

Running

% process -d 01 > 01.lis 2>&1

will process all images found in the current directory with a (hopefully) reasonable set of defaults. There are two modifications that might be of interest:

  • running in 'fast' mode
        % process -M fast -d 02 > 02.lis 2>&1
    • this will cut down mainly on the number of images used for spot finding
    • however, the biggest impact on speed comes from using a modern, powerful multi-core machine and having the images to be processed on a (fast) local filesystem.
    • the danger here is that some of the useful analyses (regarding multiple lattices, ice-rings, low-resolution limits etc) become very unreliable.
  • running with additional automation
        % process -M automatic -d 03 > 03.lis 2>&1
    • several of the default analysis steps will trigger automatic decision making
    • this includes earlier resolution cutoffs, ice-ring exclusion and analysis of multiple lattices
  • It is recommended to always start with a normal run. If speed is an issue and the "-M fast" option is used, it is advisable to re-run the processing without this flag afterwards.

2.1. Setup

The first thing you'll see is some information about the way autoPROC was run (list of command-line arguments). There is a fairly lengthy paragraph about the beam centre: since this is very often the main reason for a failing processing run, it is explicitly mentioned here again.

trim_autoPROC_3isy_01_70pc.png

A list of found scans is presented (image identifiers and range of images that make up each scan). In this example here, only one scan is present (peak wavelength of a Se-MET MAD experiment).

2.2. Spot search

The first step consists of finding spots on a set of images. The default is to search for spots on all images available. Although this might seem excessive (and often is not required for getting a successful indexing), there are various analysis steps that work much more reliably if a larger number of images were used for spot search. These include:

  • detection of multiple lattices
  • changes of cell parameters due to radiation damage
  • ice-ring detection
  • low-resolution limit determination

trim_autoPROC_3isy_02_70pc.png

In this case, autoPROC decided that the initial indexing solution was not quite good enough to be used directly - mainly because the solution doesn't predict all of the found spots. A procedure for improving that initial solution will be used (run_idxref tool), which can e.g. detect multiple lattices or ice-rings.

trim_autoPROC_3isy_03_70pc.png

Using the list of indexed spots, some predictions of likely low- and high-resolution limits can be made as well.

trim_autoPROC_3isy_04_70pc.png

2.3. Indexing

trim_autoPROC_3isy_05_70pc.png

2.4. Visualising

One can already see that P1 might not be the correct final space group (two cell axes are nearly identical and the angles are very close to 90 degrees). However, for a final assignment of the most likely space group it is better to have a full set of integrated intensities available - so the following integration is still run in space group P1.

If the user is very confident about the space group and cell, autoPROC can be run with

% process cell="74 74 132 90 90 90" symm="I4122" -d 02 > 02.lis 2>&1
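
The reasoning about cell axes and angles can be written down as a simple metric check. A sketch only: the helper name looks_tetragonal and the default tolerances (0.5 A on the axes, 0.5 degrees on the angles) are assumptions for illustration, and a positive answer merely says that the cell dimensions are compatible with tetragonal (or higher) symmetry, not that the symmetry is actually present.

```shell
# looks_tetragonal "a b c alpha beta gamma" [AXIS_TOL] [ANGLE_TOL]
# Prints "yes" if a and b agree within AXIS_TOL and all angles are within
# ANGLE_TOL of 90 degrees, otherwise "no". Tolerances are illustrative defaults.
looks_tetragonal() {
  echo "$1" | awk -v dl="${2:-0.5}" -v da="${3:-0.5}" '
    function abs(x) { return x < 0 ? -x : x }
    {
      ok = abs($1 - $2) <= dl && abs($4 - 90) <= da &&
           abs($5 - 90) <= da && abs($6 - 90) <= da
      print (ok ? "yes" : "no")
    }'
}

looks_tetragonal "74 74 132 90 90 90"    # cell from the command above
```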

At this point, a little text file is prepared to allow visualisation of the current orientation matrix (and therefore predictions) using MOSFLM.

trim_autoPROC_3isy_06_70pc.png

This can be very useful to check if everything is working correctly - especially since XDS doesn't have a viewer to show (and modify) predictions:

% ipmosflm < 01/120482_1_E1_0/index_view.dat

This will start the (old) MOSFLM interface - so make sure to have a binary that still supports this interface.

mosflm_01_small.png

After hitting the 'Predict' button, the predictions based on the current orientation matrix are shown: these should superimpose on actual spots. Ideally, all spots should have a prediction box around them: this depends on an accurate estimate of mosaicity though. After the initial indexing this value isn't yet available, but can be estimated from within MOSFLM:

mosflm_02_small.png

Some additional tools within this MOSFLM interface could be useful, e.g.

  • reading different images (to check if the predictions are also good for later images and how the diffraction looks at different points of data collection)
  • drawing circles to show resolution limits (to check up to which resolution there are actually spots visible)

mosflm_03_small.png

2.5. First integration

After the first integration (by default in P1), some additional plots are available for inspection:

trim_autoPROC_3isy_08_70pc.png

% display 01/120482_1_E1_0/*.png

1_scale.png 1_mosaicity.png 1_standard_deviation.png 1_distance.png 1_detector_center_origin.png 1_cell_axes.png 1_cell_angles.png

An updated file for visualisation using MOSFLM is also written.

2.6. Space group determination

Since by default the initial integration was done in P1, POINTLESS is now used to determine the most likely space group:

trim_autoPROC_3isy_09_70pc.png

This step obviously cannot analyse screw axes if no (or insufficient) reflections along that axis were collected. Also, a distinction between enantiomorphic space groups (e.g. P41212 versus P43212) is not possible at this stage - that usually requires the structure solution step (density modification in case of experimental phasing, or molecular replacement tried in both possibilities).
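
For later structure solution it can be handy to know which alternative space group has to be tried as well. The sketch below is a plain lookup table of the 11 enantiomorphic pairs of space groups; it is not part of autoPROC, and the function name enantiomorph is made up for this illustration. Space groups outside these pairs are returned unchanged.

```shell
# enantiomorph SG: print the enantiomorphic partner of space group SG,
# or SG itself if it is not a member of one of the 11 enantiomorphic pairs.
enantiomorph() {
  case "$1" in
    P41)    echo P43    ;;  P43)    echo P41    ;;
    P4122)  echo P4322  ;;  P4322)  echo P4122  ;;
    P41212) echo P43212 ;;  P43212) echo P41212 ;;
    P31)    echo P32    ;;  P32)    echo P31    ;;
    P3112)  echo P3212  ;;  P3212)  echo P3112  ;;
    P3121)  echo P3221  ;;  P3221)  echo P3121  ;;
    P61)    echo P65    ;;  P65)    echo P61    ;;
    P62)    echo P64    ;;  P64)    echo P62    ;;
    P6122)  echo P6522  ;;  P6522)  echo P6122  ;;
    P6222)  echo P6422  ;;  P6422)  echo P6222  ;;
    P4132)  echo P4332  ;;  P4332)  echo P4132  ;;
    *)      echo "$1"   ;;
  esac
}

enantiomorph P41212    # the pair mentioned in the text
enantiomorph I4122     # no enantiomorphic partner: returned unchanged
```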

Using the determined space group, the final post-refinement step is repeated:

trim_autoPROC_3isy_10_70pc.png

2.7. Final processing

Now that the (hopefully) correct space group is known, the integration is repeated with these settings. At the end an updated table of statistics, a summary and an updated visualisation file are produced:

trim_autoPROC_3isy_11_70pc.png trim_autoPROC_3isy_12_70pc.png

2.8. Scaling and merging

Now that a set of integrated intensities is available, they need to be scaled and merged, with an appropriate high-resolution cutoff being applied. For that we use the SCALA program (through the "aP_scale" tool). This scaling step can also be run by hand with the same tool - for more help just run

% aP_scale -h

trim_autoPROC_3isy_13_70pc.png

To see some detailed information about the scaling step, the CCP4 "loggraph" utility can be used - e.g.

% loggraph 01/120482_1_E1_0/scala.log

trim_autoPROC_3isy_14_70pc.png

2.9. Combined scaling/merging

Since this dataset consists of two wavelengths, the same process as above is now repeated for the second wavelength. At the end, the reflections from both scans are combined:

trim_autoPROC_3isy_16_70pc.png

The two wavelengths are then scaled together (improving scale determination as well as outlier detection):

trim_autoPROC_3isy_17_70pc.png trim_autoPROC_3isy_18_70pc.png

Some additional information regarding the different runs/scans (e.g. images/batches that might have slightly higher Rmerge values) is also given:

trim_autoPROC_3isy_19_70pc.png

2.10. Finalising

The merged intensities are then converted into amplitudes (and anomalous differences) with the CCP4 "truncate" program:

trim_autoPROC_3isy_20_70pc.png

Details can be seen using

% loggraph 01/truncate_0.91162A.log
% loggraph 01/truncate_0.97934A.log

The final set of results will consist of

  • a SCALEPACK-formatted, merged reflection file for each wavelength - to be used in autoSHARP: 01/scala_0.91162A.sca and 01/scala_0.97934A.sca
  • an MTZ file for each scan, to be used in refinement with BUSTER: 01/truncate_0.91162A-unique.mtz and 01/truncate_0.97934A-unique.mtz
  • a text file with a REMARK 200 section for deposition: 01/120482_1_E1_0/remark200.pdb and 01/120482_1_E2_0/remark200.pdb
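
A quick way to confirm that a run produced the merged files listed above is to test for their presence. A minimal sketch, assuming the output directory 01 and the two wavelength labels used in this tutorial; the function name check_outputs is made up for this illustration, and only the per-wavelength .sca and .mtz files are checked:

```shell
# check_outputs DIR: report for each wavelength whether the SCALEPACK and MTZ
# files exist and are non-empty; the return value is the number of missing files.
check_outputs() {
  dir=$1
  missing=0
  for wvl in 0.91162A 0.97934A; do
    for f in "scala_${wvl}.sca" "truncate_${wvl}-unique.mtz"; do
      if [ -s "$dir/$f" ]; then
        echo "ok      $dir/$f"
      else
        echo "MISSING $dir/$f"
        missing=$((missing + 1))
      fi
    done
  done
  return "$missing"
}

check_outputs 01 || echo "some result files are missing"
```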