See also the online manual and the beamline-specific info.


Data processing

The main command for running all autoPROC steps is process, which prints a help message when run as

% process -h

Several pre-defined macros are available that could be useful - see

% process -M list
% process -M show

for details. Remember:

  • The main output file will be called "summary.html" (in the output directory specified with the -d flag): here you will find all information about the processing stages, potential problems, space-group determination, anomalous signal, merging statistics, anisotropy, diffraction limits and the final reflection data files available for subsequent steps.
  • The beamline-specific info might give you important information for processing data from specific beamlines - especially when the metadata (in image headers or HDF5) are not sufficient to describe the whole experiment. A common case would be the need to use ReverseRotationAxis=yes for data from some beamlines (see the sketch below).
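
Such autoPROC parameters can be given directly on the process command line as name=value arguments; a minimal sketch for the reverse-rotation case mentioned above (the output directory name 01 is just an example):

% process ReverseRotationAxis=yes -d 01 | tee 01.lis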

Simple case

In a directory with a set of images (from a single crystal and data collection - possibly consisting of several sweeps or wavelengths), just run

% process -d 01 | tee 01.lis

The -d flag means that all output is written into the (automatically created) subdirectory 01. It is a good idea to also save standard output (in a file with a name similar to that of the subdirectory holding the full output), since the most important results and a summary of the steps performed are printed there. Other possibilities are e.g.

% process -d 01 > 01.lis 2>&1          # bash/zsh/ksh/sh
 - or -
% process -d 01 >& 01.lis              # csh/tcsh
 - or -
% process -d 01 | tee 01.lis

If your images are located in a different directory, just use the -I flag as

% process -I /where/ever/data/images -d 01 | tee 01.lis

When processing datasets in HDF5 format, you can also use

% process -h5 /where/ever/your_master.h5 -d 01 | tee 01.lis

Complete control

If you want to specify several datasets in different locations, or want to explicitly select a subset of images, use the -Id flag: it takes a dataset identifier, the image directory, the image-name template and the first and last image number to use:

process -Id sweep1,/where/ever/Images,sweep1_1_####.cbf,1,900 \
        -Id sweep2,/some/place/Images,sweep2_3_####.cbf,1,1440 \
        -d 01 | tee 01.lis

NOTE: the backslash character ("\") only indicates that the command continues on the next line - it could equally well be given on a single line.


Scaling

The scaling module in autoPROC - which is used automatically as part of an autoPROC run - is called aP_scale. Online help is available through

% aP_scale -h

To make the most of the scaling and merging step, a basic understanding of the concept of project, crystal and dataset hierarchy (as implemented in the CCP4 MTZ libraries) is required.
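
For illustration (using the names that appear in the examples below), the hierarchy could look like this - the three values given to the -P flag of aP_scale correspond exactly to these three levels, in this order:

  Project: Lyso
    Crystal: test
      Dataset: peak
      Dataset: infl
      Dataset: hrem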

Simple scaling of a single scan

Here we can just take the (unmerged) reflection file out of autoPROC and re-run the scaling module:

% aP_scale -mtz XDS_ASCII.mtz -P Lyso test A -b 1-360 -id 01 | tee 01_aP_scale.log

The -P flag will set the Project, Crystal and Dataset values for all following image selections - until a new -P flag changes them.

The -b flag selects a range of images (or batches, here images 1 to 360) from the full list of images present in the input MTZ file. One can use a larger range to make life easier, e.g. -b 1-9999 would also work if there are truly only 360 images in the dataset. One can also select a subset of images, e.g. if the crystal suffered serious radiation damage and the images towards the end of data collection should be excluded from scaling/merging - see the sketch below.
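
A hedged sketch of such a selection (same input as above; assuming the last 60 images are affected by radiation damage and should be left out - the identifier 01b is just an example):

% aP_scale -mtz XDS_ASCII.mtz -P Lyso test A -b 1-300 -id 01b | tee 01b_aP_scale.log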

The value given with the -id flag will be used as a prefix for all generated files, providing a unique identifier for each run.

Advanced scaling of a single scan

We've already seen how to select only a subset of images - but sometimes one might want to distinguish the images used for scaling from the images used for merging.

Say one expects radiation damage and is looking for a way of visualising this. One way (if multiplicity permits) is to scale all data together but to merge the first half and the second half of the dataset separately. If one then calculates a difference Fourier map (F_late - F_early), it could show potential effects of radiation damage at the atomic level.

The relevant command would look like this:

% aP_scale -mtz XDS_ASCII.mtz \
    -P Lyso test early -b 1-180 \
    -P Lyso test late  -b 181-360 \
    -id 02 | tee 02_aP_scale.log

resulting in two files 02_truncate_early-unique.mtz and 02_truncate_late-unique.mtz.

Combining several scans

Often, data has been collected in multiple scans, e.g. as a low- and high-resolution (or -intensity) scan. First we need to combine the two files with

% combine_files -f 01/XDS_ASCII.mtz -P Lyso test lowres \
                -f 02/XDS_ASCII.mtz -P Lyso test highres

It is useful to follow the correct logic in terms of data hierarchy (Project, Crystal and Dataset) - the tool will ensure that the resulting output MTZ file has the correct cell parameters set. It will also ensure that the image (batch) numbers are kept at a safe distance from each other (to avoid duplication) - usually an offset of 1000 is applied.

When doing the scaling/merging of such a 2-scan dataset, we want to combine both scans into a single, merged dataset with:

% aP_scale -mtz sortmtz.mtz \
    -P Lyso test both -b 1001-1090 -b 2001-2180 -id 03 | tee 03_aP_scale.log

  • the two scans have starting image numbers 1001 and 2001, respectively
  • only one -P flag is needed: this will apply to both -b selections that follow

Simple scaling of multiple scans

If the multiple scans should be scaled together (which also increases the multiplicity during the outlier-rejection stage) but merged separately, very similar steps are required:

% combine_files -f peak/XDS_ASCII.mtz -P Lyso test peak \
                -f infl/XDS_ASCII.mtz -P Lyso test infl \
                -f hrem/XDS_ASCII.mtz -P Lyso test hrem

The only difference to the previous syntax is that a different -P flag is now given before each selection argument (-b). It is the Dataset value that determines which sets of images are going to be merged together: anything with the same Dataset value will end up in the same merged dataset.

% aP_scale -mtz sortmtz.mtz \
    -P Lyso test peak -b 1001-1999 \
    -P Lyso test infl -b 2001-2999 \
    -P Lyso test hrem -b 3001-3999 \
    -id 04 | tee 04_aP_scale.log

Advanced scaling of multiple scans

It might be useful to first check whether several scans are reasonably isomorphous to each other. For this, one could use the check_indexing tool:

% check_indexing -v peak/truncate.mtz infl/truncate.mtz hrem/truncate.mtz

to get simple R-value and CC statistics. This is especially useful when the different scans come from different crystals (or from different parts of the same crystal). Based on these statistics it might be easier to decide which scans to combine first.

After running the combine_files tool, some additional settings for the scaling module aP_scale might be of interest - see also the aP_scale -h output. For poor datasets it might be necessary to define the scaling model by hand, e.g. with

% aP_scale -scale "ROTATION SPACING 5.0 BFACTOR OFF" ...

or

% aP_scale -scale "BATCH" ...
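
For data from multiple crystals, the per-crystal unmerged files first need to be combined in the same way as for multiple scans; a sketch, assuming three peak scans from crystals A, B and C (the directory names crystA, crystB and crystC are placeholders):

% combine_files -f crystA/XDS_ASCII.mtz -P Lyso A peak \
                -f crystB/XDS_ASCII.mtz -P Lyso B peak \
                -f crystC/XDS_ASCII.mtz -P Lyso C peak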

To scale and merge scans from multiple crystals together (see the combine_files sketch above), first running

% aP_scale -scale "CONSTANT" -mtz sortmtz.mtz \
           -P Lyso A peak -b 1001-1999 \
           -P Lyso B peak -b 2001-2999 \
           -P Lyso C peak -b 3001-3999 \
           -id 05 | tee 05_aP_scale.log

can be useful: this applies only a single overall scale to each scan. If this shows that the scans (from the three crystals A, B and C in the example above) are isomorphous enough, one could switch e.g. to

% aP_scale -scale "ROTATION SPACING 5.0 BFACTOR ON" -mtz sortmtz.mtz \
           -P Lyso A peak -b 1001-1999 \
           -P Lyso B peak -b 2001-2999 \
           -P Lyso C peak -b 3001-3999 \
           -id 06 | tee 06_aP_scale.log

for a more detailed scaling model.