Introduction

This tutorial shows the typical usage of autoPROC (Vonrhein et al., 2011) on a dataset of tetragonal lysozyme - see (Faust et al., 2008) and http://www.embl-hamburg.de/xtutor/experiment-5_en.html.


Preparation

The set of images needs to be downloaded from

and unpacked with

% tar -xzvf exp5_data1.tgz
% tar -xzvf exp5_data2.tgz
% bunzip2 -v exp5/data/*.bz2

This should result in a subdirectory exp5/data containing 180 images exp5_lyso_ligands_1_001.img to exp5_lyso_ligands_1_180.img.
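
A quick way to confirm that unpacking worked is to count the image files. The following is a minimal sketch in POSIX sh (not csh); the helper name count_images is hypothetical, and only the directory name and the expected number of images come from the text above.

```shell
# Hypothetical helper: count the *.img files in a directory (default: exp5/data).
count_images() {
    dir="${1:-exp5/data}"
    count=0
    for f in "$dir"/*.img; do
        # If nothing matches, the pattern stays literal and -e is false.
        [ -e "$f" ] && count=$((count + 1))
    done
    echo "$count"
}

n=$(count_images exp5/data)
if [ "$n" -eq 180 ]; then
    echo "OK: found all 180 images"
else
    echo "found $n images, expected 180"
fi
```

If the count is lower than 180, one of the tar or bunzip2 steps above most likely did not complete.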


Running autoPROC - the simplest case

After changing to the directory containing the images with

% cd exp5/data

we just run

% process -d 01 | tee 01.lis

The -d argument tells autoPROC to write all output into the subdirectory 01. We also capture the standard output in 01.lis. Please see

% process -h

for details about the command-line options and pointers to further documentation.

This runs through without any issues, giving

Cell parameters ...................................... 77.3809 77.3809 37.7408 90.000 90.000 90.000
Spacegroup number .................................... 92
Spacegroup name ...................................... P41212
Distance ............................................. 70.204872
Detector origin ...................................... 1547.149292 1543.589355

and a final dataset of

                                              Overall  InnerShell  OuterShell
     Low resolution limit                      77.381      77.381       1.669
     High resolution limit                      1.663       7.718       1.663


     Rmerge                                     0.059       0.031       0.586
     Ranom                                      0.057       0.028       0.561
     Rmeas (within I+/I-)                       0.062       0.031       0.625
     Rmeas (all I+ & I-)                        0.062       0.032       0.620
     Rpim  (within I+/I-)                       0.023       0.012       0.271
     Rpim  (all I+ & I-)                        0.017       0.010       0.196
     Total number of observations              171554        1803         810
     Total number unique                        13699         181          88
     Mean(I)/sd(I)                               32.4        46.5         4.6
     Completeness                                98.0       100.0        70.4
     Multiplicity                                12.5        10.0         9.2


     Anomalous completeness                      97.7       100.0        68.7
     Anomalous multiplicity                       6.8         6.1         4.8
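
As a quick consistency check on such a table, the multiplicity in each column should equal the total number of observations divided by the number of unique reflections. A minimal sketch using awk from POSIX sh; the helper name multiplicity is hypothetical, and the numbers are copied from the table above.

```shell
# Hypothetical helper: multiplicity = observations / unique reflections,
# printed with one decimal place as in the table.
multiplicity() {
    awk -v nobs="$1" -v nuniq="$2" 'BEGIN { printf "%.1f\n", nobs / nuniq }'
}

multiplicity 171554 13699   # Overall    -> 12.5
multiplicity 1803 181       # InnerShell -> 10.0
multiplicity 810 88         # OuterShell -> 9.2
```

The three values reproduce the Multiplicity row of the table.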

Using a subset of images with autoPROC

Processing only part of a given dataset is normally not a useful thing to do - unless you want to run this tutorial on especially slow machines.

There are two ways of using only a subset of images:

  • using a directory that contains only the images to be used
  • specifying the datasets explicitly

The first option requires copying or removing files, or creating symbolic links, which can be confusing and time-consuming. So we will stick with the second option - using the -Id flag of the 'process' command. From process -h:

-Id <idN>,<dirN>,<templateN>,<fromN>,<toN> : to override automatic
                                definition of identifiers/scans. Each
                                identifier requires 5 items:

                                <idN>       = identifier string (no special
                                              characters, since directories/files
                                              might get created using this string)
                                <dirN>      = directory containing images
                                <templateN> = filename template for images (using
                                              a series of '#'s as placeholder for
                                              image number)
                                <fromN>     = starting image number
                                <toN>       = final image number

                                Template lines with this format can be
                                generated by using the "find_images"
                                tool and its "-l" flag.
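
To illustrate how the '#' placeholders in <templateN> relate to actual file names, the sketch below expands a template for a given image number by zero-padding the number to the width of the run of '#'s. This only illustrates the convention described in the help text and is not autoPROC's own code; the function name expand_template is hypothetical, and it assumes the template contains exactly one run of '#'s. It is written for POSIX sh.

```shell
# Hypothetical helper: expand_template <template> <image number>
# Pass the image number without leading zeros (e.g. 61, not 061),
# since printf would read a leading zero as an octal prefix.
expand_template() {
    template="$1"
    number="$2"
    prefix="${template%%#*}"        # text before the first '#'
    suffix="${template##*#}"        # text after the last '#'
    hashes="${template#"$prefix"}"  # strip the prefix ...
    hashes="${hashes%"$suffix"}"    # ... and the suffix: only the '#'s remain
    width=${#hashes}
    printf "%s%0${width}d%s\n" "$prefix" "$number" "$suffix"
}

expand_template 'exp5_lyso_ligands_1_###.img' 61    # exp5_lyso_ligands_1_061.img
expand_template 'exp5_lyso_ligands_1_###.img' 120   # exp5_lyso_ligands_1_120.img
```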

A suitable command for using only images 61-120 would be:

% process -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -d 02 | tee 02.lis

Here the backquoted command `pwd` expands to the current directory, which is also where the images are. This gives

Cell parameters ...................................... 77.3838 77.3838 37.7460 90.000 90.000 90.000
Spacegroup number .................................... 92
Spacegroup name ...................................... P41212
Distance ............................................. 70.205742
Detector origin ...................................... 1547.331787 1543.394287

and still a reasonably complete dataset (apart from the anomalous data):

                                              Overall  InnerShell  OuterShell
     Low resolution limit                      34.607      34.607       1.692
     High resolution limit                      1.686       7.796       1.686


     Rmerge                                     0.056       0.026       0.596
     Ranom                                      0.048       0.022       0.515
     Rmeas (within I+/I-)                       0.060       0.026       0.676
     Rmeas (all I+ & I-)                        0.064       0.029       0.690
     Rpim  (within I+/I-)                       0.037       0.014       0.432
     Rpim  (all I+ & I-)                        0.030       0.013       0.342
     Total number of observations               55467         643         375
     Total number unique                        13101         147         100
     Mean(I)/sd(I)                               18.5        35.3         2.2
     Completeness                                97.5        87.5        74.6
     Multiplicity                                 4.2         4.4         3.8


     Anomalous completeness                      89.1        71.8        59.8
     Anomalous multiplicity                       2.4         2.9         2.2

Adding some options to run faster

Following the explanations in the FAQ entry "What can I do to have it run faster?", with

% process -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -M fast autoPROC_XdsKeyword_TRUSTED_REGION="0.0 1.05" -R 999.9 1.6 -d 03 | tee 03.lis

we get

Cell parameters ...................................... 77.3858 77.3858 37.7466 90.000 90.000 90.000
Spacegroup number .................................... 92
Spacegroup name ...................................... P41212
Distance ............................................. 70.208298
Detector origin ...................................... 1547.334595 1543.384033

and a dataset of

                                              Overall  InnerShell  OuterShell
     Low resolution limit                      34.608      34.608       1.726
     High resolution limit                      1.720       7.953       1.720


     Rmerge                                     0.055       0.026       0.421
     Ranom                                      0.046       0.022       0.338
     Rmeas (within I+/I-)                       0.059       0.026       0.434
     Rmeas (all I+ & I-)                        0.062       0.029       0.483
     Rpim  (within I+/I-)                       0.036       0.014       0.268
     Rpim  (all I+ & I-)                        0.029       0.013       0.231
     Total number of observations               53662         619         372
     Total number unique                        12466         142          91
     Mean(I)/sd(I)                               19.4        35.4         3.0
     Completeness                                98.5        87.7        74.6
     Multiplicity                                 4.3         4.4         4.1


     Anomalous completeness                      90.8        70.7        56.8
     Anomalous multiplicity                       2.4         2.9         2.3

Using prior information

Following the explanations at "How does it fit into my existing project?", and saving the reflection file for PDB entry 1H87 as 1h87.mtz, we could run with

% process -ref 1h87.mtz -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -M fast autoPROC_XdsKeyword_TRUSTED_REGION="0.0 1.05" -R 999.9 1.6 -d 04 | tee 04.lis

which gives

Cell parameters ...................................... 77.3858 77.3858 37.7466 90.000 90.000 90.000
Spacegroup number .................................... 96
Spacegroup name ...................................... P43212
Distance ............................................. 70.208282
Detector origin ...................................... 1547.333252 1543.384033

and a dataset

                                              Overall  InnerShell  OuterShell
     Low resolution limit                      34.608      34.608       1.726
     High resolution limit                      1.720       7.953       1.720


     Rmerge                                     0.055       0.026       0.421
     Ranom                                      0.046       0.022       0.338
     Rmeas (within I+/I-)                       0.059       0.026       0.434
     Rmeas (all I+ & I-)                        0.062       0.029       0.484
     Rpim  (within I+/I-)                       0.036       0.014       0.268
     Rpim  (all I+ & I-)                        0.029       0.013       0.231
     Total number of observations               53663         620         372
     Total number unique                        12465         142          91
     Mean(I)/sd(I)                               19.4        35.4         3.0
     Completeness                                98.5        87.7        74.6
     Multiplicity                                 4.3         4.4         4.1


     Anomalous completeness                      91.8        64.0        64.9
     Anomalous multiplicity                       2.4         3.0         2.2

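The data quality is very similar, but the spacegroup is now the correct one, P43212, i.e. with the correct screw axis component along the 4-fold axis. P41212 and P43212 form an enantiomorphic pair that cannot be distinguished from the diffraction intensities alone, which is why prior information such as the reference file is needed to settle the choice.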
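
Since the standard output of each run was captured with tee, the spacegroup reported by different runs can be compared directly from the .lis files. A minimal sketch in POSIX sh, assuming the summary lines appear in those files in the form shown above; the helper name spacegroup_of is hypothetical.

```shell
# Hypothetical helper: print the spacegroup name reported in a captured log.
spacegroup_of() {
    # Take the last field of the last "Spacegroup name" line in the file.
    awk '/Spacegroup name/ { sg = $NF } END { if (sg != "") print sg }' "$1"
}

# Compare the runs without and with the reference file (if the logs exist).
for run in 03 04; do
    if [ -f "$run.lis" ]; then
        echo "$run: $(spacegroup_of "$run.lis")"
    fi
done
```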


Advanced processing

Some recurring topics can be handled using more advanced features of autoPROC.

Direct beam centre

One of the most common causes of failures very early on is a problem with the beam centre values. As trivial as it might sound, it probably accounts for two-thirds of all autoPROC failures, so make sure you know whether the values in the image header are correct and which convention they refer to. Most of the time, the beamline staff will know this.

Low-resolution

It's always a good idea to visually inspect the initial diffraction image for a rough low-resolution limit - mainly based on the shape and extent of the beamstop shadow. Although most modern instruments will have a very clean beamstop (and therefore shadow), special circumstances such as very large unit cells, and therefore very large crystal-to-detector distances, might leave a considerable area of the detector behind the beamstop shadow. During integration of predicted reflections, the lowest-resolution reflections might then lie behind the beamstop shadow and be integrated wrongly.

Most integration programs have some automatic (or semi-automatic) method to help define the detector area behind the beamstop. autoPROC (when run with the default XDS path) will try to use the automatic DEFPIX stage of the processing pipeline, adjusting parameters automatically to mask out the beamstop shadow effectively. This often works adequately, but if there is special emphasis on the quality of the low-resolution data, some manual work might be required.

The easiest approach is to define an ellipse (via X,Y pixel coordinates) and pass it to autoPROC using the XDS keyword UNTRUSTED_ELLIPSE:

% process -ref 1h87.mtz -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -M fast autoPROC_XdsKeyword_TRUSTED_REGION="0.0 1.05" -R 999.9 1.6 autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE="1493 1588 1488 1588" -d 05 | tee 05.lis
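
In XDS, UNTRUSTED_ELLIPSE is understood here to take four pixel values X1 X2 Y1 Y2 describing the rectangle in which the ellipse is inscribed; please check the XDS documentation for your version. Under that assumption, the sketch below converts an ellipse centre and half-widths, as one might read them off an image viewer, into the four values. The helper name ellipse_box is hypothetical, and the centre and half-width values are illustrative numbers chosen to reproduce the values used in the command above. It is written for POSIX sh.

```shell
# Hypothetical helper: ellipse_box <centre X> <centre Y> <half-width X> <half-width Y>
# Prints "X1 X2 Y1 Y2" (in pixels), assuming the keyword expects the
# bounding rectangle of the ellipse in that order.
ellipse_box() {
    awk -v cx="$1" -v cy="$2" -v rx="$3" -v ry="$4" 'BEGIN {
        printf "%.0f %.0f %.0f %.0f\n", cx - rx, cx + rx, cy - ry, cy + ry
    }'
}

box=$(ellipse_box 1540.5 1538 47.5 50)
echo "autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE=\"$box\""
```

With these illustrative inputs the helper prints the setting with the values 1493 1588 1488 1588.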

High-resolution

The default criteria for high-resolution limits are a good starting point, but might need adjusting for very well or very poorly diffracting crystals - see e.g. the FAQ entry "Why don't I get the desired I/sigI value in the high resolution shell?".


References

(1) Vonrhein, C., Flensburg, C., Keller, P., Sharff, A., Smart, O., Paciorek, W., Womack, T. & Bricogne, G. (2011). Data processing and analysis with the autoPROC toolbox. Acta Cryst. D67, 293-302.

(2) Faust, A., Panjikar, S., Muller, U., Parthasarathy, V., Schmidt, A., Lamzin, V. & Weiss, M. S. (2008). J. Appl. Cryst. 41, 1161-1172.