This tutorial shows the typical usage of autoPROC (Vonrhein et al., 2011) on a dataset of tetragonal lysozyme; see Faust et al. (2008) and http://www.embl-hamburg.de/xtutor/experiment-5_en.html.
The set of images needs to be downloaded from
and unpacked with
% tar -xzvf exp5_data1.tgz
% tar -xzvf exp5_data2.tgz
% bunzip2 -v exp5/data/*.bz2
This should result in a subdirectory exp5/data containing 180 images exp5_lyso_ligands_1_001.img to exp5_lyso_ligands_1_180.img.
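As a quick check that the unpacking worked, the images could be counted with a small shell function. This is only a sketch: count_images is a hypothetical helper (not part of autoPROC), and it assumes the file naming scheme described above.

```shell
# count_images: hypothetical helper, not part of autoPROC.
# Prints the number of files in a directory that follow the
# tutorial's naming scheme exp5_lyso_ligands_1_NNN.img.
count_images() {
    img_dir=$1
    img_count=0
    for img in "$img_dir"/exp5_lyso_ligands_1_[0-9][0-9][0-9].img; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$img" ]; then
            img_count=$((img_count + 1))
        fi
    done
    echo "$img_count"
}

# Report on the tutorial data if it has been unpacked here.
if [ -d exp5/data ]; then
    found=$(count_images exp5/data)
    if [ "$found" -eq 180 ]; then
        echo "all 180 images found"
    else
        echo "expected 180 images, found $found"
    fi
fi
```

If fewer than 180 images are reported, the download or one of the unpacking steps is the first thing to re-check.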
After changing to the directory containing the images with
% cd exp5/data
we just run
% process -d 01 | tee 01.lis
The -d argument tells autoPROC to write all output into the subdirectory 01. We also want to capture the standard output in 01.lis. Please see
% process -h
for details about the command-line options and pointers to further documentation.
This runs through without any issues, giving
Cell parameters ...................................... 77.3809 77.3809 37.7408 90.000 90.000 90.000
Spacegroup number .................................... 92
Spacegroup name ...................................... P41212
Distance ............................................. 70.204872
Detector origin ...................................... 1547.149292 1543.589355
and a final dataset of
                                   Overall  InnerShell  OuterShell
Low resolution limit                77.381      77.381       1.669
High resolution limit                1.663       7.718       1.663
Rmerge                               0.059       0.031       0.586
Ranom                                0.057       0.028       0.561
Rmeas (within I+/I-)                 0.062       0.031       0.625
Rmeas (all I+ & I-)                  0.062       0.032       0.620
Rpim (within I+/I-)                  0.023       0.012       0.271
Rpim (all I+ & I-)                   0.017       0.010       0.196
Total number of observations        171554        1803         810
Total number unique                  13699         181          88
Mean(I)/sd(I)                         32.4        46.5         4.6
Completeness                          98.0       100.0        70.4
Multiplicity                          12.5        10.0         9.2
Anomalous completeness                97.7       100.0        68.7
Anomalous multiplicity                 6.8         6.1         4.8
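Since the standard output was captured with tee, individual rows of this summary can be pulled out of the log afterwards. The following is a minimal sketch, not an autoPROC feature: stat_row is a hypothetical helper, and it assumes that each row appears in the log as the label followed by exactly three values, and that the first matching line is the one of interest.

```shell
# stat_row: hypothetical helper, not part of autoPROC.
# Prints the Overall, InnerShell and OuterShell values of the first
# line in a log file that starts with the given row label.
stat_row() {
    row_label=$1
    log_file=$2
    awk -v label="$row_label" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)      # ignore leading indentation
            if (index(line, label) == 1) {
                # The last three fields hold the three shell values.
                print $(NF-2), $(NF-1), $NF
                exit
            }
        }' "$log_file"
}

# Intended use once the first run has finished:
#   stat_row "Completeness" 01.lis
#   stat_row "Rmeas (all I+ & I-)" 01.lis
```

Matching on the start of the line keeps "Completeness" from also picking up the "Anomalous completeness" row.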
Not using all images of a given dataset is normally not a useful thing to do - unless you want to run this as a tutorial on especially slow machines.
There are two ways of using only a subset of images: placing just the wanted images in a separate directory, or telling autoPROC explicitly which images to use.
The first option requires a certain amount of copying, removing or creating of symbolic links, which can be confusing and time-consuming. So we will stick with the second option: using the -Id flag to the 'process' command. From process -h:
-Id <idN>,<dirN>,<templateN>,<fromN>,<toN> : to override automatic definition of
    identifiers/scans. Each identifier requires 5 items:
      <idN>       = identifier string (no special characters, since
                    directories/files might get created using this string)
      <dirN>      = directory containing images
      <templateN> = filename template for images (using a series of '#'s
                    as placeholder for image number)
      <fromN>     = starting image number
      <toN>       = final image number
    Template lines with this format can be generated by using the
    "find_images" tool and its "-l" flag.
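To make the meaning of the filename template concrete, the sketch below expands a template into an actual filename and lists any image numbers in a range for which no file exists. Both expand_template and missing_images are hypothetical helpers written for this tutorial (they are not autoPROC tools); they assume a single run of '#' characters in the template and image numbers given without leading zeros.

```shell
# expand_template: hypothetical helper, not part of autoPROC.
# Replaces the run of '#' characters in a template by the image
# number, zero-padded to the length of that run.
expand_template() {
    tpl=$1
    num=$2
    hashes=$(printf '%s' "$tpl" | tr -cd '#')    # keep only the '#'s
    width=${#hashes}
    padded=$(printf "%0${width}d" "$num")
    prefix=${tpl%%\#*}     # text before the first '#'
    suffix=${tpl##*\#}     # text after the last '#'
    printf '%s%s%s\n' "$prefix" "$padded" "$suffix"
}

# missing_images: hypothetical helper, not part of autoPROC.
# Prints every image number between <from> and <to> (inclusive)
# whose file is absent from the given directory.
missing_images() {
    scan_dir=$1
    scan_tpl=$2
    scan_from=$3
    scan_to=$4
    i=$scan_from
    while [ "$i" -le "$scan_to" ]; do
        if [ ! -e "$scan_dir/$(expand_template "$scan_tpl" "$i")" ]; then
            echo "$i"
        fi
        i=$((i + 1))
    done
}

# Intended use from within the image directory:
#   missing_images "$(pwd)" 'exp5_lyso_ligands_1_###.img' 61 120
```

An empty result means that every image in the requested range is present, so the five items can be passed to -Id as they are.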
A suitable command for using only images 61-120 would be:
% process -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -d 02 | tee 02.lis
(We make use of the command `pwd`, which evaluates to the current directory, where the images are also located.) This gives
Cell parameters ...................................... 77.3838 77.3838 37.7460 90.000 90.000 90.000
Spacegroup number .................................... 92
Spacegroup name ...................................... P41212
Distance ............................................. 70.205742
Detector origin ...................................... 1547.331787 1543.394287
and still a reasonably complete dataset (apart from the anomalous data):
                                   Overall  InnerShell  OuterShell
Low resolution limit                34.607      34.607       1.692
High resolution limit                1.686       7.796       1.686
Rmerge                               0.056       0.026       0.596
Ranom                                0.048       0.022       0.515
Rmeas (within I+/I-)                 0.060       0.026       0.676
Rmeas (all I+ & I-)                  0.064       0.029       0.690
Rpim (within I+/I-)                  0.037       0.014       0.432
Rpim (all I+ & I-)                   0.030       0.013       0.342
Total number of observations         55467         643         375
Total number unique                  13101         147         100
Mean(I)/sd(I)                         18.5        35.3         2.2
Completeness                          97.5        87.5        74.6
Multiplicity                           4.2         4.4         3.8
Anomalous completeness                89.1        71.8        59.8
Anomalous multiplicity                 2.4         2.9         2.2
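The effect of using only a third of the images is easiest to see in the multiplicity, which is the total number of observations divided by the number of unique reflections. A small worked check with the overall values from the two tables (multiplicity is a hypothetical helper name, used only for this illustration):

```shell
# multiplicity: hypothetical helper for this illustration.
# Prints observations / unique reflections, rounded to one decimal
# place as in the summary table.
multiplicity() {
    awk -v obs="$1" -v unique="$2" 'BEGIN { printf "%.1f\n", obs / unique }'
}

multiplicity 171554 13699   # all 180 images  -> 12.5
multiplicity 55467 13101    # images 61-120   -> 4.2
```

The number of unique reflections barely changes (13699 versus 13101), so the completeness stays high, while the multiplicity drops by roughly the same factor of three as the number of images.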
Following the explanations in the FAQ entry "What can I do to have it run faster?", with
% process -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -M fast autoPROC_XdsKeyword_TRUSTED_REGION="0.0 1.05" -R 999.9 1.6 -d 03 | tee 03.lis
we get
Cell parameters ...................................... 77.3858 77.3858 37.7466 90.000 90.000 90.000
Spacegroup number .................................... 92
Spacegroup name ...................................... P41212
Distance ............................................. 70.208298
Detector origin ...................................... 1547.334595 1543.384033
and a dataset of
                                   Overall  InnerShell  OuterShell
Low resolution limit                34.608      34.608       1.726
High resolution limit                1.720       7.953       1.720
Rmerge                               0.055       0.026       0.421
Ranom                                0.046       0.022       0.338
Rmeas (within I+/I-)                 0.059       0.026       0.434
Rmeas (all I+ & I-)                  0.062       0.029       0.483
Rpim (within I+/I-)                  0.036       0.014       0.268
Rpim (all I+ & I-)                   0.029       0.013       0.231
Total number of observations         53662         619         372
Total number unique                  12466         142          91
Mean(I)/sd(I)                         19.4        35.4         3.0
Completeness                          98.5        87.7        74.6
Multiplicity                           4.3         4.4         4.1
Anomalous completeness                90.8        70.7        56.8
Anomalous multiplicity                 2.4         2.9         2.3
Following the explanations at "How does it fit into my existing project?", and saving the reflection file for PDB entry 1H87 as 1h87.mtz, we could run with
% process -ref 1h87.mtz -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -M fast autoPROC_XdsKeyword_TRUSTED_REGION="0.0 1.05" -R 999.9 1.6 -d 04 | tee 04.lis
which gives
Cell parameters ...................................... 77.3858 77.3858 37.7466 90.000 90.000 90.000
Spacegroup number .................................... 96
Spacegroup name ...................................... P43212
Distance ............................................. 70.208282
Detector origin ...................................... 1547.333252 1543.384033
and a dataset
                                   Overall  InnerShell  OuterShell
Low resolution limit                34.608      34.608       1.726
High resolution limit                1.720       7.953       1.720
Rmerge                               0.055       0.026       0.421
Ranom                                0.046       0.022       0.338
Rmeas (within I+/I-)                 0.059       0.026       0.434
Rmeas (all I+ & I-)                  0.062       0.029       0.484
Rpim (within I+/I-)                  0.036       0.014       0.268
Rpim (all I+ & I-)                   0.029       0.013       0.231
Total number of observations         53663         620         372
Total number unique                  12465         142          91
Mean(I)/sd(I)                         19.4        35.4         3.0
Completeness                          98.5        87.7        74.6
Multiplicity                           4.3         4.4         4.1
Anomalous completeness                91.8        64.0        64.9
Anomalous multiplicity                 2.4         3.0         2.2
The data quality is very similar, but now with the correct spacegroup P43212, i.e. the correct screw axis component along the 4-fold.
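With several runs side by side, the spacegroup chosen in each can be compared directly from the captured logs. A minimal sketch, assuming the "Spacegroup name" line looks as shown in the output above and that the last such line in a log is the relevant one (spacegroup_name is a hypothetical helper, not an autoPROC tool):

```shell
# spacegroup_name: hypothetical helper, not part of autoPROC.
# Prints the value of the last "Spacegroup name" line in a log file.
spacegroup_name() {
    awk '/Spacegroup name/ { name = $NF }
         END { if (name != "") print name }' "$1"
}

# Intended use once runs 03 and 04 have finished:
#   for log in 03.lis 04.lis; do
#       echo "$log: $(spacegroup_name "$log")"
#   done
```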
Some recurring topics can be handled using more advanced features of autoPROC.
One of the most common causes for failures very early on is a problem with the beam centre values. As trivial as it might sound, it probably accounts for two-thirds of all autoPROC failures ... so be sure that you know if the values in the image header are correct and what convention they refer to. Most of the time, the beamline staff will know this.
It's always a good idea to visually analyse the initial diffraction image for a rough low-resolution limit, mainly based on the shape and extent of the beamstop shadow. Although most modern instruments will have a very clean beamstop (and therefore shadow), some special circumstances, like very large unit cells and therefore very large detector-crystal distances, might leave a considerable area of the detector behind the beamstop shadow. During integration of predicted reflections, the lowest-resolution reflections might then lie behind the beamstop shadow and therefore be integrated wrongly.
Most integration programs have some automatic (or semi-automatic) method to help define the detector area behind the beamstop. autoPROC (when run with the default XDS path) will try to use the automatic DEFPIX stage of the processing pipeline, automatically adjusting parameters to effectively mask out the beamstop shadow. This often works adequately, but if there is special emphasis on the quality of the low-resolution data, some manual work might be required.
The easiest approach is to define an ellipse (via X,Y pixel coordinates) and provide it to autoPROC using the XDS keyword UNTRUSTED_ELLIPSE.
% process -ref 1h87.mtz -Id LysoHepes,`pwd`,exp5_lyso_ligands_1_###.img,61,120 -M fast autoPROC_XdsKeyword_TRUSTED_REGION="0.0 1.05" -R 999.9 1.6 autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE="1493 1588 1488 1588" -d 05 | tee 05.lis
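The four numbers can be derived from a centre and two semi-axes read off the image. The sketch below assumes the XDS convention that UNTRUSTED_ELLIPSE takes the values X1 X2 Y1 Y2 of the rectangle enclosing the ellipse; ellipse_bounds is a hypothetical helper, and the centre and semi-axes used here are simply the ones that reproduce the values in the command above, not measured quantities.

```shell
# ellipse_bounds: hypothetical helper, not part of autoPROC or XDS.
# Given a centre (cx, cy) and semi-axes (rx, ry) in pixels, prints
# "X1 X2 Y1 Y2" of the enclosing rectangle, truncated to whole pixels.
ellipse_bounds() {
    awk -v cx="$1" -v cy="$2" -v rx="$3" -v ry="$4" 'BEGIN {
        printf "%d %d %d %d\n", cx - rx, cx + rx, cy - ry, cy + ry
    }'
}

# Centre (1540.5, 1538) with semi-axes 47.5 and 50 pixels:
ellipse_bounds 1540.5 1538 47.5 50   # -> 1493 1588 1488 1588
```

The result could then be passed on as autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE="$(ellipse_bounds 1540.5 1538 47.5 50)" in place of the literal numbers.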
The default criteria for high-resolution limits are a good starting point, but might need adjusting for very well and very poorly diffracting crystals: see e.g. the FAQ entry "Why don't I get the desired I/sigI value in the high resolution shell?".
(1) Vonrhein, C., Flensburg, C., Keller, P., Sharff, A., Smart, O., Paciorek, W., Womack, T. & Bricogne, G. (2011). Data processing and analysis with the autoPROC toolbox. Acta Cryst. D67, 293-302.
(2) Faust, A., Panjikar, S., Muller, U., Parthasarathy, V., Schmidt, A., Lamzin, V. & Weiss, M. S. (2008). J. Appl. Cryst. 41, 1161-1172.