Running autoPROC at a synchrotron is no different from running it in the home lab. However, many synchrotrons have optimised their infrastructure for speed and efficiency when running data-processing jobs, so a few settings can help you get the most out of autoPROC when using those resources.

Please make sure to check with your local IT contact for details about resources and usage policies.

Making use of compute clusters

As described in the XDS documentation, the spot-searching (COLSPOT) and integration (INTEGRATE) stages can be distributed to multiple compute nodes. Once this has been configured within the XDS installation, autoPROC can be told about this feature using the MAXIMUM_NUMBER_OF_JOBS keyword - e.g. with

% process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 ...

Please make sure that all required packages are also available, configured and usable on such a compute cluster.

Making use of multiple CPUs/threads

By default, autoPROC will use all available threads for the XDS, XSCALE and AIMLESS steps. This can be controlled globally via e.g.

% process -nthreads 8 ...

or separately for XDS and XSCALE/AIMLESS. E.g. to use 16 threads for XDS but 32 for XSCALE/AIMLESS one would use:

% process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=16 -nthreads 32 ...
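
On shared processing hosts it can make sense to derive thread counts from the machine size instead of hard-coding them. A minimal sketch (Linux `nproc`; halving the core count is an arbitrary policy, and the resulting command line is only echoed here rather than run):

```shell
# derive a thread count from the number of available cores (arbitrary policy: half)
n=$(( $(nproc) / 2 ))
[ "$n" -ge 1 ] || n=1
# echo the process invocation this would correspond to
echo "process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=$n -nthreads $(nproc) ..."
```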

The autoPROC XML file

autoPROC will write two ISPyB-compatible XML files at the end of processing (into the output directory):

  • autoPROC.xml describes the final isotropic dataset (usually truncate-unique.mtz)
  • autoPROC_staraniso.xml describes the final, anisotropically analysed dataset (usually staraniso_alldata-unique.mtz)

The default name can be changed via the autoPROC_CreateXmlFile parameter, e.g.

% process -d 01 ...                                           # 01/autoPROC.xml
% process ...                                                 # ./autoPROC.xml
% process -d 01 autoPROC_CreateXmlFile=`pwd`/01/ispyb.xml ... # 01/ispyb.xml

Releases after 13th Dec 2015 allow injection of your own XML elements, e.g. via

% process autoPROC_CreateXml_LocalElements="AutoProcContainer:AutoProcScalingContainer:AutoProcIntegrationContainer:Image:datasetID=12345" ...

which would insert <datasetID>12345</datasetID> into the <Image> element of the generated XML file.
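
One way to sanity-check such an injection is to grep the generated XML for the new element. The fragment below is a hand-written stand-in mimicking the container nesting named above, since the real autoPROC.xml only exists after a run:

```shell
# stand-in XML fragment mimicking the container nesting used above
cat > /tmp/autoPROC_test.xml <<'EOF'
<AutoProcContainer>
  <AutoProcScalingContainer>
    <AutoProcIntegrationContainer>
      <Image><datasetID>12345</datasetID></Image>
    </AutoProcIntegrationContainer>
  </AutoProcScalingContainer>
</AutoProcContainer>
EOF
# print the injected element if present
grep -o '<datasetID>12345</datasetID>' /tmp/autoPROC_test.xml
```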



Presenting results to users

autoPROC provides a large amount of output - both in terms of files and in terms of annotated results that both novice and experienced users should find useful. We invest a lot of effort in making the autoPROC output as understandable and educational as possible. It should not only help users better understand the quality of their final dataset, but also provide helpful information and suggestions for improving future experiments. Furthermore, it can sometimes indicate local issues with setup or instruments that beamline staff can act on.

Therefore, we would like as much of autoPROC's "added value" as technically possible to be visible to the user. For that reason we provide a whole range of result files to pick from - typically:

  • summary.html is the main output file describing in detail the whole autoPROC process, including explanations, links to the manual, all relevant plots, warning messages (about multiple lattices, ice-rings, unstable parameter refinement, overloads, ...), detailed anisotropy analysis via STARANISO and much more. Making this visible and available to the user would be our preferred option - if technically possible.
  • autoPROC.xml and autoPROC_staraniso.xml (providing ISPyB-compatible information about mainly the final data scaling/merging statistics)
  • truncate-unique.table1 and staraniso_alldata-unique.table1 (ASCII formatted table of final scaling/merging statistics - for overall, inner and outer resolution shells)
  • truncate-unique.stats and staraniso_alldata-unique.stats (ASCII formatted table of final scaling/merging statistics - as a function of resolution)
  • report.pdf and report_staraniso.pdf (PDF files with summary of final scaled/merged dataset - including several pages of plots and graphs)

If there are technical reasons why some of these results can't be presented, please let us know and we will try to work towards a solution.

Some suggested settings

By default, standard output from process will contain escape sequences to produce bold or underlined text. If storing standard output is required (and this would be a good idea), setting the environment variable autoPROC_HIGHLIGHT to "no" will prevent this.
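
Setting autoPROC_HIGHLIGHT only affects new runs; for logs that were already captured with highlighting, the escape sequences can be stripped afterwards. A minimal, portable sketch (the file names and the log line are stand-ins):

```shell
# simulate one line of highlighted output, then strip the escape sequences
printf '\033[1mWARNING\033[0m: ice-rings detected\n' > /tmp/ap_color.log
esc=$(printf '\033')                      # literal ESC character, for portability
sed "s/${esc}\[[0-9;]*m//g" /tmp/ap_color.log > /tmp/ap_plain.log
cat /tmp/ap_plain.log                     # WARNING: ice-rings detected
```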

Some sites/beamlines might require specific settings (regarding header information, local coordinate convention, goniostat configurations etc). Some pre-defined settings might already be available within our distribution: please see

% process -M list

output for such "macros". You could write your own macro (see "process -M show" for examples) or check our database of known settings here. If you notice wrong or missing settings in those tables or have any other information regarding specific beamlines, please let us know.

Although the default parameters for running autoPROC are the result of processing a very large number of datasets over the years, some settings can be used to speed up a job - with the caveat that this means running autoPROC in a non-default mode.

These settings could include

  • restricting the number of images used for spot-searching, e.g. using 10 degrees of images distributed over 4 ranges within the first 180 degrees of data (released 14th Dec 2015):
    • Please be aware that this might significantly hamper autoPROC's ability to detect multiple lattices and ice-rings (and take corrective measures). By using such settings the user might not become aware of serious problems with the dataset.
  • restricting the number of pictures to produce showing the diffraction image and predictions:
    • This would create those pictures only at the final processing stage (caveat: the default settings also create pictures when potential multiple lattices are analysed) and only for the first image (caveat: potentially missing poorer diffraction patterns that are visible only at other angles).

Accessing image files

Remember that data-processing involves a large amount of disk I/O, especially reading of image files. If this is accommodated in special ways at the synchrotron site, it should be taken advantage of. Specifically:

  • try accessing the images from the fastest location possible (if they are visible/stored on multiple filesystems)
  • avoid accessing compressed (*.gz or *.bz2) images: although XDS can handle them, it does so by uncompressing them on-the-fly each time an image is requested.
  • autoPROC can take advantage of the LIB= setting for XDS when reading images: just add the relevant autoPROC_XdsKeyword_LIB=/where/ever/some/thing
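
If only compressed images are available, decompressing them once to fast local storage avoids the repeated on-the-fly decompression mentioned above. A sketch (all paths are stand-ins, and a dummy .gz file is created here so the loop has something to work on):

```shell
# one-off decompression to fast local storage, so XDS reads plain .cbf files
mkdir -p /tmp/fast_images /tmp/gz_images
printf 'dummy image data\n' | gzip > /tmp/gz_images/test_0001.cbf.gz
for f in /tmp/gz_images/*.cbf.gz; do
  # strip the .gz suffix and write the uncompressed copy to the fast location
  gunzip -c "$f" > /tmp/fast_images/"$(basename "${f%.gz}")"
done
ls /tmp/fast_images
```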

Some proposed runs for automatic data-processing with autoPROC

Apart from the specific details regarding distribution (across nodes and threads), some possible usages of autoPROC could be:

# all defaults:
% process ...

# explicitly assume anomalous signal:
% process -ANO ...

# explicitly assume no anomalous signal:
% process -noANO ...

# use XSCALE for scaling (instead of AIMLESS):
% process -M ScalingX ...

# use CC(1/2) as high-resolution criterion (instead of default I/sigI):
% process -M HighResCutOnCChalf ...

# in case of "poor" diffraction:
% process -M LowResOrTricky ...

# going for pure speed (with all the obvious caveats this entails):
% process -M fast ...

# with known SG:
% process symm=P21 ...

# with known SG and cell:
% process symm=P21 cell="34 45 56 90 98 90" ...

# with reference dataset available
% process -ref /where/ever/ref.mtz ...

Of course these can be combined. See also your local autoPROC reference card.


Some proposed options when providing re-processing capability for autoPROC

Apart from using autoPROC with (more or less) defaults on each sweep of data, it can easily accommodate a wide array of re-processing options - especially since it was designed for multi-sweep processing right from the start.

One default feature of autoPROC is to automatically combine different sweeps into a single, merged dataset if their wavelength values are identical. This is controlled via the WavelengthSignificantDigits parameter.

If the wavelength value written into the image header can change between sweeps, you might need to adjust that criterion to something appropriate for your given setup (especially regarding what/how wavelength values are written into image headers), e.g.

% process WavelengthSignificantDigits=4 ...
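
To illustrate the idea (assuming the comparison amounts to rounding to the given number of significant digits; the wavelength values below are made up):

```shell
# two header wavelengths that differ only beyond the 4th significant digit
w1=0.976253; w2=0.976310
# printf %.4g rounds to 4 significant digits
r1=$(printf '%.4g' "$w1"); r2=$(printf '%.4g' "$w2")
echo "$r1 $r2"        # 0.9763 0.9763
[ "$r1" = "$r2" ] && echo "sweeps would be combined"
```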

The most common reasons for re-processing might be

Processing multiple sweeps from the same crystal together

This could be due to multi-wavelengths/MAD data, multiple orientations when using a multi-axis goniostat, (pseudo-)helical scans, interleaved data collection, inverse beam etc. If all images reside in the same directory, just using

% process -I /where/ever/images ...

will be fine. If data is in separate directories, one can use

% process -Id "A,/where/ever/scan1,test_####.cbf,1,900" \
          -Id "B,/where/ever/scan2,test_####.cbf,1,900" ...

To handle multi-orientation data correctly, the instrument/goniostat description needs to be correct and up-to-date - see:

% process -M list

and

% x_kappa -list

for our currently distributed, beamline/instrument-specific settings. If you are running a multi-axis instrument at your beamline, please contact us with updates and calibration datasets from time to time!

Processing multiple sweeps from different crystals

Since the orientations of those datasets are unrelated, you need to run with

% process EnsureConsistentIndexing=no ...

This will avoid transforming orientation matrices between different sweeps according to some defined instrument/goniostat model.

Selecting a subset of images

This can easily be done using e.g.

% process -Id "A,/where/ever/scan1,test_####.cbf,1,600" \
          -Id "B,/where/ever/scan2,test_####.cbf,201,900" ...

If you want to exclude images in the middle of a sweep, you could run e.g.

% process -Id "A1,/where/ever/scan1,test_####.cbf,1,200" \
          -Id "A2,/where/ever/scan1,test_####.cbf,401,600" \
          -Id "B,/where/ever/scan2,test_####.cbf,201,900" ...

However, this would treat potentially very small wedges as separate datasets. Another option is to first create a directory with symbolic links to the wanted images

% mkdir tmpA
% ln -s /where/ever/scan1/test_0[01][0-9][0-9].cbf tmpA/.
% ln -s /where/ever/scan1/test_0200.cbf tmpA/.
% ln -s /where/ever/scan1/test_0[45][0-9][0-9].cbf tmpA/.
% rm tmpA/test_0400.cbf
% ln -s /where/ever/scan1/test_0600.cbf tmpA/.

and then run

% process -Id "A,`pwd`/tmpA,test_####.cbf,1,600" \
          -Id "B,/where/ever/scan2,test_####.cbf,201,900" ...
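
The link directory can also be built with an explicit loop over image numbers, which avoids the glob bookkeeping (paths are stand-ins, and ln -s does not require the targets to exist):

```shell
# link images 1-200 and 401-600 into a scratch directory
mkdir -p /tmp/tmpA
for i in $(seq 1 200) $(seq 401 600); do
  img=$(printf 'test_%04d.cbf' "$i")      # zero-padded image name
  ln -sf /where/ever/scan1/"$img" /tmp/tmpA/"$img"
done
ls /tmp/tmpA | wc -l                      # 400 links
```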

Enforcing a given spacegroup or spacegroup and cell

This can be implemented via

% process symm=P21 ...

or

% process symm=P21 cell="40 50 45 90 90.3 90" ...

Special handling of datasets with large anomalous/dispersive signal

To avoid very large anomalous/dispersive differences being classified as outliers, one can run with

% process -ANO ExpectLargeHeavyAtomSignal=yes ...

or

% process -ANO ExpectLargeHeavyAtomSignal=yes ExpectLargeHeavyAtomSignalScaleAndMerge=yes ...

Example (for Diamond MX beamlines)

A script like


module load autoPROC
module load global/cluster

process -I /where/ever/images \
  -d autoPROC.dir | tee autoPROC.log

can be used to take advantage of the compute cluster (forkintegrate has been configured accordingly; the COLSPOT step is not configured for multi-node execution). Of course, any additional command-line arguments can be added - e.g.


module load autoPROC
module load global/cluster

process -I /where/ever/images \
  symm=P6122 cell="93 93 130 90 90 120" \
  -d autoPROC.dir | tee autoPROC.log

Submission would then be done with

% qsub -pe smp 16 -cwd <script>

(where <script> is the file containing the above script).

External links to documents at synchrotrons/beamlines

Please note that not all of these pages will be up-to-date: developments at synchrotrons/beamlines usually move much faster than documentation can keep up with. If you notice any out-of-date (or non-existent) documentation during your beamline visit, please let us know - especially if you are in a position to help improve those documents.

Synchrotron Links
Australian Synchrotron