

Introduction

Running autoPROC at a synchrotron is no different from running it in the home lab. However, many synchrotrons have optimised their infrastructure for speed and efficiency when running data-processing jobs, so a few settings can help you get the most out of autoPROC when using those resources.

Please make sure to check with your local IT contact for details about resources and usage policies.


Making use of compute clusters

As described in the XDS documentation, the spot-searching (COLSPOT) and integration (INTEGRATE) stages can be distributed to multiple compute nodes. Once this has been configured within the XDS installation, autoPROC can be told about this feature using the MAXIMUM_NUMBER_OF_JOBS keyword - e.g. with

% process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 ...

Please make sure that all required packages are also available, configured and usable on such a compute cluster.
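One way to check availability is to test, on the compute node itself, whether each required program can be found in PATH. The following is a minimal sketch only: the function name check_tools is made up for illustration, and the list of programs to test has to be supplied by the user for the local installation.

```shell
#!/bin/sh
# Report which of the given programs cannot be found in PATH.
# Prints one missing name per line and returns non-zero if any is missing.
# (check_tools is a hypothetical helper, not part of autoPROC.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling e.g. "check_tools process xds_par aimless" at the top of a job script (the program names here are assumptions about a typical installation) would list anything that is not visible on that node before any processing starts.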


Making use of multiple CPUs/threads

By default, autoPROC will use all available threads for the XDS, XSCALE and AIMLESS steps. This can be controlled globally via e.g.

% process -nthreads 8 ...

or separately for XDS and XSCALE/AIMLESS. E.g. to use 16 threads for XDS but 32 for XSCALE/AIMLESS one would use:

% process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=16 -nthreads 32 ...
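On a shared machine it can be useful to cap the thread counts rather than hard-coding them. The sketch below assumes that the number of online CPUs is a sensible starting point and that 16 and 32 are the site-chosen limits (both values are only examples taken from the command above); the helper names cpu_count and cap_threads are made up for illustration.

```shell
#!/bin/sh
# Number of online CPUs as reported by the operating system (1 if unknown).
cpu_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Print the smaller of a requested thread count ($1) and a maximum ($2).
cap_threads() {
    want=$1
    max=$2
    if [ "$want" -gt "$max" ]; then echo "$max"; else echo "$want"; fi
}

n_xds=$(cap_threads "$(cpu_count)" 16)      # example cap for XDS
n_scale=$(cap_threads "$(cpu_count)" 32)    # example cap for XSCALE/AIMLESS

# Print the command line instead of running it, so it can be inspected first.
echo "process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=$n_xds -nthreads $n_scale ..."
```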
 

The autoPROC XML file

autoPROC will write an ISPyB-compatible XML file when running

% process -xml ...
  - or -
% process autoPROC_CreateXml=yes ...

The default name is "autoPROC.xml" within the output directory. This name can be changed via the autoPROC_CreateXmlFile parameter, e.g.

% process -xml -d 01 ...                                           # 01/autoPROC.xml
% process -xml ...                                                 # ./autoPROC.xml
% process -xml -d 01 autoPROC_CreateXmlFile=`pwd`/01/ispyb.xml     # 01/ispyb.xml

Releases after 13th Dec 2015 allow injection of your own XML elements, e.g. via

% process autoPROC_CreateXml_LocalElements="AutoProcContainer:AutoProcScalingContainer:AutoProcIntegrationContainer:Image:datasetID=12345" ...

which would insert <datasetID>12345</datasetID> into the

<AutoProcContainer>
  <AutoProcScalingContainer>
    <AutoProcIntegrationContainer>
      <Image>

hierarchy.
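Since the value is a colon-separated path of element names followed by name=value, it can be assembled by a small helper when the inserted value comes from a beamline database. This is a sketch only: local_element is a hypothetical function name, and the element names are simply those of the example above.

```shell
#!/bin/sh
# Join the parent element names with ':' and append NAME=VALUE at the end.
# Usage: local_element NAME VALUE PARENT1 [PARENT2 ...]
# (local_element is a hypothetical helper, not part of autoPROC.)
local_element() {
    name=$1
    value=$2
    shift 2
    path=""
    for parent in "$@"; do
        path="${path}${parent}:"
    done
    echo "${path}${name}=${value}"
}

arg=$(local_element datasetID 12345 \
    AutoProcContainer AutoProcScalingContainer AutoProcIntegrationContainer Image)

# Show the argument as it would be given to process.
echo "autoPROC_CreateXml_LocalElements=\"$arg\""
```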


Some suggested settings

By default, standard output from process will contain escape sequences that produce bold or underlined text. If standard output is to be stored (and this would be a good idea), then setting the environment variable autoPROC_HIGHLIGHT to "no" will prevent this.
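In a Bourne-type shell the variable can be set as shown below. The second part is a fallback for log files that were already written with highlighting; it assumes that the escape sequences are of the common ANSI form (ESC, "[", parameters, one letter), which has not been verified against autoPROC output. The function name strip_escapes is made up for illustration.

```shell
#!/bin/sh
# Switch off bold/underline escape sequences for subsequent process runs.
autoPROC_HIGHLIGHT=no
export autoPROC_HIGHLIGHT

# Fallback for existing logs: remove ANSI-style escape sequences
# (assumed format: ESC [ digits/semicolons letter) from standard input.
strip_escapes() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*[A-Za-z]//g"
}
```

An existing log could then be cleaned with e.g. "strip_escapes < autoPROC.log > autoPROC.clean.log" (file names are examples).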

Some sites/beamlines might require specific settings (regarding header information, local coordinate convention, goniostat configurations etc). Some pre-defined settings might already be available within our distribution: please see

% process -M list

output for such "macros". You could write your own macro (see "process -M show" for examples) or check our database of known settings. If you know of wrong or missing settings in those tables or have any other information regarding specific beamlines, please let us know.

Although the default parameters for running autoPROC are the result of processing a very large number of datasets over the years, some settings that would speed up a job could be used - with the caveat that this means running autoPROC in non-default mode.

These settings could include

  • restricting the number of images to use for spot-searching, e.g. using 10 degrees of images distributed over 4 ranges within the first 180 degrees of data (released 14th Dec 2015):
      XdsSpotSearchNumImagesAngularRange="10.0"
      XdsSpotSearchNumRanges=4
      XdsSpotSearchAngularRange=180
    • Please be aware that this might significantly hamper autoPROC's ability to detect multiple lattices and ice-rings (and take corrective measures). By using such settings the user might not become aware of serious problems with the dataset.
  • restricting the number of pictures to produce showing the diffraction image and predictions:
      autoPROC_CreateGpxPicturesAtRotationAngles="0"
      autoPROC_CreateGpxPicturesAtStages="process"
    • This would only create those pictures at the final processing stage (caveat: the default settings would also create pictures when potential multiple lattices are analysed) and only for the first image (caveat: potentially missing poorer diffraction patterns that are visible only at different angles).
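Put together, the settings listed above could be collected in one place and passed to process. The sketch below assumes that these parameters are given as name=value arguments on the command line, in the same way as the other parameters shown on this page; it only prints the resulting command line rather than running it.

```shell
#!/bin/sh
# Collect the speed-oriented (non-default) settings as positional parameters.
# Assumption: they are passed as name=value arguments like other parameters.
set -- \
    XdsSpotSearchNumImagesAngularRange="10.0" \
    XdsSpotSearchNumRanges=4 \
    XdsSpotSearchAngularRange=180 \
    autoPROC_CreateGpxPicturesAtRotationAngles="0" \
    autoPROC_CreateGpxPicturesAtStages="process"

# Show the resulting command line; "..." stands for the remaining arguments.
echo process "$@" ...
```

Keeping such settings in a single block makes it easy to remove them again when a dataset needs the default (more thorough) behaviour.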

Accessing image files

Remember that data-processing involves a large amount of disk I/O, especially reading of image files. If this is accommodated in special ways at the synchrotron site, it should be taken advantage of. Specifically:

  • try accessing the images from the fastest location possible (if they are visible/stored on multiple filesystems)
  • avoid accessing compressed (*.gz or *.bz2) images: although XDS can handle them, it does so by uncompressing them on-the-fly each time an image is requested.
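If only compressed images are available, one option is to uncompress them once onto fast storage before processing. The following is a sketch under several assumptions: that enough space is available at the destination, that gzip (and bzip2, if needed) are installed, and that all images of a dataset sit in a single directory. The function name stage_images is made up for illustration.

```shell
#!/bin/sh
# Copy all files from directory SRC to directory DST, uncompressing
# *.gz and *.bz2 files on the way; other files are copied unchanged.
# (stage_images is a hypothetical helper, not part of autoPROC or XDS.)
stage_images() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        base=$(basename "$f")
        case "$base" in
            *.gz)  gzip  -dc "$f" > "$dst/${base%.gz}"  || return 1 ;;
            *.bz2) bzip2 -dc "$f" > "$dst/${base%.bz2}" || return 1 ;;
            *)     cp "$f" "$dst/$base"                 || return 1 ;;
        esac
    done
}
```

After e.g. "stage_images /where/ever/images /fast/scratch/images" (the scratch path is an example), process would be pointed at the new location with "-I /fast/scratch/images".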

Some proposed runs for automatic data-processing with autoPROC

Apart from the specific details regarding distribution (across nodes and threads), some possible usages of autoPROC could be:

# all defaults:
% process ...

# assume anomalous signal:
% process -ANO ...

# assume no anomalous signal:
% process -noANO ...

# use XSCALE for scaling (instead of AIMLESS):
% process -M ScalingX ...

# use CC(1/2) as high-resolution criterion (instead of default I/sigI):
% process -M HighResCutOnCChalf ...

# in case of "poor" diffraction:
% process -M LowResOrTricky ...

# going for pure speed (with all the obvious caveats this entails):
% process -M fast ...

# with known SG:
% process symm=P21 ...

# with known SG and cell:
% process symm=P21 cell="34 45 56 90 98 90" ...

# with reference dataset available
% process -ref /where/ever/ref.mtz ...

Of course these can be combined.
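In an automated pipeline, which of these options apply often depends on what is known about the sample. A minimal sketch of assembling the command line from optional prior knowledge is given below; the variable names SYMM, CELL and REF and the function name build_args are made up for illustration, and the command is only printed, not run.

```shell
#!/bin/sh
# Build a process command line from optional prior knowledge.
# SYMM, CELL and REF are hypothetical variables; empty or unset means unknown.
build_args() {
    args=""
    [ -n "${SYMM:-}" ] && args="$args symm=$SYMM"
    [ -n "${CELL:-}" ] && args="$args cell=\"$CELL\""
    [ -n "${REF:-}" ]  && args="$args -ref $REF"
    echo "process$args ..."
}
```

With SYMM=P21 and CELL="34 45 56 90 98 90" set, this prints the same combination as the "with known SG and cell" example above.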


Example (for Diamond MX beamlines)

A script like

#!/bin/sh

module load autoPROC
module load global/cluster

process -I /where/ever/images \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=16 \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 \
  -d autoPROC.dir | tee autoPROC.log

can be used to take advantage of the compute cluster (forkintegrate has been configured accordingly; the COLSPOT step is not configured for multi-node execution). Of course, any additional command-line arguments can also be added - e.g.

#!/bin/sh

module load autoPROC
module load global/cluster

process -I /where/ever/images \
  symm=P6122 cell="93 93 130 90 90 120" \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=16 \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 \
  -d autoPROC.dir | tee autoPROC.log

Submission would then be done with

% qsub -pe smp 16 -cwd run.sh

(where run.sh is the above script).
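To keep the number of XDS threads in step with the slots requested at submission, the script could read the slot count from the scheduler instead of repeating the value 16. The sketch below assumes a Grid Engine type scheduler (as the "qsub -pe" syntax suggests), which sets the NSLOTS variable inside a job; the function name slots is made up for illustration, and the command is only printed.

```shell
#!/bin/sh
# Number of slots granted by the scheduler via "-pe smp N".
# NSLOTS is set by Grid Engine inside a job; use 1 when it is not set.
slots() {
    echo "${NSLOTS:-1}"
}

# Print the command line that run.sh would execute with this slot count.
echo "process -I /where/ever/images" \
     "autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=$(slots)" \
     "autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4" \
     "-d autoPROC.dir"
```

This way, changing the value given to "-pe smp" at submission is enough; the script itself does not need to be edited.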


External links to documents at synchrotrons/beamlines

Please note that not all of these pages will be up-to-date: developments at synchrotrons/beamlines usually move much faster than documentation can keep up with. If you notice any out-of-date (or non-existent) documentation during your beamline visit, please let us know - especially if you are in a position to help improve those documents.

ESRF
  • http://www.esrf.eu/UsersAndScience/Experiments/MX/How_to_use_our_beamlines/Run_Your_Experiment/autoproc-global-phasing
  • http://www.esrf.eu/UsersAndScience/Experiments/MX/Software/PXSOFT
  • http://www.esrf.eu/UsersAndScience/Experiments/MX/How_to_use_our_beamlines/Run_Your_Experiment/automatic-data-processing

SLS
  • https://www.psi.ch/sls/pxi/status
  • https://www.psi.ch/sls/pxii/pxii-manual
  • https://www.psi.ch/sls/pxii/status
  • https://www.psi.ch/sls/pxiii/data-processing-and-analysis

Diamond
  • http://www.diamond.ac.uk/Beamlines/Mx/Ixx/Ixx-Manual/Data-Analysis/Automated-Software-Pipeline.html
  • http://www.diamond.ac.uk/Beamlines/Mx/Ixx/Ixx-Manual/Data-Analysis/Processing-Files.html
  • http://www.diamond.ac.uk/Beamlines/Mx/Ixx/Ixx-Manual/Data-Analysis/MX-Software.html
  • http://www.diamond.ac.uk/Beamlines/Mx/I24.html
  • http://www.diamond.ac.uk/Beamlines/Mx.html

ALS
  • http://bl1231.als.lbl.gov/xtalprogs/xtalprogs.php
  • http://bl831.als.lbl.gov/~gmeigs/links/links.html
  • http://www.mbc-als.org/manual.html

Soleil
  • http://www.synchrotron-soleil.fr/Recherche/LignesLumiere/PROXIMA1/UserInfo

PETRA-III
  • http://www.embl-hamburg.de/services/mx/software/
  • http://www.embl-hamburg.de/SoftwareManuals/#data

Australian Synchrotron
  • http://www.synchrotron.org.au/aussyncbeamlines/macromolecular-crystallography/faqs-mx-beamlines

APS
  • https://ls-cat.org/links.html

SSRL
  • http://smb.slac.stanford.edu/facilities/software/xtal_software/