Running autoPROC at a synchrotron is no different from running it in the home lab. However, many synchrotrons have optimised their infrastructure for speed and efficiency when running data-processing jobs, so a few settings can help you get the most out of autoPROC when using those resources.
Please make sure to check with your local IT contact for details about resources and usage policies.
As described in the XDS documentation, the spot-searching (COLSPOT) and integration (INTEGRATE) stages can be distributed to multiple compute nodes. Once this has been configured within the XDS installation, autoPROC can be told about this feature using the MAXIMUM_NUMBER_OF_JOBS keyword - e.g. with
% process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 ...
Please make sure that all required packages are also available, configured and useable on such a compute cluster.
By default, autoPROC will use all available threads for the XDS, XSCALE and AIMLESS steps. This can be controlled globally via e.g.
% process -nthreads 8 ...
or separately for XDS and XSCALE/AIMLESS. E.g. to use 16 threads for XDS but 32 for XSCALE/AIMLESS one would use:
% process autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=16 -nthreads 32 ...
autoPROC will write two ISPyB-compatible XML files at the end of processing (into the output directory):
The default name can be changed via the autoPROC_CreateXmlFile parameter, e.g.
% process -d 01 ...                                            # 01/autoPROC.xml
% process ...                                                  # ./autoPROC.xml
% process -d 01 autoPROC_CreateXmlFile=`pwd`/01/ispyb.xml ...  # 01/ispyb.xml
Releases after 13th Dec 2015 allow injection of your own XML elements, e.g. via
% process autoPROC_CreateXml_LocalElements="AutoProcContainer:AutoProcScalingContainer:AutoProcIntegrationContainer:Image:datasetID=12345" ...
which would insert <datasetID>12345</datasetID> into the
<AutoProcContainer> <AutoProcScalingContainer> <AutoProcIntegrationContainer> <Image>
hierarchy.
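The mapping between the colon-separated argument and the XML hierarchy can be sketched with a small shell helper. This is only an illustration: the function name `local_element` is hypothetical and not part of autoPROC, and the helper merely assembles the command-line argument shown above (container names, outermost first, then `name=value`).

```shell
# Hypothetical helper (not part of autoPROC): assemble a LocalElements
# argument from an element name, its value and the enclosing containers.
# $1 = element name, $2 = value, remaining = containers, outermost first
local_element() {
  name=$1; value=$2
  shift 2
  # join the container names with ":" inside a subshell
  path=$(IFS=:; printf '%s' "$*")
  printf 'autoPROC_CreateXml_LocalElements="%s:%s=%s"\n' "$path" "$name" "$value"
}

local_element datasetID 12345 \
  AutoProcContainer AutoProcScalingContainer AutoProcIntegrationContainer Image
```

The printed string can then be pasted onto a process command line.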
If there are any missing items, incorrect or inconsistent information in those XML files, please let us know immediately!
autoPROC provides a large amount of output - both in terms of files, but also in terms of annotated results that both a novice and experienced user should find useful. We invest a lot of effort to make the autoPROC output as understandable and educational as possible. It should not only help users to understand the quality of their final dataset better, but also to get helpful information and suggestions to maybe improve future experiments. Furthermore, it can sometimes provide indications of local issues with setup or instruments that beamline staff can use.
Therefore, we would like to have as much of autoPROC's "added value" visible to the user as technically possible. For that reason we provide a whole range of result files that should be made available to the user - namely:
If there are technical reasons why some of those results can't be presented, please let us know and we will try to work towards a solution. If the above files are stored and made available to users (including the visualisation of results provided in HTML and PDF files), the normal output from autoPROC could be removed or archived. The HTML file in particular contains a full record of the run conditions and should allow users to process the data again (e.g. at their home institution) using the same commands and software versions.
Finally, please make sure that the use of autoPROC is referenced adequately and that users are aware of the use of autoPROC for particular results they might have achieved automatically from synchrotron systems.
By default, standard output from process will contain escape sequences to produce bold or underlined text. If storing standard output is required (and this would be a good idea), then setting the environment variable autoPROC_HIGHLIGHT to "no" will prevent this.
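A minimal sketch of how this could look in a wrapper script is given below. The image and output directories are placeholders, and the guard around the process call is only there so that the snippet does nothing on machines without autoPROC.

```shell
# Switch off bold/underline escape sequences so the stored log is plain text.
autoPROC_HIGHLIGHT=no
export autoPROC_HIGHLIGHT

# Only attempt the run where autoPROC is actually available on the PATH.
if command -v process >/dev/null 2>&1; then
  # /where/ever/images and autoPROC.dir are placeholders
  process -I /where/ever/images -d autoPROC.dir 2>&1 | tee autoPROC.log
fi
```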
Some sites/beamlines might require specific settings (regarding header information, local coordinate convention, goniostat configurations etc). Since the 20210224 release, the meta-data is analysed to extract a detector identifier (if available) together with a date: these are then used to look up site-specific settings in a distributed database $autoPROC_home//autoPROC/lib/detector-site.def to get most likely correct parameters. These would typically include changes to the rotation axis direction, beam centre convention, vertical/horizontal rotation axes etc. If this is not intended, please add "do_setup=no" to your process command-line.
Furthermore, some pre-defined settings might already be available within our distribution: please see
% process -M list
output for such "macros". You could write your own macro (see "process -M show" for examples) or check our database of known settings here. If you know of wrong or missing settings in those tables or have any other information regarding specific beamlines, please let us know.
Although the default parameters for running autoPROC are the result of processing a very large number of datasets over the years, some settings that would speed up a job could be used - with the caveat that this means running autoPROC in non-default mode.
These settings could include
XdsSpotSearchNumImagesAngularRange="10.0" XdsSpotSearchNumRanges=4 XdsSpotSearchAngularRange=180
autoPROC_CreateGpxPicturesAtRotationAngles="0" autoPROC_CreateGpxPicturesAtStages="process"
Remember that data-processing will involve a large amount of disk I/O, especially reading of image files. If this is accommodated in special ways at the synchrotron site, it should be taken advantage of. Specifically:
process \
  FindImages_AllowCompressedImages=yes \
  autoPROC_XdsKeyword_LIB=/where/ever/xds-zcbf.so \
  ....
Apart from the specific details regarding distribution (across nodes and threads), some possible usages of autoPROC could be:
# all defaults:
% process ...

# explicitly assume anomalous signal:
% process -ANO ...

# explicitly assume no anomalous signal:
% process -noANO ...

# use XSCALE for scaling (instead of AIMLESS):
% process -M ScalingX ...

# use CC(1/2) as high-resolution criterion (instead of default I/sigI):
% process -M HighResCutOnCChalf ...

# in case of "poor" diffraction:
% process -M LowResOrTricky ...

# going for pure speed (with all the obvious caveats this entails):
% process -M fast ...

# with known SG:
% process symm=P21 ...

# with known SG and cell:
% process symm=P21 cell="34 45 56 90 98 90" ...

# with reference dataset available:
% process -ref /where/ever/ref.mtz ...
Of course these can be combined. See also your local autoPROC reference card at
$autoPROC_home/docs/autoproc/manual/autoproc_reference_card.pdf
Apart from using autoPROC with (more or less) defaults on each sweep of data, it can easily accommodate a wide array of re-processing options - especially since it was designed for multi-sweep processing right from the start.
One default feature of autoPROC is to automatically combine different sweeps into a single, merged dataset if the wavelength value of those sweeps is identical. This is defined via the parameter
WavelengthSignificantDigits=5
If the wavelength value written into the image header can change between sweeps, you might need to reset that criterion to something more appropriate for your given setup (especially what/how wavelength values are written into image headers), e.g.
% process WavelengthSignificantDigits=4 ...
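To see why the number of significant digits matters, the following sketch compares two wavelength values after rounding to a given number of significant digits. It only illustrates the idea: how autoPROC performs the comparison internally is not described here, and the helper names `round_sig` and `same_wavelength` as well as the two example values are assumptions made for this example.

```shell
# Hypothetical helpers (not part of autoPROC).
# round_sig: $1 = wavelength, $2 = number of significant digits
round_sig() {
  LC_ALL=C awk -v w="$1" -v n="$2" \
    'BEGIN { fmt = "%." (n - 1) "e\n"; printf fmt, w }'
}

# same_wavelength: succeeds if $1 and $2 agree to $3 significant digits
same_wavelength() {
  [ "$(round_sig "$1" "$3")" = "$(round_sig "$2" "$3")" ]
}

for digits in 5 4; do
  if same_wavelength 0.97626 0.97634 "$digits"; then
    echo "$digits significant digits: same wavelength"
  else
    echo "$digits significant digits: different wavelengths"
  fi
done
```

In this example the two values differ at 5 significant digits but agree at 4, which is the kind of header jitter the lower setting is meant to absorb.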
The most common reasons for re-processing might be
This could be due to multi-wavelengths/MAD data, multiple orientations when using a multi-axis goniostat, (pseudo-)helical scans, interleaved data collection, inverse beam etc. If all images reside in the same directory, just using
% process -I /where/ever/images ...
will be fine. If data is in separate directories, one can use
% process -Id "A,/where/ever/scan1,test_####.cbf,1,900" \
          -Id "B,/where/ever/scan2,test_####.cbf,1,900" \
          ...
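With many sweeps, the -Id arguments could be generated rather than typed by hand. The sketch below assumes that all sweeps share the same file name template and image range, and that the sweep identifier can be chosen freely (the examples above use A and B); the helper name `build_id_args` is hypothetical.

```shell
# Hypothetical helper: print one -Id argument per sweep directory.
# $1 = file name template, $2 = first image, $3 = last image,
# remaining arguments = sweep directories
build_id_args() {
  template=$1; first=$2; last=$3
  shift 3
  n=0
  for dir in "$@"; do
    n=$((n + 1))
    # identifiers sweep1, sweep2, ... are an arbitrary choice for this sketch
    printf '%s "%s,%s,%s,%s,%s"\n' "-Id" "sweep$n" "$dir" "$template" "$first" "$last"
  done
}

build_id_args 'test_####.cbf' 1 900 /where/ever/scan1 /where/ever/scan2
```

The printed lines can be reviewed first and then added to the process command line.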
To handle multi-orientation data correctly, the instrument/goniostat description needs to be correct and up-to-date - see:
% process -M list
and
% x_kappa -list
for our currently distributed, beamline/instrument specific settings. If you are running a multi-axis instrument at your beamline, please contact us with updates and calibration datasets from time to time!
Since the orientations of those datasets are unrelated, you need to run with
% process EnsureConsistentIndexing=no ...
This will avoid transforming orientation matrices between different sweeps according to some defined instrument/goniostat model.
This can easily be done using e.g.
% process -Id "A,/where/ever/scan1,test_####.cbf,1,600" \
          -Id "B,/where/ever/scan2,test_####.cbf,201,900" \
          ...
If you want to exclude images in the middle of a sweep, you could run e.g.
% process -Id "A1,/where/ever/scan1,test_####.cbf,1,200" \
          -Id "A2,/where/ever/scan1,test_####.cbf,401,600" \
          -Id "B,/where/ever/scan2,test_####.cbf,201,900" \
          ...
However, this would treat potentially very small wedges as separate datasets. Another option would be to first create a directory with symbolic links to the images wanted
% mkdir tmpA
% ln -s /where/ever/scan1/test_0[01][0-9][0-9].cbf tmpA/.
% ln -s /where/ever/scan1/test_0200.cbf tmpA/.
% ln -s /where/ever/scan1/test_0[45][0-9][0-9].cbf tmpA/.
% rm tmpA/test_0400.cbf
% ln -s /where/ever/scan1/test_0600.cbf tmpA/.
and then run
% process -Id "A,`pwd`/tmpA,test_####.cbf,1,600" \
          -Id "B,/where/ever/scan2,test_####.cbf,201,900" \
          ...
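The glob patterns used for the symbolic links are easy to get wrong for other image ranges. A more general variant, given here only as a sketch, loops over explicit image numbers instead. The helper name `link_image_range` is hypothetical, the `test_####.cbf` naming with four digits is taken from the example above, and the source directory should be given as an absolute path so that the links resolve.

```shell
# Hypothetical helper: link images first..last of a sweep into a directory.
# $1 = source directory (absolute path), $2 = destination directory,
# $3 = first image number, $4 = last image number
link_image_range() {
  src=$1; dest=$2; i=$3; last=$4
  mkdir -p "$dest"
  while [ "$i" -le "$last" ]; do
    name=$(printf 'test_%04d.cbf' "$i")
    # skip image numbers that do not exist in the source directory
    if [ -e "$src/$name" ]; then
      ln -s "$src/$name" "$dest/$name"
    fi
    i=$((i + 1))
  done
}

# Usage, matching the ranges above (images 1-200 and 401-600):
#   link_image_range /where/ever/scan1 tmpA 1 200
#   link_image_range /where/ever/scan1 tmpA 401 600
```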
This can be implemented via
% process symm=P21 ...
or
% process symm=P21 cell="40 50 45 90 90.3 90" ...
To avoid very large anomalous/dispersive differences being classified as outliers, one can run with
% process -ANO ExpectLargeHeavyAtomSignal=yes ...
or
% process -ANO ExpectLargeHeavyAtomSignal=yes ExpectLargeHeavyAtomSignalScaleAndMerge=yes ...
A script like
#!/bin/sh
module load autoPROC
module load global/cluster
process -I /where/ever/images \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=16 \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 \
  -d autoPROC.dir | tee autoPROC.log
can be used to take advantage of the compute cluster (forkintegrate has been configured accordingly, the COLSPOT step is not configured for multi-node execution). Of course, any additional command-line arguments can also be added - e.g.
#!/bin/sh
module load autoPROC
module load global/cluster
process -I /where/ever/images \
  symm=P6122 cell="93 93 130 90 90 120" \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=16 \
  autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 \
  -d autoPROC.dir | tee autoPROC.log
Submission would then be done with
% qsub -pe smp 16 -cwd run.sh
(where run.sh is the above script).
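To avoid keeping the thread count in the script and the slot count in the qsub call in sync by hand, the script could read the number of granted slots from the scheduler. The sketch below assumes a Grid Engine style scheduler that sets NSLOTS inside the job (which matches the "qsub -pe smp" syntax above, but should be checked locally). The helper `autoproc_command` is hypothetical and only prints the command line, so the sketch can be tried without autoPROC installed.

```shell
#!/bin/sh
# NSLOTS is assumed to be set by the scheduler for jobs submitted with
# "qsub -pe smp N"; fall back to 1 when run outside the scheduler.
threads=${NSLOTS:-1}

# Hypothetical helper: print the process command line instead of running it.
# $1 = number of threads to use for XDS
autoproc_command() {
  printf '%s\n' "process -I /where/ever/images autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_PROCESSORS=$1 autoPROC_XdsKeyword_MAXIMUM_NUMBER_OF_JOBS=4 -d autoPROC.dir"
}

autoproc_command "$threads"
```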
Please note that not all of these pages will be up-to-date: developments at synchrotrons/beamlines usually move much faster than documentation can keep up with. If you notice any out-of-date (or non-existent) documentation during your beamline visit, please let us know - especially if you are in a position to help improve those documents.
It would be great if synchrotron beamlines could check how other synchrotrons/beamlines are writing image file headers and stick as closely as possible to those. At the moment (20210413) we have e.g. a probably unnecessary number of variants for the Detector keyword in mini-cbf headers:
Detector: (null), S/N E-32-0119
Detector: ADSC HF-4M, S/N H401,
Detector: D19@ILL curved detector
Detector: Dectris Eiger 16M, E-32-0107
Detector: Dectris Eiger 16M, S/N E-32-0100
Detector: Dectris Eiger 16M, S/N E-32-0101
Detector: Dectris Eiger 16M, S/N E-32-0102
Detector: Dectris Eiger 16M, S/N E-32-0104
Detector: Dectris Eiger 16M, S/N E-32-0108
Detector: Dectris Eiger 16M, S/N E-32-0110
Detector: Dectris Eiger 16M, S/N E-32-0113
Detector: Dectris Eiger 16M, S/N E-32-0115
Detector: Dectris Eiger 16M, S/N E-32-0116
Detector: Dectris Eiger 4M, S/N E-08-0104
Detector: Dectris Eiger 9M, S/N E-18-0101
Detector: Dectris Eiger 9M, S/N E-18-0102
Detector: Dectris Eiger 9M, S/N E-18-0103
Detector: Dectris Eiger 9M, S/N E-18-0104
Detector: Dectris Eiger2 9M, S/N E-18-0110
Detector: Eiger 16M, S/N (null)
Detector: Eiger 4M, S/N (null)
Detector: PILATUS 12M, S/N 120-0100
Detector: PILATUS 2M, S/N 24-0103, Elettra
Detector: PILATUS 2M, S/N 24-0107 Diamond
Detector: PILATUS 2M, S/N 24-0109
Detector: PILATUS 2M-F, S/N 24-0109-F
Detector: PILATUS 2MF, S/N 24-0109-F
Detector: PILATUS 6M
Detector: PILATUS 6M Prosport+, S/N 60-0100 Diamond
Detector: PILATUS 6M, 60-0103, IMCA-CAT
Detector: PILATUS 6M, S/N 60-0101 SSRL
Detector: PILATUS 6M, S/N 60-0101,
Detector: PILATUS 6M, S/N 60-0102, PSI
Detector: PILATUS 6M, S/N 60-0104, ESRF ID29
Detector: PILATUS 6M, S/N 60-0104, ESRF ID30B
Detector: PILATUS 6M, S/N 60-0106, Soleil
Detector: PILATUS 6M, S/N 60-0107, BNL
Detector: PILATUS 6M, S/N 60-0108, Alba
Detector: PILATUS 6M, S/N 60-0113,
Detector: PILATUS 6M, S/N 60-0116-F, ESRF ID23
Detector: PILATUS 6M, S/N 60-0118
Detector: PILATUS 6M, S/N 60-0118, HZB-BESSYII BL14.1
Detector: PILATUS 6M, SN 60-0001, X06SA@SLS.PSI.CH
Detector: PILATUS 6M-F, S/N 60-0105-F
Detector: PILATUS 6M-F, S/N 60-0112-F
Detector: PILATUS 6M-F, S/N 60-0114-F
Detector: PILATUS 6M-F, S/N 60-0115-F
Detector: PILATUS 6M-F, S/N 60-0117-F
Detector: PILATUS 6MF, S/N 60-0102-F, PSI
Detector: PILATUS 6MF, SN 60-0001-F, X06SA@SLS.PSI.CH
Detector: PILATUS 6MF-0109
Detector: PILATUS3 2M, S/N 24-0118, ESRF ID23
Detector: PILATUS3 2M, S/N 24-0118, ESRF ID30
Detector: PILATUS3 2M, S/N 24-0124
Detector: PILATUS3 300K, S/N 3-0226
Detector: PILATUS3 6M, S/N 60-0119
Detector: PILATUS3 6M, S/N 60-0122
Detector: PILATUS3 6M, S/N 60-0126
Detector: PILATUS3 6M, S/N 60-0128, ESRF ID29
Detector: PILATUS3 6M, S/N 60-0128, ESRF ID30
Detector: PILATUS3 6M, S/N 60-0131
Detector: PILATUS3 6M, S/N 60-0132
Detector: PILATUS3 6M, S/N 60-0134
Detector: PILATUS3 6M, S/N 60-0135
Detector: PILATUS3 6M, S/N 60-0136
Even though the specification allows any kind of string here, it would be helpful to stick with some sensible, common syntax ;-)
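To illustrate why these variants are a nuisance, here is a small sketch of what a script has to do just to pull the serial number out of such a line. It is not how autoPROC parses headers. The helper name `detector_serial` is hypothetical, and the pattern only covers the "S/N" and "SN" spellings, so entries without either marker yield nothing.

```shell
# Hypothetical helper: read Detector lines on standard input and print the
# serial number for those that mark it with "S/N" or "SN".
detector_serial() {
  sed -nE 's#^.*Detector: .*, S/?N ([^, ]+).*$#\1#p'
}

printf '%s\n' \
  'Detector: PILATUS 6M, S/N 60-0104, ESRF ID29' \
  'Detector: PILATUS 6M, SN 60-0001, X06SA@SLS.PSI.CH' \
  'Detector: Dectris Eiger 16M, E-32-0107' | detector_serial
```

The third line carries a serial number as well, but without a marker it is silently missed, which is exactly the kind of special case a common syntax would avoid.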