autoPROC Documentation : Usage

Copyright    © 2004-2015 by Global Phasing Limited
 
  All rights reserved.
 
  This software is proprietary to and embodies the confidential technology of Global Phasing Limited (GPhL). Possession, use, duplication or dissemination of the software is authorised only pursuant to a valid written licence from GPhL.
Documentation    (2004-2015)  Clemens Vonrhein, Claus Flensburg, Wlodek Paciorek & Gérard Bricogne
 
Contact proc-develop@GlobalPhasing.com


Introduction

Here you will find information to serve as a reference to the various commands that a user might run as part of data processing with autoPROC. More detailed examples and tutorials are also available on the autoPROC wiki. You should also print out a copy of the autoPROC reference card for typical usage of different commands.

process

The "process" command runs the complete autoPROC pipeline. Simple on-line usage information is available by running
process -h
showing the main command-line arguments:

Flag Arguments Remark
-checkdeps check for all external dependencies (and stop)
-check check for all external dependencies (and stop if an error is encountered)
-M <macro|file> use a pre-defined combination of parameter settings via a macro; for a list of pre-defined macros use -M list. Alternatively, a file name with own settings (similar to a macro) can be given. See also here for details about the macro feature.
-I <image directory> directory with data images (default: current directory)
-d <dir> output will be written to [sub]directory instead of current directory
-R <reslow> <reshigh> low- and high-resolution limits (default is to use limits given by detector dimensions). Scaling/merging is allowed to limit these further based on additional criteria.
-[no]ANO process for anomalous differences (default); use -noANO to switch this off (i.e. assume Friedel's law holds true).
-B if running in batch mode (no highlighting on standard output)
-ref <MTZ file> reference MTZ file for spacegroup, cell, indexing and (optional) test-set. The output from autoPROC will be in the same setting as this MTZ file.
-free <MTZ file> reference MTZ file (for test-set only) - if the spacegroup allows several indexing solutions, at least one set of amplitudes (and sigmas) needs to be present in this file. The output from autoPROC will contain this (optionally reindexed) test-set.
-Id <idN>,<dirN>,<templateN>,<fromN>,<toN> to override automatic definition of identifiers/scans. Each identifier requires 5 (comma-separated) items:
<idN> identifier string (no special characters, since directories/files might get created using this string)
<dirN> directory containing images
<templateN> filename template for images (using a series of '#'s as placeholder for image number)
<fromN> starting image number
<toN> final image number
-h5 <master.h5> Eiger/HDF5 dataset (master file); this flag can be given several times if multiple datasets should be processed by autoPROC
-nthreads <no. of threads> how many threads should be used during parallelized steps. A negative value -n means: use (all available threads)/n. (default = 0, i.e. keep the program defaults)
-xml write XML file(s) containing data for deposition
<par>="<val>" generic system to allow setting of parameter <par> to value <val>
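The five comma-separated fields of one -Id argument can be illustrated with a short shell sketch (the identifier, directory and filename template below are hypothetical, and the actual process run is shown commented since it requires an autoPROC installation):

```shell
# The 5 comma-separated fields of one -Id argument (all values hypothetical)
idarg="xtal1,/data/images,mydata_####.cbf,1,360"
IFS=',' read -r id dir template from to <<EOF
$idarg
EOF
echo "identifier=$id dir=$dir template=$template range=$from-$to"
# Actual run (requires an autoPROC installation):
# process -Id "$idarg" -d proc01 > proc01.log
```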

Command line

The recommended way to run autoPROC is the command
process -I /where/ever/images -d sub.dir > out.put
This defines where the images are and where output files and messages should go. If the information in the image headers is correct (which it should be), this is all that is needed. You can also run the command within a directory containing images (in which case the -I flag is not required) or have it create output files and directories within the current directory - it all depends on the directory layout you use and want to enforce for data processing.

It is always a good idea to let autoPROC write all output into a separate sub-directory:

process -d <subdir>
(where <subdir> could be e.g. 01 to get sequential numbering if several runs are going to be done).
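A minimal sketch for picking the next free two-digit subdirectory in that sequential-numbering scheme (the naming convention is just the suggestion above, and the process call is shown commented since it requires an autoPROC installation):

```shell
# Find the next unused two-digit subdirectory name (01, 02, ...)
n=1
while [ -d "$(printf '%02d' "$n")" ]; do n=$((n + 1)); done
subdir=$(printf '%02d' "$n")
mkdir "$subdir"
echo "next run directory: $subdir"
# process -d "$subdir" > "$subdir".log
```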


Recommended minimum user input

Some information is either not contained in the image headers (beamstop shadow) or could be inaccurate or ill-defined (beam centre, rotation direction). To get the best data possible and to avoid early failures (e.g. during indexing), it might be necessary for the user to provide this information to the program.

Beam centre

The main reason for failed indexing or processing of data is a wrong or ill-defined beam centre. Often, the header values are actually correct (or very close) but it is not clear to which coordinate system they refer. There are in principle 8 possibilities to define a (X,Y) coordinate system for a 2D array: (x,y), (-x,y), (x,-y), (-x,-y), (y,x), (-y,x), (y,-x) and (-y,-x).

Usually, the image header follows a convention that will work for a particular data processing package. As far as we know, the conventions used by the main data processing packages are mostly

Package Unit Convention
MOSFLM mm (x,y)
XDS pixel depends on definition of coordinate system - but often (y,x)
d*TREK pixel depends on definition of coordinate system - but often (y,x)
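For hypothetical header values (bx,by) on an nx-by-ny pixel detector, the eight candidate interpretations can be enumerated with a quick sketch (all numbers below are made up for illustration):

```shell
# All 8 ways a header beam centre (bx,by) could map onto an nx x ny array:
# (x,y), (-x,y), (x,-y), (-x,-y), (y,x), (-y,x), (y,-x), (-y,-x)
bx=1524; by=1532; nx=3072; ny=3072
mx=$((nx - bx)); my=$((ny - by))   # mirrored coordinates
ncand=0
for c in "$bx,$by" "$mx,$by" "$bx,$my" "$mx,$my" \
         "$by,$bx" "$my,$bx" "$by,$mx" "$my,$mx"; do
  echo "candidate beam centre: $c"
  ncand=$((ncand + 1))
done
```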

Beam stop

The reason for defining the bad area of the detector accurately becomes clear when the list of rejected reflections (e.g. in AIMLESS's so-called ROGUES file) is being inspected: unless highly redundant data has been collected, the wrong reflections might get rejected (i.e. keeping the weak reflections that were behind the beam-stop shadow and rejecting the strong ones as outliers). This will either damage the anomalous signal (in case of experimental phasing) or influence the bulk-solvent scaling and overall connectivity.

It is always a good idea to spend a little bit of time defining the beam-stop shadow area as well as possible. Especially if several datasets with the same setup (distance etc) have been collected, the same set of parameters could be used for all runs. There are tools in all programs to help you.

Note: in general it is also possible to mask the beamstop through a low-resolution cut-off. This is adequate if the direct beam hits a circular beamstop mid-centre. However, for beamstops with other shapes, or if the direct beam hits the beamstop off-centre, one would need to specify a more generous low-resolution cut-off to mask out the full beamstop area (which could degrade data quality if some strong and well-defined reflections are thereby lost).


Additional parameters

There are several mechanisms available for the user to fine-tune settings for autoPROC. It is important to make yourself familiar with these mechanisms and the syntax.

Syntax:

Apart from command-line flags of the form "-<flag> [<value>]", parameters can be set using the "<par>=<val>" syntax. If a parameter should contain a (space-separated) list of values, these need to be quoted, e.g.

beam="1524 1532"

Remember that command-line arguments are processed in the order they are given - so later arguments can override earlier settings!
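The effect of the quoting can be illustrated in plain shell; this only demonstrates the word-splitting behaviour, not autoPROC itself (the beam values are hypothetical):

```shell
# The quotes make "1524 1532" a single parameter value; unquoted
# expansion then splits it back into its two components
beam="1524 1532"
set -- $beam
nvals=$#; xval=$1; yval=$2
echo "beam has $nvals components: $xval and $yval"
```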

Mechanism:

There are two main mechanisms available to pass a parameter setting to autoPROC:

  1. command-line:

    any command/tool will recognize a <par>=<val> string on the command-line, e.g.

    process beam="1524 1532"
    
    Setting parameters this way will always override any other setting (using mechanisms described below).
  2. parameter/macro file:

    The main autoPROC command process has a "-M" flag that takes as argument the name of a file. This file would then contain a set of <par>=<value> lines (one per line) and can be used with

    process -M par.dat
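    A parameter file for -M simply contains one <par>=<value> setting per line. A minimal sketch (the beam parameter is the one from the quoting example above; the values are hypothetical, and the process call is shown commented since it requires an autoPROC installation):

```shell
# Write a minimal parameter file for use with -M
cat > par.dat <<'EOF'
beam="1524 1532"
EOF
# process -M par.dat
```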
    
Each mechanism can be useful in different circumstances.

A full list of parameters is available in Appendix 1.


aP_scale (scaling module)

A stand-alone scaling module using the program AIMLESS or XSCALE.

Flag Arguments Remark
-mtz <MTZ file>[:P,X,D] multi-record MTZ file. Additionally, the explicit Pname, Xname and Dname can be given (otherwise defaults are taken as-is from the header). If several MTZ files are to be used, the combine_files tool can generate a single multi-record MTZ file from multiple input files (applying the required batch-number offsets).
-id <identifier> identifier for this run (used for prefixing output files); default = 1
-ANO|-noANO switch on (default) or off special treatment of anomalous differences
-scale "<scale layout>" scaling layout (see AIMLESS documentation).

default = "ROTATION SPACING 5.0 ABSORPTION 6 BFACTOR ON"

If the data extend only to 5.0 Å resolution or lower, the default becomes "ROTATION SPACING 5.0 ABSORPTION 6 BFACTOR OFF"

-symm <spacegroup> space-group symbol; default is to take from reflection file header
-nres <Nres/asu> no. of amino-acid residues per asymmetric unit to put data on roughly absolute scale; default = 0
-R <reslow> <reshigh> resolution limits; default = "1000.0 0.1", i.e. all data
-nthreads <no. of threads> how many threads should be used during parallelized parts. A negative value -n means: use (all available threads)/n. (default = 0, i.e. keep the program defaults)
-freemtz <MTZ-file> MTZ file with existing test-set flag (to use same test-set reflections in final output MTZ files); default = ""
-M <macro> use a pre-defined combination of parameter settings; for a list of available macros use "-M list"
-P <P_N> <X_N> <D_N> triggers creation of a new dataset N description; project, crystal and dataset name must be given.

Note: without giving this argument, the remaining arguments below will have no effect!

-b <B1_N>-<B2_N>[,<BN>] for the current dataset N, the batch range B1-B2 will be used (split into separate runs of B batches each). If the split parameter B is negative, an appropriate run size will be calculated to give that many separate runs. The default is not to split the current dataset into separate runs.
<par>="<val>" generic system to allow setting of parameter <par> to value <val>
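The run-splitting arithmetic behind a negative -b split value can be checked with a quick calculation; e.g. "-b 1-720,-4" asks for 4 runs over 720 batches. The batch numbers below are hypothetical, the exact rounding used by aP_scale is an assumption, and the invocation is shown commented since it requires an autoPROC installation:

```shell
# -b 1-720,-4: 720 batches split into 4 runs
b1=1; b2=720; nruns=4
runsize=$(( (b2 - b1 + 1) / nruns ))
echo "each run covers $runsize batches"
# aP_scale -mtz combined.mtz -id run01 -P MyProj xtal1 native -b 1-720,-4
```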

More information about the scaling options available to users is available here.

For usage examples please see autoPROC wiki.


Other tools and programs

These commands are used within autoPROC but could still be useful to run directly for particular steps. All programs will have a help message if run with the "-h" argument.

aP_check

This tool will check the user environment and autoPROC installation. If you suspect an installation problem this would be the first command to run.

beam8.sh

If the beam centre value in the header is correct but the convention is unknown, this tool can help determine the relation between the header values and the rectangular image array. It requires input values that can be found by running imginfo on one image.

check_indexing

Although its main purpose is (as the name says) to check for possible alternative indexing schemes of two or more datasets, when run with the "-v" argument it will also give detailed information about R-values and correlations (including on anomalous differences) between the datasets and the (first) reference one.

cmpmat

If several indexing solutions are possible, this program compares two orientation matrices (in XPARM.XDS format) and reports the smallest rotation angle between them (taking space-group symmetry into account). The usage is
cmpmat <XPARM.XDS> <XPARM.XDS> <space group name>

combine_files

This will combine several reflection files (containing unmerged data) into a single file ready for subsequent scaling with aP_scale. It will check for consistent indexing in case alternative indexing schemes are allowed.

Flag Arguments Remark
-f <file> unmerged reflection file
-P <pname> <xname> <dname> project, crystal and dataset name applying to the previously defined reflection file (-f flag); several -f/-P combos can be given.
-o <output MTZ> output MTZ file name
-ref <reference MTZ> reference MTZ file (to ensure consistent indexing in case alternative indexing schemes are allowed by the space group)
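A hypothetical two-sweep combination followed by scaling could look like the sketch below. All file, project, crystal and dataset names are made up, and the commands themselves are shown commented since they require an autoPROC installation:

```shell
# Combine two unmerged sweeps into one multi-record MTZ, then scale it
out=combined.mtz
# combine_files -f sweep1/XDS_ASCII.HKL -P MyProj xtal1 peak \
#               -f sweep2/XDS_ASCII.HKL -P MyProj xtal1 infl \
#               -o "$out" -ref reference.mtz
# aP_scale -mtz "$out" -id run01 > aP_scale.log
echo "output file: $out"
```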

find_images

This is the tool autoPROC uses to find data sets if only a directory containing images is given.

Flag Arguments Remark
-r run recursively (default = no)
- return list suitable for automatic processing in autoPROC
-d <dir> search in directory <dir> (default = current directory)
-s <min> <max> minimum and maximum size for image files to consider (default = 512k and 98304k)

imgdate.sh

When the records of the exact data collection strategy employed are lost, this tool will output a list of images sorted by image timestamp if run with the "-s" flag. It relies on timestamps being present in the image header and being accurate.

imginfo

To extract information in a consistent manner from different image file formats, this command will read as much as possible from the image header. It is a good idea to test the image header format and content whenever a user encounters a beamline/instrument for the first time or if anything changed on a beamline.
imginfo some.cbf
For various beamline/detector/date combinations we do provide so-called 'override' functions to handle non-standard, incomplete or plainly wrong image header content. However, our goal (dream?) is that one day all image headers will be complete, unambiguous and contain correct values ...

mrfana

Computing statistics on unmerged reflection data in different formats (unmerged MTZ files, INTEGRATE.HKL, XDS_ASCII.HKL or XSCALE) is done with this program as part of the scaling module in autoPROC. For a full list of command-line options, please run
mrfana -h
The most useful command-line options are

Flag Arguments Remark
-n <nshell> number of bins (equal volume); a negative value will do the 'standard' binning on 1/d**2
-nref <nrefbin> number of measured reflections per bin (default = 1000)
-r <reslow> <reshigh> resolution limits


Last modification: 02.03.2016