autoPROC Documentation : Usage

Copyright    © 2004-2015 by Global Phasing Limited
 
  All rights reserved.
 
  This software is proprietary to and embodies the confidential technology of Global Phasing Limited (GPhL). Possession, use, duplication or dissemination of the software is authorised only pursuant to a valid written licence from GPhL.
Documentation    (2004-2015)  Clemens Vonrhein, Claus Flensburg, Wlodek Paciorek & Gérard Bricogne
 
Contact proc-develop@GlobalPhasing.com


Introduction

Here you will find information to serve as a reference to the various commands that a user might run as part of data processing with autoPROC. More detailed examples and tutorials are also available on the autoPROC wiki. You should also print out a copy of the autoPROC reference card for typical usage of different commands.

process

The "process" command runs the complete autoPROC pipeline. Simple on-line usage information is available by running
process -h
showing the main command-line arguments:

Flag Arguments Remark
-checkdeps check for all external dependencies (and stop)
-check check for all external dependencies (and stop if an error is encountered)
-M <macro|file> use a pre-defined combination of parameter settings via a macro; for a list of pre-defined macros use -M list. Alternatively, a file name with own settings (similar to a macro) can be given. See also here for details about the macro feature.
-I <image directory> directory with data images (default: current directory)
-d <dir> output will be written to [sub]directory instead of current directory
-R <reslow> <reshigh> low- and high-resolution limits (default is to use limits given by detector dimensions). Scaling/merging is allowed to limit these further based on additional criteria.
-[no]ANO process for anomalous differences (default); use -noANO to switch this off (i.e. assume Friedel's law holds true).
-B if running in batch mode (no highlighting on standard output)
-ref <MTZ file> reference MTZ file for spacegroup, cell, indexing and (optional) test-set. The output from autoPROC will be in the same setting as this MTZ file.
-free <MTZ file> reference MTZ file (for test-set only) - if the spacegroup allows several indexing solutions, at least one set of amplitudes (and sigmas) needs to be present in this file. The output from autoPROC will contain this (optionally reindexed) test-set.
-Id <idN>,<dirN>,<templateN>,<fromN>,<toN> to override automatic definition of identifiers/scans. Each identifier requires 5 (comma-separated) items:
<idN> identifier string (no special characters, since directories/files might get created using this string)
<dirN> directory containing images
<templateN> filename template for images (using a series of '#'s as placeholder for image number)
<fromN> starting image number
<toN> final image number
-h5 <master.h5> Eiger/HDF5 dataset (master file); this flag can be given several times if multiple datasets should be processed by autoPROC
-nthreads <no. of threads> how many threads should be used during parallelized steps. A negative value -n means: use (all available threads)/n. (default = 0, i.e. keep the program defaults)
-xml write XML file(s) containing data for deposition
<par>="<val>" generic system to allow setting of parameter <par> to value <val>
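The five comma-separated fields of one -Id argument can be illustrated with a short shell sketch (the identifier, directory and filename template below are hypothetical, and the actual process run is shown commented since it requires an autoPROC installation):

```shell
# The 5 comma-separated fields of one -Id argument (all values hypothetical)
idarg="xtal1,/data/images,mydata_####.cbf,1,360"
IFS=',' read -r id dir template from to <<EOF
$idarg
EOF
echo "identifier=$id dir=$dir template=$template range=$from-$to"
# Actual run (requires an autoPROC installation):
# process -Id "$idarg" -d proc01 > proc01.log
```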

Command line

The recommended way to run autoPROC is the command
process -I /where/ever/images -d sub.dir > out.put
This defines where the images are and where output files and messages should go. If the information in the image headers is correct (which it should be), this is all that is needed. You can also run the command within a directory containing images (in which case the -I flag is not required) or have it create output files and directories within the current directory - it all depends on the directory layout you use and want to enforce for data processing.

It is always a good idea to let autoPROC write all output into a separate sub-directory:

process -d <subdir>
(where <subdir> could be e.g. 01 to get sequential numbering if several runs are going to be done).
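A minimal sketch for picking the next free two-digit subdirectory in that sequential-numbering scheme (the naming convention is just the suggestion above, and the process call is shown commented since it requires an autoPROC installation):

```shell
# Find the next unused two-digit subdirectory name (01, 02, ...)
n=1
while [ -d "$(printf '%02d' "$n")" ]; do n=$((n + 1)); done
subdir=$(printf '%02d' "$n")
mkdir "$subdir"
echo "next run directory: $subdir"
# process -d "$subdir" > "$subdir".log
```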


Recommended minimum user input

Some information is either not contained in the image headers (beamstop shadow) or could be inaccurate or ill-defined (beam centre, rotation direction). To get the best data possible and to avoid early failures (e.g. during indexing), it might be necessary for the user to provide this information to the program.

Beam centre

The main reason for failed indexing or processing of data is a wrong or ill-defined beam centre. Often, the header values are actually correct (or very close) but it is not clear to which coordinate system they refer. There are in principle 8 possibilities to define a (X,Y) coordinate system for a 2D array: (x,y), (-x,y), (x,-y), (-x,-y), (y,x), (-y,x), (y,-x) and (-y,-x).

Usually, the image header follows a convention that will work for a particular data processing package. As far as we know, the conventions used by the main data processing packages are mostly

Package Unit Convention
MOSFLM mm (x,y)
XDS pixel depends on definition of coordinate system - but often (y,x)
d*TREK pixel depends on definition of coordinate system - but often (y,x)
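For hypothetical header values (bx,by) on an nx-by-ny pixel detector, the eight candidate interpretations can be enumerated with a quick sketch (all numbers below are made up for illustration):

```shell
# All 8 ways a header beam centre (bx,by) could map onto an nx x ny array:
# (x,y), (-x,y), (x,-y), (-x,-y), (y,x), (-y,x), (y,-x), (-y,-x)
bx=1524; by=1532; nx=3072; ny=3072
mx=$((nx - bx)); my=$((ny - by))   # mirrored coordinates
ncand=0
for c in "$bx,$by" "$mx,$by" "$bx,$my" "$mx,$my" \
         "$by,$bx" "$my,$bx" "$by,$mx" "$my,$mx"; do
  echo "candidate beam centre: $c"
  ncand=$((ncand + 1))
done
```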

Beam stop

The reason for defining the bad area of the detector accurately becomes clear when the list of rejected reflections (e.g. in AIMLESS's so-called ROGUES file) is being inspected: unless highly redundant data has been collected, the wrong reflections might get rejected (i.e. keeping the weak reflections that were behind the beam-stop shadow and rejecting the strong ones as outliers). This will either damage the anomalous signal (in case of experimental phasing) or influence the bulk-solvent scaling and overall connectivity.

It is always a good idea to spend a little bit of time defining the beam-stop shadow area as well as possible. Especially if several datasets with the same setup (distance etc) have been collected, the same set of parameters could be used for all runs. There are tools in all programs to help you.

Note: in general it is also possible to mask the beamstop through a low-resolution cut-off. This is adequate if the direct beam hits a circular beamstop mid-centre. However, for beamstops with other shapes, or if the direct beam hits the beamstop off-centre, one would need to specify a more generous low-resolution cut-off to mask out the full beamstop area (which could degrade data quality if some strong and well-defined reflections are thereby lost).


Additional parameters

There are several mechanisms available for the user to fine-tune settings for autoPROC. It is important to make yourself familiar with these mechanisms and the syntax.

Syntax:

Apart from command-line flags of the form "-<flag> [<value>]", parameters can be set using the "<par>=<val>" syntax. If a parameter should contain a (space-separated) list of values, these need to be quoted, e.g.

beam="1524 1532"

Remember that command-line arguments are processed in the order they are given - so later arguments can override earlier settings!
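The effect of the quoting can be illustrated in plain shell; this only demonstrates the word-splitting behaviour, not autoPROC itself (the beam values are hypothetical):

```shell
# The quotes make "1524 1532" a single parameter value; unquoted
# expansion then splits it back into its two components
beam="1524 1532"
set -- $beam
nvals=$#; xval=$1; yval=$2
echo "beam has $nvals components: $xval and $yval"
```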

Mechanism:

There are two main mechanisms available to pass a parameter setting to autoPROC:

  1. command-line:

    any command/tool will recognize a <par>=<val> string on the command-line, e.g.

    process beam="1524 1532"
    
    Setting parameters this way will always override any other setting (using mechanisms described below).
  2. parameter/macro file:

    The main autoPROC command process has a "-M" flag that takes as argument the name of a file. This file would then contain a set of <par>=<value> lines (one per line) and can be used with

    process -M par.dat
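    A parameter file for -M simply contains one <par>=<value> setting per line. A minimal sketch (the beam parameter is the one from the quoting example above; the values are hypothetical, and the process call is shown commented since it requires an autoPROC installation):

```shell
# Write a minimal parameter file for use with -M
cat > par.dat <<'EOF'
beam="1524 1532"
EOF
# process -M par.dat
```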
    
Each mechanism can be useful in different circumstances.

A full list of parameters is available in Appendix 1.


aP_scale (scaling module)

A stand-alone scaling module using the program AIMLESS or XSCALE.

Flag Arguments Remark
-mtz <MTZ file>[:P,X,D] multi-record MTZ file. Additionally, the explicit Pname, Xname and Dname can be given (otherwise defaults are taken as-is from the header). If several MTZ files are to be used, the combine_files tool can generate a single multi-record MTZ file from multiple input files (applying the required batch-number offsets).
-id <identifier> identifier for this run (used for prefixing output files); default = 1
-ANO|-noANO switch on (default) or off special treatment of anomalous differences
-scale "<scale layout>" scaling layout (see AIMLESS documentation).

default = "ROTATION SPACING 5.0 ABSORPTION 6 BFACTOR ON"

If the data extend only to 5.0 Å resolution or lower, the default becomes "ROTATION SPACING 5.0 ABSORPTION 6 BFACTOR OFF"

-symm <spacegroup> space-group symbol; default is to take from reflection file header
-nres <Nres/asu> no. of amino-acid residues per asymmetric unit to put data on roughly absolute scale; default = 0
-R <reslow> <reshigh> resolution limits; default = "1000.0 0.1", i.e. all data
-nthreads <no. of threads> how many threads should be used during parallelized parts. A negative value -n means: use (all available threads)/n. (default = 0, i.e. keep the program defaults)
-freemtz <MTZ-file> MTZ file with existing test-set flag (to use same test-set reflections in final output MTZ files); default = ""
-M <macro> use a pre-defined combination of parameter settings; for a list of available macros use "-M list"
-P <P_N> <X_N> <D_N> triggers creation of a new dataset N description; project, crystal and dataset name must be given.

Note: without giving this argument, the remaining arguments below will have no effect!

-b <B1_N>-<B2_N>[,<BN>] for the current dataset N, the batch range B1-B2 will be used (split into separate runs of B batches each). If the split parameter B is negative, an appropriate run size will be calculated to give that many separate runs. The default is not to split the current dataset into separate runs.
<par>="<val>" generic system to allow setting of parameter <par> to value <val>
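The run-splitting arithmetic behind a negative -b split value can be checked with a quick calculation; e.g. "-b 1-720,-4" asks for 4 runs over 720 batches. The batch numbers below are hypothetical, the exact rounding used by aP_scale is an assumption, and the invocation is shown commented since it requires an autoPROC installation:

```shell
# -b 1-720,-4: 720 batches split into 4 runs
b1=1; b2=720; nruns=4
runsize=$(( (b2 - b1 + 1) / nruns ))
echo "each run covers $runsize batches"
# aP_scale -mtz combined.mtz -id run01 -P MyProj xtal1 native -b 1-720,-4
```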

More information about the scaling options available to users is available here.

For usage examples please see autoPROC wiki.


Other tools and programs

These commands are used within autoPROC but could still be useful to run directly for particular steps. All programs will have a help message if run with the "-h" argument.

aP_check

This tool will check the user environment and autoPROC installation. If you suspect an installation problem this would be the first command to run.

beam8.sh

If the beam centre value in the header is correct but the convention is unknown, this tool can help determine the relation between the header values and the rectangular image array. It requires input values that can be found by running imginfo on one image.

check_indexing

Although its main purpose is (as the name says) to check for possible alternative indexing schemes of two or more datasets, when run with the "-v" argument it will also give detailed information about R-values and correlations (including on anomalous differences) between the datasets and the (first) reference one.

cmpmat

If several indexing solutions are possible, this program compares two orientation matrices (in XPARM.XDS format) and reports the smallest rotation angle between them (taking space-group symmetry into account). The usage is
cmpmat <XPARM.XDS> <XPARM.XDS> <space group name>

combine_files

This will combine several reflection files (containing unmerged data) into a single file ready for subsequent scaling with aP_scale. It will check for consistent indexing in case alternative indexing schemes are allowed.

Flag Arguments Remark
-f <file> unmerged reflection file
-P <pname> <xname> <dname> project, crystal and dataset name applying to the previously defined reflection file (-f flag); several -f/-P combos can be given.
-o <output MTZ> output MTZ file name
-ref <reference MTZ> reference MTZ file (to ensure consistent indexing in case alternative indexing schemes are allowed by the space group)
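A hypothetical two-sweep combination followed by scaling could look like the sketch below. All file, project, crystal and dataset names are made up, and the commands themselves are shown commented since they require an autoPROC installation:

```shell
# Combine two unmerged sweeps into one multi-record MTZ, then scale it
out=combined.mtz
# combine_files -f sweep1/XDS_ASCII.HKL -P MyProj xtal1 peak \
#               -f sweep2/XDS_ASCII.HKL -P MyProj xtal1 infl \
#               -o "$out" -ref reference.mtz
# aP_scale -mtz "$out" -id run01 > aP_scale.log
echo "output file: $out"
```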

find_images

This is the tool autoPROC uses to find data sets if only a directory containing images is given.

Flag Arguments Remark
-r run recursively (default = no)
- return list suitable for automatic processing in autoPROC
-d <dir> search in directory <dir> (default = current directory)
-s <min> <max> minimum and maximum size for image files to consider (default = 512k and 98304k)

imgdate.sh

When the records of the exact data collection strategy employed are lost, this tool will output a list of images sorted by image timestamp if run with the "-s" flag. It relies on timestamps being present in the image header and being accurate.

imginfo

To extract information in a consistent manner from different image file formats, this command will read as much as possible from the image header. It is a good idea to test the image header format and content whenever a user encounters a beamline/instrument for the first time or if anything changed on a beamline.
imginfo some.cbf
For various beamline/detector/date combinations we do provide so-called 'override' functions to handle non-standard, incomplete or plainly wrong image header content. However, our goal (dream?) is that one day all image headers will be complete, unambiguous and contain correct values ...

mrfana

Computing statistics on unmerged reflection data in different formats (unmerged MTZ files, INTEGRATE.HKL, XDS_ASCII.HKL or XSCALE) is done with this program as part of the scaling module in autoPROC. For a full list of command-line options, please run
mrfana -h
The most useful command-line options are

Flag Arguments Remark
-n <nshell> number of bins (equal volume); a negative value will do the 'standard' binning on 1/d**2
-nref <nrefbin> number of measured reflections per bin (default = 1000)
-r <reslow> <reshigh> resolution limits


Last modification: 02.03.2016