autoPROC Documentation
Interpreting autoPROC output

Copyright    © 2015 by Global Phasing Limited
 
  All rights reserved.
 
  This software is proprietary to and embodies the confidential technology of Global Phasing Limited (GPhL). Possession, use, duplication or dissemination of the software is authorised only pursuant to a valid written licence from GPhL.
Documentation    (2015)  Clemens Vonrhein, Claus Flensburg, Wlodek Paciorek & Gérard Bricogne
 
Contact proc-develop@GlobalPhasing.com


Introduction

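A typical sequence of data processing consists of the steps described in the step-by-step section below: preparation, spot search and indexing, initial integration and space group determination, integration, and scaling and merging. The output of autoPROC is organised as a series of files and directories, including plots (in PNG format - see "How to view plots") and scripts that can be run by the user interactively.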



Step-by-step

Since all important information is currently given on standard output (stdout), it is a very good idea to always save this into a file. This can be done e.g. via
process ... -d 01 > 01.log 2>&1     # bash/zsh/ksh/sh
 - or -
process ... -d 01 >& 01.log         # csh/tcsh
 - or -
process ... -d 01 | tee 01.log      # any shell (but note the
                                      limitation on exit status
                                      when using the tee command)
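The exit-status limitation noted for the tee variant arises because a pipeline normally reports the status of its last command (here tee) rather than that of process. The following is a minimal sketch of one way around this, assuming bash (PIPESTATUS is bash-specific); the function name run_logged is hypothetical and not part of autoPROC:

```shell
#!/usr/bin/env bash
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper (not part of autoPROC): run COMMAND, copy its
# stdout and stderr to LOGFILE via tee, and return COMMAND's own exit
# status instead of tee's.
run_logged() {
  local log=$1
  shift
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] holds the exit status of the first pipeline command.
  return "${PIPESTATUS[0]}"
}
```

It could be used as "run_logged 01.log process ... -d 01", after which $? reflects the status of process itself.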
At the top of stdout there will always be information about the machine, the account and the date on which autoPROC was run (this can contain very long lines):
[image]

Preparation

After checking for a syntactically correct command, the command-line will be reported and additional information given:
[image]
autoPROC will summarise what it understood from command-line arguments and parameter settings, e.g. when giving a reference reflection (MTZ) file:
[image]
The program will test version information for external programs (like XDS, POINTLESS or AIMLESS).
[image]
Please check the messages about external program versions given by autoPROC carefully - if in doubt contact proc-develop@GlobalPhasing.com. You should also check your autoPROC installation using
process -checkdeps
which should ideally report something like
[image]
Any warning or error message from the above step should be carefully analysed.

BeamCentreFrom setting

You will often get a warning about the beam centre coordinates:
[image]
Because the wrong specification of the direct beam coordinate is the main reason for a failed indexing, this is stressed explicitly in that message. You might want to also check the autoPROC wiki for additional background information and a table of relevant values for different beamlines and instruments.

Sweep/dataset definition

The last report in this section is a listing of identifiers and sweeps found:
[image]
Make sure this is showing the (sweep) identifiers and number of images as expected. There can be unexpected breaks if one or a few images are missing. There could also be wrong concatenation of actually distinct sweeps if the image files are named very similarly. In both these cases it is advisable to use the -Id argument directly, e.g.
process -Id "test,/where/ever/images,test_####.cbf,1,900" ...
instead of relying on autoPROC (via the find_images tool) to find sweeps or data automatically and correctly.
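When several sweeps have to be defined this way, the comma-separated value can be assembled by a small helper. This is only a sketch: the function name make_id_arg is hypothetical, and the reading of the five fields (identifier, image directory, file name template, first and last image number) is an interpretation of the example above rather than a documented specification.

```shell
#!/usr/bin/env bash
# make_id_arg ID DIR TEMPLATE FIRST LAST
# Hypothetical helper (not part of autoPROC): join the five fields of
# a sweep definition with commas, in the order used by the -Id example.
make_id_arg() {
  if [ "$#" -ne 5 ]; then
    echo "usage: make_id_arg ID DIR TEMPLATE FIRST LAST" >&2
    return 1
  fi
  printf '%s,%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}

# prints test,/where/ever/images,test_####.cbf,1,900
make_id_arg test /where/ever/images 'test_####.cbf' 1 900
```

The result could then be passed on as process -Id "$(make_id_arg ...)" ..., with the quotes keeping the value as a single argument.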

Spot search and indexing

By default, autoPROC will search for spots on all available images and pass those to indexing. autoPROC will try to optimise the initial indexing solution for several reasons:
  1. getting the best starting point for subsequent integration
  2. detecting potential additional indexing solutions (split or twinned crystals, cell increase due to radiation damage etc)
  3. obtaining a list of spots that can't be indexed at all and could therefore be due to ice- or other powder rings, detector or software errors (hot pixels, poor beamstop etc) or any other possible problem during the experiment

Iterative indexing

After this iterative indexing procedure, autoPROC will produce a summary
[image]
and a plot (here 03/61817_1_E2/run_idxref_spot_hkl_hist.png) showing the number of spots for each indexing solution as a function of image number:
[image]
The (minimum) rotation between the different solutions is also reported:
[image]
Proper interpretation of these results depends a lot on the actual sample (keep a note and some screenshots of the crystal as it was mounted during the experiment, ideally in several orientations), previous experiments on the same crystal form (do you always get multiple indexing solutions with the same relative angle between them?) and a very careful examination of the diffraction images together with predictions.

Visualisation with GPX2

For the above case, running the script that displays predictions for several indexing solutions
03/61817_1_E2/status/04_run_idxref/gpx.sh -lat 1,2,3
gives
[image]
and then shows predictions in GPX2:
  • [image] GPX2 showing three prediction sets
  • [image] close-up of diffraction image with prediction set 1
  • [image] close-up of diffraction image
  • [image] close-up of diffraction image with prediction set 2
  • [image] close-up of diffraction image with three prediction sets
  • [image] close-up of diffraction image with prediction set 3
Please note that this is using a default value for mosaicity which is most likely not the correct one (and there could even be a different mosaicity for each indexing solution, i.e. lattice/crystal). In order to re-run the same iterative indexing at the end of processing (of a particular sweep), you could set the parameter autoPROC_ReRunIdxrefAtEnd=yes: this will use the final set of parameters (for the indexing solution used in integration).

By default, autoPROC will pick the indexing solution that uses the most spots. If you want to make it pick another solution, set the parameter XdsOptimizeIdxrefPickSolution to the desired solution (as a two-digit number with leading zero) as given on stdout, e.g. XdsOptimizeIdxrefPickSolution=02 would pick solution number 02.
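Since the value has to be a two-digit number with a leading zero, it can help to generate the setting rather than type it. A minimal sketch, assuming bash; the function name pick_solution_param is hypothetical and only formats the parameter string:

```shell
#!/usr/bin/env bash
# pick_solution_param N
# Hypothetical helper: print the parameter setting that selects
# indexing solution N, zero-padded to two digits.
pick_solution_param() {
  case $1 in
    ''|*[!0-9]*)
      echo "solution number must be a non-negative integer" >&2
      return 1
      ;;
  esac
  # 10# forces base 10 so that inputs such as 08 are not read as octal.
  printf 'XdsOptimizeIdxrefPickSolution=%02d\n' "$((10#$1))"
}

pick_solution_param 2   # prints XdsOptimizeIdxrefPickSolution=02
```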

Unindexed spots (SPOT_never-indexed.noHKL.png)

A very useful plot shows the position (on the detector) of all spots that could not be indexed by any of the trial solutions: SPOT_never-indexed.noHKL.png. It can show ice-rings, detector calibration problems, missed lattices and more:
  • [image: SPOT.noHKL.png] weak ice-rings
  • [image: SPOT_never-indexed.noHKL.png] weak ice-rings
  • [image: SPOT_never-indexed.noHKL.png] weak ice-rings
  • [image: SPOT_never-indexed.noHKL.png] strong ice-rings
  • [image: SPOT_never-indexed.noHKL.png] strong ice-rings
  • [image: SPOT_never-indexed.noHKL.png] strong ice-rings
  • [image: SPOT_never-indexed.noHKL.png] detector calibration
  • [image: SPOT_never-indexed.noHKL.png] detector calibration
  • [image: SPOT_never-indexed.noHKL.png] streaky/split spots
  • [image: SPOT_never-indexed.noHKL.png] missed lattice(s), partly due to poor starting values describing hardware geometry
  • [image: SPOT_never-indexed.noHKL.png] missed lattice(s), partly due to poor starting values describing hardware geometry
  • [image: SPOT_never-indexed.noHKL.png] good - clean

Initial integration and space group determination

Image scale factor (scale.png)

A very useful plot is the per-image scale factor as a function of image number (scale.png). Ideally we want to see a smoothly varying curve that might show some periodicity (for 180- and 360-degree total rotation range):
  • [image] 360 degree rotation
  • [image] 180 degree rotation
  • [image] 120 degree rotation
Sometimes there are single images that show an outlier value for image scale:
  • [image] last image - shutter synchronisation problem?
  • [image] first image - shutter synchronisation problem?
  • [image] intermediate image - what could be the problem?
If patterns are visible that cannot be explained by the design of the experiment (e.g. by some kind of interleaving), this might point back to beamline instrumentation issues (goniostat instabilities, reproducibility or energy changes etc) or synchrotron specifics (top-up modes, beam stability etc). In any case, such patterns should be explainable (check with beamline staff and make them aware of those plots) and ideally be avoided in future experiments.
  • [image] 0.25 degree per image
  • [image] 0.4 degree per image
  • [image] 0.25 degree per image

Mosaicity and beam divergence (divergence-mosaicity.png)

During integration, XDS will by default determine crystal mosaicity and beam divergence directly from the diffraction images; the divergence-mosaicity.png plot combines the curves for both. However, this automatic determination can sometimes lead to poorer estimates at the beginning of the dataset - which is why autoPROC will set those parameters explicitly in subsequent integration steps:
  • [image] estimated (per image) and used (for central region) values of mosaicity and divergence (as determined by XDS automatically)
  • [image] final Rmerge value showing poorer results for initial block of images
  • [image] estimated (per image) and used (for central region) values of mosaicity and divergence (when re-using overall values in subsequent integration steps); this is the default in autoPROC
  • [image] final Rmerge value showing better results also for initial block of images

Space group assignment

Unless the space group for the crystal is already known (but even then: surprises happen), the space group determination with POINTLESS should be checked carefully. Ideally, all symmetry elements should have a similarly good score:
[image]
The reflections allowing for an unambiguous determination of screw axes might not always be measured; this depends on the crystal morphology, the way the crystals are mounted and what possibilities for re-orientation of the crystal are provided by the beamline/instrument. If the space group was given on the command line via the symm parameter, or if a reference MTZ file was given with the -ref flag, the user-provided space group will be compared to the POINTLESS analysis:
[image]
Sometimes, the space group assignment is unclear, a typical situation being pseudo-orthorhombic unit cell dimensions and non-crystallographic symmetries:
[image]
Here one 2-fold axis is significantly better than the other two, suggesting that the true space group is monoclinic rather than orthorhombic. If the user provided the space group (explicitly or through a reference MTZ file), autoPROC will report any mismatch:
[image]
In this case the only way to find the "correct" space group and cell is by solving the structure and reaching a full model with good geometry and low R-values.

Statistics

At the end of the initial integration and space group analysis/assignment, some statistics (as function of resolution) are given:
[image]
It might already be obvious that the crystal didn't diffract to the corner (highest resolution) or even the edge (where completeness is still maximum) of the detector:
[image]
This is also visible in the diffraction picture itself:
  • [image] top left (corner)
  • [image] top left (edge)
  • [image] top right (edge)
  • [image] top right (corner)
In such a case using an explicit, initial high-resolution limit on the command line
process -R 100.0 1.35 ...
would speed up data processing and ensure that not too many noisy reflection data enter the final scaling/merging stage (which can sometimes get stuck in a local minimum if the input data is too noisy). But it is always best to choose a crystal-detector distance that will use as much of the detector surface as possible for all sweeps and orientations to be collected.

Anomalous signal

Furthermore, very strong anomalous signal is detected:
[image]

Integration

Overloaded reflections

Now that an initial set of parameters, a unit cell and a space group are available, a further round of integration and post-refinement is done. This will give a similar table of statistics against resolution, but will also analyse the data for overloaded reflections:
[image]
Of course there should be as few overloaded reflections as possible: these usually occur for the strongest, low-resolution reflections, which are the most important when it comes to structure solution via molecular replacement, heavy-atom substructure solution for experimental phasing, density-modification to improve phases and bulk solvent modelling in refinement.

A short summary about cell parameters, distance, detector origin and wavelength is also given:
[image]

Introduction to plots

Several plots are created that can be very useful for interpreting data integration and also for analysing the instrumentation side of the experiment. Most of them are given as a function of image number - but be careful: consecutive image numbering might not correspond to consecutive collection of those images. If some kind of interleaving was done (inverse-beam or interleaved wavelength), additional exposure and data collection took place between wedges of a specific size. Also, if the crystal was translated during collection - either smoothly in a helical scan or step-wise to new positions on the crystal - this needs to be taken into account when interpreting those plots. Here we give some typical examples, but different experimental designs might result in different plots.

Changes to orientation matrix (angle_cell_axis_ABC.png and angle_cell_axis0_ABC.png)

The first set of plots is based on the orientation matrices determined by XDS for each block of images:
  • [image: angle_cell_axis_ABC.png] ideal: very small change in orientation between blocks of images
  • [image: angle_cell_axis0_ABC.png] ideal: small and smoothly changing relative to first block of images
  • [image: angle_cell_axis_ABC.png] interleaved energies (60 image wedge-size): problems getting back to previous position after each wedge?
  • [image: angle_cell_axis0_ABC.png] interleaved energies (60 image wedge-size): also an underlying trend (relative to first orientation) - maybe different amount of increase in cell axes due to radiation damage?

Change in cell dimensions/volume (cell_axes_devmean.png)

Another useful plot is the change of cell dimensions (axes) from their respective mean as a function of image number. We expect a smooth, hardly varying set of values. If radiation damage occurred and this manifests itself as an increase in cell dimensions (see Ravelli et al, 2002), a steady increase in those changes could be observed:
  • [image: cell_axes_devmean.png] P2: good - hardly any change in cell axes
  • [image: cell_axes_devmean.png] P6122: increase especially in c-axis
  • [image: cell_axes_devmean.png] C2221: increase in all axes

Crystal to detector distance (distance.png)

The above plots do rely on an accurate determination of the cell dimensions - which are highly correlated to the crystal-detector distance. Any instability or drift in the distance parameter will result in a drift of cell dimensions and could therefore lead to a misinterpretation. autoPROC analyses the distance refinement very carefully and will automatically switch off refinement of this parameter if it shows instability during parameter refinement:
  • [image: distance.png] 120 degree of data: crystal centering slightly off?
  • [image: distance.png] 180 degree of inverse-beam data: instability of parameter refinement

Direct beam position and detector origin (detector_center_origin.png)

The direct beam position and the detector origin (both expressed as a position on the image) are usually quite stable during refinement. But one can still extract information from those plots:
  • [image: detector_center_origin.png] good - very stable over whole image range
  • [image: detector_center_origin.png] good - very stable (but values for detector origin and direct beam quite far apart: incident beam and detector plane normal not parallel, due to detector alignment)
  • [image: detector_center_origin.png] good - very stable (and values for detector origin and direct beam very close: very accurate alignment of detector plane to incident beam)
  • [image: detector_center_origin.png] drift in direct beam position - is this "real" (due to real drift of beam at beamline)?
  • [image: detector_center_origin.png] direct beam position quite stable, but detector origin refinement unstable
  • [image: detector_center_origin.png] drift in direct beam position and very unstable detector origin refinement

Standard deviation on spot position and spindle value (standard_deviation.png)

All of the above parameter refinements will add up to values of standard deviation for spot position and spindle value. This can be taken as a kind of "summary" to point to problematic image ranges or patterns. But the exact interpretation of the underlying reasons still requires careful inspection of the different plots described above ... and ultimately the diffraction images themselves including the predicted spots and spot shapes.
  • [image: standard_deviation.png] poorer region at beginning (around image 30) than around image 120
  • [image: run_idxref_spot_hkl_hist.png] multiple lattices: 3 distinct indexing solutions visible around image 120
  • [image: GPX2] central region of image 30
  • [image: GPX2] central region of image 120
  • [image: standard_deviation.png] poor towards the end
  • [image: cell_axes_devmean.png] increase in cell volume
  • [image: ana_aimless_Rmerge.png] increase in Rmerge

Scaling and merging

At this point, autoPROC will usually scale each sweep of data internally, merge symmetry-related reflections and create merged reflection files in MTZ and Scalepack format (see also parameter autoPROC_ScaleEachDatasetSeparately). For multi-sweep datasets those results might not be used in the end, but if not all sweeps are of equal quality (especially in the case of radiation damage) this might allow checking and working with each separate sweep.

First, autoPROC reports on file conversion and (potential) re-indexing requirements, and gives a pointer to the actual command run for the scaling module:
[image]
If a reference reflection MTZ file was given, POINTLESS is used to check for possible re-indexing requirements. This will ensure that the output reflection data from autoPROC is consistent with the reference given by the user - which will simplify usage of autoPROC results in pipelines like Pipedream.
[image]

Overall statistics for AIMLESS scaling

If scaling was done with AIMLESS, the program version used, a table containing various statistics for overall, inner and outer resolution shell and some analysis about anisotropy of diffraction are given:
[image]

High-resolution criteria

autoPROC will automatically decide on an appropriate high-resolution limit based on various criteria (I/sigI, CC(1/2), Rpim, Rmerge, Completeness). Because any change of the high-resolution limit might result in a different behaviour of the internal scaling step (a different set of reflections is used by the refinement procedure), an iterative procedure is applied. The current default values for the various criteria are:

Criterion     Parameter                    Default                                    Remark
CC(1/2)       ScaleAnaCChalfCut_123        "-1.0:-1.0 0.0:0.0 0.1:0.1 0.3:0.3"        final value is 0.3
Completeness  ScaleAnaCompletenessCut_123  "0.0:0.0"                                  inactive
I/sigI        ScaleAnaISigmaCut_123        "0.1:0.1 0.5:0.5 0.5:1.0 1.0:2.0"          final value is 2.0
Rmeas         ScaleAnaRmeasallCut_123      "99.9999:99.9999"                          inactive
Rmerge        ScaleAnaRmergeCut_123        "99.9999:99.9999"                          inactive
Rpim          ScaleAnaRpimallCut_123       "99.9999:99.9999 0.9:0.9 0.8:0.8 0.6:0.6"  inactive at the first cycle and fairly loose after that (final value 0.6)
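
The "final value" remarks in the table can be reproduced from the default strings. The sketch below assumes that each whitespace-separated entry belongs to one cycle of the iterative procedure and that the final value is the number after the colon in the last entry; this matches the remarks given for CC(1/2), I/sigI and Rpim, but the meaning of the two numbers in each entry is not described here, and the function name final_cut is hypothetical.

```shell
#!/usr/bin/env bash
# final_cut "ENTRY ENTRY ..."
# Hypothetical helper: print the value after the colon in the last
# whitespace-separated entry of a cut-off setting.
final_cut() {
  local last=${1##* }            # drop everything up to the last space
  printf '%s\n' "${last##*:}"    # keep what follows the colon
}

final_cut "0.1:0.1 0.5:0.5 0.5:1.0 1.0:2.0"     # prints 2.0
final_cut "-1.0:-1.0 0.0:0.0 0.1:0.1 0.3:0.3"   # prints 0.3
```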

Overall statistics for XSCALE scaling

If scaling was done with XSCALE (see autoPROC_ScaleWithXscale), the program version used, a detailed table of statistics (as function of resolution), the I/sigI(asymptotic) (ISa/ISa0) and the Wilson B-factor are given:
[image]
Then the same table containing various statistics for overall, inner and outer resolution shell is shown - to give an easy comparison between different scaling approaches:
[image]

REMARK 200 (remark200.pdb)

This table is also the basis for writing a REMARK 200 section into file remark200.pdb:
[image]

Multi-sweep datasets

Introduction

autoPROC tries to handle multi-sweep datasets fully automatically. For this, some assumptions are made:

By default, the find_images utility is used for finding sets of diffraction images and classifying them as distinct, i.e. belonging to different sweeps. Since this is only based on file names, it might not always give the desired result - in which case using the -Id flag might help. If a single sweep consists of image files following different name templates, using symbolic links to work around this might be necessary.
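
The symbolic-link workaround could look roughly like the following sketch. The function name link_images and the naming scheme PREFIX_0001.EXT are assumptions made for illustration; the image files have to be listed in the order in which they belong to the sweep.

```shell
#!/usr/bin/env bash
# link_images DEST PREFIX FILE...
# Hypothetical helper (not part of autoPROC): create symbolic links
# DEST/PREFIX_0001.EXT, DEST/PREFIX_0002.EXT, ... pointing at the
# given image files, in the order in which they are listed.
link_images() {
  [ "$#" -ge 3 ] || { echo "usage: link_images DEST PREFIX FILE..." >&2; return 1; }
  local dest=$1 prefix=$2 n=1 f dir ext
  shift 2
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    [ -e "$f" ] || { echo "no such file: $f" >&2; return 1; }
    dir=$(cd "$(dirname "$f")" && pwd) || return 1   # absolute directory
    ext=${f##*.}                                     # file name extension
    ln -s "$dir/$(basename "$f")" \
      "$dest/$(printf '%s_%04d.%s' "$prefix" "$n" "$ext")" || return 1
    n=$((n + 1))
  done
}
```

With three files linked into /where/ever/links under the prefix test, the sweep could then be given explicitly as -Id "test,/where/ever/links,test_####.cbf,1,3".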

In any case, autoPROC will always report the distinct sweeps at the beginning (and then go on to process each sweep on its own):
[image]

Effect on processing second and following sweeps

The effect of giving autoPROC several sweeps of data becomes apparent for the second and all following sweeps:
[image]
As one can see, the first sweep takes on a special role, acting as a reference for the others. The automatically determined order of sweeps (by default based on timestamps in the image headers, which would hopefully be accurate) might not be the best to use for this approach. In this case, using the -Id flag would enable the user to give a different ordering, with the "best" dataset given first.

Multi-axis goniostats (kapparot)

One important requirement for the correct handling of multi-sweep datasets with differing goniostat settings (that change the effective orientation of the crystal) is an accurate description of the goniostat axes. Only if these are known can autoPROC (via the kapparot program) transform the orientation matrix of the reference sweep to the new goniostat settings - which is needed since XDS has no knowledge about multi-axis goniostats and treats the rotation axis in a very generic way. We provide a pre-defined set of multi-axis site definitions that can be listed via
x_kappa -list
some of which are shown here:
[image]
To use one of those site definitions, the parameter KapparotSite needs to be set to the site identifier.

Final scaling and merging of all data

In the final scaling and merging step, the integrated intensities of each sweep are used together. For the XSCALE scaling path this is done by giving separate XDS_ASCII.HKL files to the scaling module (aP_scale)
[image]
while for the AIMLESS path autoPROC first needs to combine the individual reflection files using combine_files:
[image]
To distinguish the data from different sweeps within that single reflection file (a so-called multi-record MTZ file), different BATCH number ranges are used. Those then also need to be used when running the scaling module (aP_scale) on that single reflection file:
[image]
At this point it is the dataset identifier given with the -P flag to aP_scale that determines which sweeps are being merged together at the end of the unified scaling procedure. Remember that this defaults in autoPROC to the wavelength value so that all sweeps collected at the same energy will finally be merged together:
[image]
For the AIMLESS path, some statistics are plotted as function of batch/image number. In case of multiple sweeps collected at different energies, separate plots for each wavelength are generated with the wavelength (in Å) as part of the filename. E.g.
ana_aimless_0.91837_Bdecay.png        ana_aimless_0.97922_Bdecay.png        ana_aimless_0.97936_Bdecay.png
ana_aimless_0.91837_Bscale.png        ana_aimless_0.97922_Bscale.png        ana_aimless_0.97936_Bscale.png
ana_aimless_0.91837_Completeness.png  ana_aimless_0.97922_Completeness.png  ana_aimless_0.97936_Completeness.png
ana_aimless_0.91837_Rmerge.png        ana_aimless_0.97922_Rmerge.png        ana_aimless_0.97936_Rmerge.png
ana_aimless_0.91837_Scale.png         ana_aimless_0.97922_Scale.png         ana_aimless_0.97936_Scale.png
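To sort such plots by wavelength in a script, the wavelength can be read back from the file name. A minimal sketch, assuming the naming pattern ana_aimless_<wavelength>_<plot>.png seen in the listing above; the function name plot_wavelength is hypothetical:

```shell
#!/usr/bin/env bash
# plot_wavelength FILE
# Hypothetical helper: print the wavelength field of a name like
# ana_aimless_0.97936_Scale.png; return 1 for names that do not
# follow this pattern (e.g. ana_aimless_Rmerge.png).
plot_wavelength() {
  local name=${1##*/}               # strip any leading directory
  case $name in
    ana_aimless_[0-9]*_*.png)
      name=${name#ana_aimless_}     # e.g. 0.97936_Scale.png
      printf '%s\n' "${name%%_*}"   # e.g. 0.97936
      ;;
    *)
      return 1
      ;;
  esac
}

plot_wavelength ana_aimless_0.91837_Bdecay.png   # prints 0.91837
```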
If multiple sweeps are merged together (because they were collected at the same wavelength), this will be visible on those plots:
  • [image: ana_aimless_0.97936_Scale.png] image scale factor
  • [image: ana_aimless_0.97936_Bscale.png] scaling B-factor
  • [image: ana_aimless_0.97936_Bdecay.png] decaying B-factor (straight-line fit to scaling B-factor)
  • [image: ana_aimless_0.97936_Completeness.png] cumulative completeness (overall and anomalous)
  • [image: ana_aimless_0.97936_Rmerge.png] Rmerge
 
The creation of such plots for the XSCALE path is planned for one of the next releases.

Plots

autoPROC generates a large number of plots using gnuplot. These are very useful for analysing the final data quality, spotting problems or finding ways of improving the data processing - and even for better planning of subsequent experiments. Therefore it is highly recommended to ensure that a working gnuplot installation is available on the machine running autoPROC.

At the moment no central output document is generated by autoPROC that would help finding and interpreting those plots (we are working on that). Some of these plots are discussed in the step-by-step explanation.

How to locate plots

One useful command to find the plots in the order they were generated could be something like
process -I /where/ever/images ... -d 01 ...
ls -ltra `find 01 -name "*.png"`
which will list all PNG files in the order they were created.
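The backtick form above has two pitfalls: if find returns nothing, ls lists the current directory instead, and paths containing spaces are split. A slightly more defensive sketch (the function name list_plots is hypothetical) lets find call ls directly:

```shell
#!/usr/bin/env bash
# list_plots DIR
# Hypothetical helper: print all PNG files below DIR, oldest
# modification time first; prints nothing if there are none.
list_plots() {
  find "$1" -type f -name '*.png' -exec ls -tr {} +
}
```

Called as "list_plots 01", this prints one path per line. For a very large number of files, find may split the list across several ls calls, in which case each batch is sorted separately.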

How to view plots

There are a variety of tools available for viewing pictures in PNG or JPG format. Some of the more popular options are the commands
display file.png
 - or -
qiv file.png
Obviously, you could also use your file manager to browse to the relevant directory and file: after that a double-click (or whatever the equivalent on your choice of operating system is) should open the image for viewing.

GPX2

Our visualiser for predictions (position and shape) is called GPX2 and is invoked through a series of scripts that are generated automatically by autoPROC throughout data processing. This will be reported on standard output e.g. as
gpx.sh
We use the notation of "status" to mean the visualisation of prediction positions and spot shape. This script can then be run - e.g. with the "-h" command-line argument to get a help message (some values in that help message will be different depending on the actual processing the predictions will be based upon):
gpx.sh -h
Some command-line options are useful to discuss in more detail: If the indexing step encountered the possibility of several lattices or indexing solutions, several of these scripts will be generated: one for each lattice/solution and one that allows visualising all found lattices together.
gpx.sh -h -lat
When running such a gpx.sh script, the GPX2 window will pop up showing the full diffraction image and all prediction sets:
[image: GPX2 window]
You can zoom in and out (centred under the current pointer position) with the mouse-wheel and pan using the left mouse button:
[image: GPX2 window, zoomed]
 
 
[image: GPX2 window]
The prediction (and spot) sets can be switched on and off by expanding the predictions text to the left:
[image: GPX2 window, zoomed]
  • you can move between diffraction images using the arrow buttons at the top left;
  • the position and intensity value just under the pointer position are displayed in the lower-left corner;
  • Miller indices and resolution of a predicted reflection are shown whenever the pointer moves inside one of the ellipses.
By default, a series of images will be generated at distinct angular positions for each sweep:
[image: gpx-images]

Last modification: 15.09.2015