Update:
This is principally a comparison of XDS versions "June 30, 2023" versus "June 30, 2024" done during late July to mid September 2024 (with about 4 weeks' interruption in between due to vacations).
Please note that we tried to stick to a "lowest common denominator" in terms of automation and expertise in this comparison: this is not intended to compare different packages that internally run XDS, so a lot of the advanced features (e.g. in our autoPROC package) are ignored. This means that one can reach better results when taking advantage of those features, or after manually fine-tuning data processing on a case-by-case basis as an expert. We wanted to keep it simple and show what happens when a user runs software in all-default mode (as is also likely to be the case for automatic data processing triggered directly at a synchrotron beamline).
Starting from the PDB release as of 20240919, we can get various details about a particular PDB entry as present in the archive:
After sorting those by data collection date - and restricting ourselves to Pilatus/Eiger datasets only - we can pick the latest PDB entry for a given detector/beamline/synchrotron combo. This results in 176 examples:
4FQN 4MTK 4QKI 4TVT 4TWN 4XGU 5AUI 5BXG 5CYZ 5E9I 5EO9 5FBO 5FD7 5IUK 5K7M 5LP9 5LXW 5NHU 5NKT 5OD9 5ONZ 5RG0 5SP3 5SP6 5VZR 6BGA 6BLH 6BLI 6CK7 6CW0 6DEX 6E80 6EXI 6IU9 6JGJ 6MOE 6NCR 6NKQ 6NQY 6NW4 6O2H 6O6N 6OWX 6OWZ 6P8P 6P8S 6P8U 6R16 6SCX 6TPI 6UCA 6VZQ 6VZW 6W4B 6WAY 6X3O 6YJP 6Z5F 6Z9G 6ZJ6 7ALO 7AO5 7AR4 7AV6 7BBP 7BWH 7CLJ 7DF2 7DK1 7KCN 7KDS 7KMJ 7MJB 7N2P 7OTH 7OVP 7PD5 7PHO 7POX 7PWM 7PWP 7PWU 7PWW 7QGF 7QOQ 7QT8 7R0K 7R3W 7R59 7S87 7SGW 7SY9 7TAL 7TBO 7TCD 7THB 7TM9 7TPB 7UDI 7UV5 7WDA 7WEZ 7XLI 7XRC 7YZX 7Z1V 7Z1Y 7Z41 7Z7L 7Z8Y 7Z8Z 8A19 8A9D 8AGQ 8AQ8 8B1N 8BFY 8BXT 8CX4 8DA3 8DU7 8E5V 8E60 8EGN 8EPS 8EW7 8F8E 8FFE 8FG7 8FRF 8FT8 8GCA 8GI4 8GM6 8HKR 8JYJ 8K4Q 8OIC 8PIE 8PQA 8PQC 8PQD 8Q90 8QK1 8R2G 8R5Q 8R5R 8RCA 8RCB 8RCC 8S1R 8SDW 8SHR 8SIO 8SLU 8SO5 8SXS 8TCA 8TYZ 8U09 8U0I 8U1E 8UFN 8UFO 8UM6 8V2T 8VEV 8VM2 8W6K 8WT3 8YEQ 9B7F 9CPL 9CRW 9D8S 9EWK
For 89 of those we already had the data locally on disk and had run them at some point (over the last 10+ years) as part of our testing:
4FQN 4QKI 5AUI 5E9I 5FBO 5LP9 5NHU 5OD9 5ONZ 5RG0 5SP3 5SP6 5VZR 6BLI 6CK7 6CW0 6DEX 6NQY 6P8P 6P8U 6R16 6SCX 6TPI 6UCA 6VZQ 6VZW 6ZJ6 7AO5 7AR4 7AV6 7BWH 7DK1 7KDS 7KMJ 7MJB 7PWP 7PWW 7QGF 7R0K 7R59 7S87 7SY9 7TBO 7THB 7TM9 7UV5 7WDA 7XRC 7YZX 7Z1V 7Z1Y 7Z41 7Z8Y 8AGQ 8AQ8 8B1N 8BXT 8CX4 8DA3 8E5V 8E60 8EGN 8EPS 8EW7 8F8E 8FG7 8FT8 8GCA 8K4Q 8PQC 8PQD 8Q90 8R5Q 8R5R 8RCB 8RCC 8SDW 8SHR 8SIO 8SLU 8SO5 8SXS 8TCA 8UFN 8UFO 8VEV 8WT3 9CPL 9D8S
Taking the first 60 of those (ordered by fastest runtime - to allow for as many tests as possible) gives us a final list of
4FQN 4QKI 5AUI 5E9I 5FBO 5OD9 5ONZ 5RG0 5SP6 5VZR 6BLI 6CK7 6CW0 6DEX 6NQY 6P8P 6P8U 6R16 6TPI 6UCA 6VZQ 6VZW 7AO5 7DK1 7KDS 7MJB 7S87 7SY9 7TM9 7UV5 7WDA 7Z1V 7Z1Y 7Z41 8AGQ 8B1N 8BXT 8DA3 8E5V 8E60 8EGN 8EPS 8EW7 8FG7 8FT8 8GCA 8K4Q 8PQC 8R5Q 8RCC 8SDW 8SHR 8SIO 8SLU 8SO5 8TCA 8UFN 8VEV 8WT3 9CPL
This list contains entries originally (according to PDB entry) processed by XDS (32), Aimless (22), XSCALE (14), autoPROC (7), HKL-3000 (7), MOSFLM (5), DIALS (4), xia2 (3), SCALA (3), HKL-2000 (3), iMOSFLM (1), STARANISO (1) and SCALEPACK (1).
# set specific "xds_par" binary on command-line:
process -nthreads NTHREADS xds=/where/ever/binary symm=SPGR cell="a b c al be ga" -I /where/ever/images
# specific "xds_par" binary is expected first in $PATH:
fast_dp -j 1 -k NTHREADS -J 1 -s "SPGR" -c a,b,c,al,be,ga -l /where/ever/plugin /where/ever/image-file
# NOTE: we are unable to make the -l switch work here (it
#       doesn't seem to have any effect and we don't get a LIB=
#       line in XDS.INP ... which means that our attempts with HDF5
#       datasets fail while the *.cbf.gz datasets process through
#       the slower "temporary-decompressed-file" path)
generate_XDS.INP /where/ever/image-file
# add/replace SPACE_GROUP_NUMBER= and UNIT_CELL_CONSTANTS=
# adjust MAXIMUM_NUMBER_OF_JOBS= and MAXIMUM_NUMBER_OF_PROCESSORS=
# run specific binary:
xds_par
# if indexing fails, remove the "0 0 0" spots from SPOT.XDS and re-run again
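The manual edits in the plain-XDS recipe above can be scripted. The following is a minimal Python sketch and not part of any of the packages discussed: the helper names `set_keyword` and `drop_unindexed_spots` are hypothetical. It assumes that the keyword to be changed sits alone on its own `KEYWORD= value` line in XDS.INP (as written by generate_XDS.INP) and that indexed lines in SPOT.XDS carry h, k, l as their last three columns.

```python
import re


def set_keyword(xds_inp: str, keyword: str, value: str) -> str:
    """Replace (or append) a 'KEYWORD= value' line in the text of XDS.INP.

    Assumption: the keyword is the first item on its line; the whole line
    (including any trailing '!' comment) is replaced.
    """
    pattern = re.compile(r"^\s*" + re.escape(keyword) + r"=")
    new_line = f"{keyword}= {value}"
    out, replaced = [], False
    for line in xds_inp.splitlines():
        if pattern.match(line):
            if not replaced:
                out.append(new_line)
                replaced = True
            # any further uncommented duplicates of the keyword are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


def drop_unindexed_spots(spot_xds: str) -> str:
    """Remove spots whose last three columns (assumed h k l) are '0 0 0'."""
    kept = []
    for line in spot_xds.splitlines():
        fields = line.split()
        # lines without indices (fewer than 7 columns) are kept untouched
        if len(fields) >= 7 and fields[-3:] == ["0", "0", "0"]:
            continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

Both helpers work on text rather than on files, so reading XDS.INP or SPOT.XDS and writing the result back is left to the caller.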
We are also looking at using XDSME (latest change in Jan 2023), xia2 and any other available command-line tools for data processing that use XDS as the central indexing/integration engine.
As a simple check on some basic data quality metrics, we are using MRFANA from autoPROC on the final scaled and unmerged data provided by each package/run. This isotropic analysis uses the criterion of CC(1/2) above 0.3 in the outer shell (using equal-number binning):
mrfana -cutoff_rpim 99.9 -cutoff_isigi 0.0 -cutoff_cchalf 0.3 -nref -1000
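To illustrate the criterion itself (not how MRFANA implements it, which is not described here), the following Python sketch splits reflections into shells of equal size and then picks the last shell, going from low to high resolution, whose CC(1/2) is still above the cutoff. The function names and the stop-at-first-failing-shell rule are assumptions made for this example.

```python
def equal_number_shells(d_spacings, n_shells):
    """Split d-spacings (in Angstrom) into shells of (nearly) equal size.

    Shells are ordered from low resolution (large d) to high resolution.
    """
    if n_shells < 1:
        raise ValueError("n_shells must be at least 1")
    ordered = sorted(d_spacings, reverse=True)
    size, extra = divmod(len(ordered), n_shells)
    shells, start = [], 0
    for i in range(n_shells):
        # the first 'extra' shells take one additional reflection
        stop = start + size + (1 if i < extra else 0)
        shells.append(ordered[start:stop])
        start = stop
    return shells


def resolution_limit(shells, cutoff=0.3):
    """Return the high-resolution limit for a CC(1/2) cutoff.

    'shells' is a list of (d_min, cc_half) pairs ordered from low to high
    resolution. The scan stops at the first shell with CC(1/2) not above
    the cutoff; None is returned if even the first shell fails.
    """
    limit = None
    for d_min, cc_half in shells:
        if cc_half <= cutoff:
            break
        limit = d_min
    return limit
```

With equal-number binning every shell contains about the same number of reflections, so the CC(1/2) values in different shells are based on comparable sample sizes.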
For each of the 60 examples we record whether a job (see above) went through to the end. We test this via:
program | success 20230630 | success 20240723 | success 20241002 |
autoPROC | 100% | 97% | 97% |
plain_XDS | 95% | 97% | 97% |
fast_dp | 87% | 87% | NA |
Please note that the non-autoPROC systems might be run differently when an expert configures them for a specific instrument/beamline ... so your mileage might vary.
Let's take two metrics as simple proxies for data quality (after determining the high-resolution limit based on outer-shell CC(1/2) > 0.3):
Please note that this completely ignores any anisotropy in the data, so is definitely sub-optimal. But we wanted to stick with the lowest common denominator between the various runs and packages.
We can then compare those two re-processing jobs in terms of resolution (a change of at least 10% in reciprocal-space volume is considered significant) and <I/sigI> (a difference of at least 10% in value is considered significant) for the various XDS versions:
20230630 versus 20240723:
program | higher resolution 20230630 | higher resolution 20240723 | higher <I/sigI> 20230630 | higher <I/sigI> 20240723 |
autoPROC | 15% | 15% | 23% | 10% |
plain_XDS | 43% | 18% | 55% | 12% |
fast_dp | 28% | 32% | 38% | 15% |
20230630 versus 20241002:
program | higher resolution 20230630 | higher resolution 20241002 | higher <I/sigI> 20230630 | higher <I/sigI> 20241002 |
autoPROC | 0% | 3% | 3% | 43% |
plain_XDS | 2% | 12% | 7% | 7% |
fast_dp | NA | NA | NA | NA |
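The two significance thresholds used for the tables above can be written down explicitly. This is a minimal Python sketch with hypothetical function names. It assumes that the reciprocal-space volume is that of a sphere of radius 1/d (so it scales as 1/d^3) and that the 10% is measured relative to the run with the lower value; the exact convention used for the tables is not stated.

```python
def volume_gain(d_ref, d_other):
    """Fractional gain in reciprocal-space volume of d_other over d_ref.

    The volume inside a sphere of radius 1/d is proportional to 1/d**3,
    so the constant factor cancels in the ratio.
    """
    return (d_ref / d_other) ** 3 - 1.0


def compare_resolution(d_old, d_new, threshold=0.10):
    """Return 'old', 'new' or 'same' depending on which run reaches
    significantly higher resolution (smaller d, in Angstrom)."""
    if volume_gain(d_old, d_new) >= threshold:
        return "new"
    if volume_gain(d_new, d_old) >= threshold:
        return "old"
    return "same"


def compare_i_over_sigma(old, new, threshold=0.10):
    """Return 'old', 'new' or 'same' depending on which run has a
    significantly higher <I/sigI>."""
    if new >= old * (1.0 + threshold):
        return "new"
    if old >= new * (1.0 + threshold):
        return "old"
    return "same"
```

Under these assumptions a 10% change in volume corresponds to a change of roughly 3% in the resolution limit, e.g. 2.00 A versus about 1.94 A.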
Notes:
The most interesting examples are those where one of the two investigated XDS versions gives significantly different statistics (resolution and/or <I/sigI>) in any of the packages tested. We only look at those examples where the compared packages ran successfully with both XDS versions.
Feeding the final scaled+unmerged results of each package into POINTLESS via one of the following commands
pointless xdsin XDS_ASCII.HKL                 # autoPROC, plain XDS, fast_dp
pointless xdsin aimless_alldata_unmerged.mtz  # autoPROC
pointless xdsin fast_dp_unmerged.mtz          # fast_dp
and looking for a message in the standard output of POINTLESS
WARNING: the L-test suggests that the data may be twinned
gives us:
program | reflection data (scaling) | warnings 20230630 | warnings 20240723 | warnings 20241002 |
autoPROC | XDS_ASCII.HKL (CORRECT) | 2% | 35% | 17% |
autoPROC | aimless_alldata_unmerged.mtz (AIMLESS) | 2% | 32% | 12% |
plain_XDS | XDS_ASCII.HKL (CORRECT) | 2% | 55% | 17% |
fast_dp | XDS_ASCII.HKL (CORRECT) | 2% | 52% | NA |
fast_dp | fast_dp_unmerged.mtz (CORRECT) | 2% | 52% | NA |
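Counting the warnings for a table like the one above amounts to a text search over the saved standard output of each POINTLESS run. The Python sketch below shows one way to do this. The function names are hypothetical, the message text is taken from the warning quoted above, and rounding to whole percent is an assumption.

```python
TWIN_WARNING = "the L-test suggests that the data may be twinned"


def has_twin_warning(pointless_stdout: str) -> bool:
    """True if the saved POINTLESS output contains the L-test warning."""
    return TWIN_WARNING in pointless_stdout


def warning_percentage(logs) -> int:
    """Percentage of datasets with the warning, rounded to an integer.

    'logs' maps a dataset name (e.g. a PDB identifier) to the text of the
    corresponding POINTLESS standard output.
    """
    if not logs:
        raise ValueError("no POINTLESS logs given")
    flagged = sum(has_twin_warning(text) for text in logs.values())
    return round(100.0 * flagged / len(logs))
```

Only datasets for which a job actually produced output should be passed in, so that failed runs are not counted as "no warning".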
Notes:
Please note that other programs (e.g. our own STARANISO) might come to slightly different conclusions about twinning. Again: we tried to keep it simple and to stick to widely used systems.
See also the discussion in e.g. Parkhurst et al, 2016.
For consistent planes, the HKL plots are best suited since they only depend on the crystal symmetry. The PQR plots (planes defined by the principal axes of the fitted ellipsoid) will be affected by the underlying distribution of signal in reciprocal space and by the resulting cut-off surface (to which an ellipsoid is then fitted).
For autoPROC these are created automatically (after AIMLESS scaling of INTEGRATE.HKL), while for other packages we use the final, scaled+merged reflection data (using XDSCONV on the XDS_ASCII.HKL file to create an MTZ file if required).
Notes:
We can stay even closer to the raw, integrated intensities coming out of INTEGRATE by looking at the scaling and outlier analysis of the resulting INTEGRATE.HKL file (this is where the "new method for estimating the background in each image pixel" will have the biggest impact).
CORRECT will apply several correction factors ("MODULATION" across the detector surface, "DECAY" as a function of image number and resolution, and "ABSORPTION" as a function of image number and detector region). After these correction factors are applied, measurements are rejected if they deviate too much from their symmetry-equivalent reflections or if they don't follow the expected Wilson distribution.
We can visualise those correction factors and outliers easily for the same set of examples (the latter is done automatically by autoPROC already):
Misfits
[Table of misfit plots per PDB entry and package (autoPROC, plain_XDS, fast_dp) for 5SP6, 6UCA, 7KDS, 8EGN, 8GCA, 8FG7, 8K4Q, 8TCA, 8UFN, 8VEV and 8WT3; one package is marked NA for each of 8GCA, 8K4Q, 8TCA and 8VEV.]
Assuming that nothing has changed in the way the outlier rejection is performed in CORRECT after applying those correction factors, there seems to be some kind of knock-on effect from differences in integration results with the 20240723 version. The pattern of outlier rejections in CORRECT was already rather puzzling in older versions (without any impact on autoPROC, which uses the raw integrated intensities from INTEGRATE.HKL directly in its scaling module aP_scale, partly for that reason), but the 20240723 version gives rise to even more puzzling shapes.
The 20241002 version seems to have mostly reverted to the old behaviour at this point.
Looking at the timings as reported by COLSPOT and INTEGRATE ("cpu time (sec)" in *.LP files) for all the above 60 examples (where the same set of images was finally used):
XDS v1 | XDS v2 | COLSPOT v1/v2 | INTEGRATE v1/v2 |
20240723 | 20230630 | 0.84 | 0.89 |
20241002 | 20230630 | 0.80 | 1.84 |
20241002 | 20240723 | 0.97 | 2.17 |
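How the per-dataset timings were combined into the single ratios in the table above is not stated. As one plausible reading, the Python sketch below computes the ratio of summed CPU times over the datasets that are present for both versions. The function name and the choice of summing are assumptions, and the "cpu time (sec)" values are expected to have been extracted from the *.LP files beforehand.

```python
def cpu_time_ratio(times_v1, times_v2):
    """Ratio v1/v2 of the summed CPU times (in seconds).

    'times_v1' and 'times_v2' map a dataset name to the CPU time reported
    by one XDS step (e.g. COLSPOT or INTEGRATE) for that XDS version.
    Only datasets present in both mappings are used.
    """
    common = sorted(set(times_v1) & set(times_v2))
    if not common:
        raise ValueError("no dataset was run with both versions")
    total_v1 = sum(times_v1[name] for name in common)
    total_v2 = sum(times_v2[name] for name in common)
    return total_v1 / total_v2
```

A value below 1 means that v1 needed less CPU time than v2 for that step. Averaging the per-dataset ratios instead would weight short and long jobs equally and can give a different number.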
Notes:
This currently only compares the old 2023 version to the 2024 version(s) prior to the latest 20241002 release!
Some users were able to provide us with specific examples of recent data collections and processing results - thanks a lot for that!
Data was collected in multiple orientations using the Global Phasing workflow (and sometimes at multiple wavelengths). Here "OLD" refers to the 20230630 version and "NEW" to the 20240723 one.
Crystal | Wavelength | HKL | PQR |
01 | 1 | | |
02 | 1 | | |
03 | 1 | | |
04 | 1 | | |
05 | 1 | | |
06 | 1 | | |
06 | 2 | | |
07 | 1 | | |
08 | 1 | | |
09 | 1 | | |
10 | 1 | | |
11 | 1 | | |
12 | 1 | | |
13 | 1 | | |
14 | 1 | | |
15 | 1 | | |
16 | 1 | | |
17 | 1 | | |
18 | 1 | | |
19 | 1 | | |
20 | 1 | | |
21 | 1 | | |
22 | 1 | | |
23 | 1 | | |
24 | 1 | | |
25 | 1 | | |
25 | 2 | | |
25 | 3 | | |
27 | 1 | | |
27 | 2 | | |
27 | 3 | | |
28 | 3 | | |
29 | 1 | | |
30 | 1 | | |
A single-sweep dataset processed with an older (20220220) XDS version and the first new binary (20240712):