DepositionMmCif

Content

Introduction

Relevant files from refinement

From BUSTER
From other refinement packages

Additional files from data processing

From autoPROC
From STARANISO
Combining deposition-ready files from autoPROC (or STARANISO) with data from various refinement programs using aB_deposition_combine

General notes

Notes regarding wwPDB deposition

Additional examples, FAQs and use cases

Introduction

The information here (as of October 11th, 2022) is kept as up-to-date as possible - but be prepared for changes at external sites or in software systems beyond our control.

Our refinement program BUSTER provides mmCIF files ready for deposition to the wwPDB since the 20190214 release: these are automatically generated at the end of each refinement. In order to be aware of any recent developments within this area, please:

check the current release notes,

ensure your BUSTER installation is up-to-date,

see if the issues page has release-specific information,

be careful about using conversion or harvesting tools (like PDB_EXTRACT or SF_CONVERT) as part of the deposition process. For example, they are not always aware of autoPROC as a processing package - or of the type of analysis STARANISO performs - and are not necessarily up-to-date regarding the correct source (log files etc) for a particular mmCIF item. All this can easily result in confusion and in incorrect data making it into the archived file(s).

Relevant files from refinement

A refinement program should always provide two PDBx/mmCIF files ready for deposition: one for the model and one for the reflection data. In general, these would just be a different format/representation of the output PDB (model) and MTZ (reflection) files that a user might be more familiar with. If there are no deposition-ready PDBx/mmCIF files generated automatically, various conversion tools are available to generate those - but it is always better to use the PDBx/mmCIF files generated natively by the refinement program itself!

From BUSTER

The two relevant files for deposition of BUSTER refinement results are

BUSTER_model.cif: This file contains the atomic model as well as any non-standard restraint information (e.g. for a unique ligand or compound).

It also contains the refinement statistics (R values, number of reflections used etc).
It does not contain the data processing statistics ("Table 1" containing e.g. completeness, redundancy, CC1/2, Rpim etc): this needs to come from the data processing stage and extra care is needed to use the correct set of data quality metrics clearly associated with the data used during refinement!

BUSTER_refln.cif: This file can contain several so-called data blocks. The first one contains all the standard reflection data (observed amplitudes and their sigmas¹, model structure factors, figure-of-merit, Hendrickson-Lattman coefficients, test-set flags - see also here) as well as map coefficients (mFo-DFc difference map and 2mFo-DFc map) for all observed reflections.

To accommodate different types of map coefficients (isotropically filled-in 2mFo-DFc and/or anisotropically filled-in 2mFo-DFc coefficients as defined by e.g. STARANISO), additional data loops could be present, with the relevant information stored in each "_diffrn.details" tag to describe a data block.
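To see which data blocks a reflection mmCIF file actually contains, a quick look at the block names and their "_diffrn.details" values is often enough. The following is only a minimal sketch: the function name list_data_blocks and the block names in the example text are made up for illustration, and the line-based scan is an assumption-laden shortcut (it only understands a single-line "_diffrn.details" value and ignores multi-line text fields). A proper mmCIF parser should be preferred for real work.

```python
def list_data_blocks(cif_text):
    """Return (block_name, diffrn_details) pairs, one per data block.

    Simplistic line-based scan (NOT a full mmCIF parser): it only picks up
    'data_' headers and a single-line '_diffrn.details' value.
    """
    blocks = []
    for line in cif_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("data_"):
            # Start of a new data block: remember its name, details unknown yet.
            blocks.append([stripped[5:], None])
        elif stripped.startswith("_diffrn.details") and blocks:
            value = stripped[len("_diffrn.details"):].strip()
            blocks[-1][1] = value.strip("'\"")
    return [tuple(block) for block in blocks]


# Hypothetical file content, for illustration only.
example = """\
data_r1abcsf
_diffrn.id 1
_diffrn.details 'observed data and standard map coefficients'
data_r1abcAsf
_diffrn.id 1
_diffrn.details 'anisotropically filled-in 2mFo-DFc coefficients'
"""

for name, details in list_data_blocks(example):
    print(name, "->", details)
```

For a real file, the text would be read from disk first (e.g. from BUSTER_refln.cif) and passed to the same function.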

From other refinement packages

Please consult the relevant information of the refinement program used to find out how to generate those two deposition-ready PDBx/mmCIF files directly.

Additional files from data processing

See also here for additional background information.

The reflection file (in deposition-ready PDBx/mmCIF format) from refinement often doesn't contain the full reflection data available from the original data processing - but only a subset relevant to the final stages of refinement. Therefore, we would like to combine that reflection data subset with the richer reflection data from the data processing step leading to the input reflection data used during refinement.

From autoPROC

autoPROC provides its own deposition-ready PDBx/mmCIF files for reflection data - including all the processing (scaling and merging) data quality metrics.

These need to be combined with the model and reflection data files coming from BUSTER in order to have only two mmCIF files (one model and one reflection) as expected by the deposition system.

In most cases the relevant file(s) from data processing will be called

Data_1_autoPROC_STARANISO_all.cif (for anisotropic/STARANISO analysis - corresponds to staraniso_alldata-unique.mtz), or
Data_2_autoPROC_TRUNCATE_all.cif (for traditional/isotropic analysis - corresponds to truncate-unique.mtz)

When processing multi-sweep and/or multi-wavelength data there could be fairly self-explanatory insertions into the file naming to distinguish datasets.

A user needs to pick the relevant one, depending on what (MTZ) file from data processing was picked for further structure solution and refinement.
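The correspondence between the MTZ file used downstream and the matching mmCIF file can be written down as a small lookup. This is just a sketch of the default single-dataset naming given above; the helper name cif_for_mtz is hypothetical, and file names with multi-sweep or multi-wavelength insertions are deliberately not guessed but reported as unknown.

```python
import os

# Default correspondence as stated in the text (single-dataset naming only).
MTZ_TO_CIF = {
    "staraniso_alldata-unique.mtz": "Data_1_autoPROC_STARANISO_all.cif",
    "truncate-unique.mtz": "Data_2_autoPROC_TRUNCATE_all.cif",
}


def cif_for_mtz(mtz_path):
    """Return the default mmCIF file name matching the given MTZ file."""
    name = os.path.basename(mtz_path)
    if name not in MTZ_TO_CIF:
        # Do not guess: multi-sweep/multi-wavelength names need a manual check.
        raise ValueError(
            f"no default mmCIF counterpart known for {name!r}; "
            "please check the processing directory by hand"
        )
    return MTZ_TO_CIF[name]


print(cif_for_mtz("./process_03/staraniso_alldata-unique.mtz"))
```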

The aB_deposition_combine tool should be used to (1) detect/check for the correct processing result and (2) combine it with the two files from BUSTER - after taking potential re-indexing and SG differences into account. It will also ensure that the correct set of data quality metrics from data processing is transferred into the model mmCIF file (where they are kept for historical reasons within archived PDB entries and expected by the deposition system).

See aB_deposition_combine -h for further information about the usage.

Please ensure the use of autoPROC and/or STARANISO is correctly referenced.

If the data quality metrics should be provided manually to the deposition system, the relevant scaling/merging statistics for STARANISO data (staraniso_alldata-unique.mtz) are in staraniso_alldata-unique.table1. Please do not use PDB_EXTRACT with a randomly picked log file from that processing directory: it will extract incorrect values!

For the traditional (isotropic) data (truncate-unique.mtz) these are in truncate-unique.table1. Again, do not use PDB_EXTRACT with a randomly picked log file from that processing directory: it will extract incorrect values!

From STARANISO

See here for details.

Combining deposition-ready files from autoPROC (or STARANISO) with data from various refinement programs using aB_deposition_combine

Let's discuss what the aB_deposition_combine tool tries to achieve: for each reflection data block within the multi-block mmCIF files produced by autoPROC (which it tries to find within the directory given by the -aP flag), it will try and match up reflections with those reported by BUSTER (in BUSTER_refln.cif). The assumption is that the autoPROC result files in the given output directory were used without any major modification as input to BUSTER - e.g. the MTZ file staraniso_alldata-unique.mtz.

However, that might not have been the case here: if you e.g. used intermediate data (unscaled or even scaled intensities) with some other scaling program or a different procedure to go from intensities to amplitudes, the amplitudes in BUSTER_refln.cif (describing the data as input into refinement) will be different from those in e.g. Data_1_autoPROC_STARANISO_all.cif.

You might then see in the log file something like

NOTE : found 32/100 in
       ./process_03//Data_1_autoPROC_STARANISO_all.cif
       (1_staraniso)

NOTE : too few matches found

telling you that some matching amplitude/sigma pairs were found - but not quite enough (only 32 out of 100). Maybe some other changes (re-indexing/scaling? SG assignment changed?) occurred between the autoPROC job and the final BUSTER refinement?

You can change some of the decision making e.g. by running

  aB_deposition_combine \
    autoBUSTER_DepositionCombine_FindProcessingCif_RandomHit=0.2 \
    ...

to allow for further analysis even if only 20% of the initial comparisons (between 100 random reflections) are successful. Or increase the autoBUSTER_DepositionCombine_FindProcessingCif_RandomFuz parameter (default = 0.02) to allow for more difference between amplitudes. However, you should also double check if ./process_03/ is the autoPROC result directory containing the actually used reflection data that went into BUSTER.
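To make the roles of the two parameters more tangible, here is a toy version of such a random-sample comparison. It is not the actual implementation inside aB_deposition_combine: the function names are hypothetical, treating the "fuzz" value as a relative tolerance on both amplitude and sigma is an assumption, and the acceptance fraction has to be passed in explicitly because the tool's default is not stated here.

```python
import random


def count_matches(refined, processed, n_random=100, fuzz=0.02, seed=0):
    """Compare a random sample of reflections between two data sets.

    Both inputs map (h, k, l) -> (amplitude, sigma). A pair counts as a hit
    if amplitude and sigma agree within a relative tolerance 'fuzz'
    (ASSUMPTION: the real tool may compare differently).
    """
    rng = random.Random(seed)
    keys = sorted(refined)
    sample = rng.sample(keys, min(n_random, len(keys)))
    hits = 0
    for hkl in sample:
        if hkl not in processed:
            continue
        f_ref, s_ref = refined[hkl]
        f_proc, s_proc = processed[hkl]
        if (abs(f_ref - f_proc) <= fuzz * abs(f_proc)
                and abs(s_ref - s_proc) <= fuzz * abs(s_proc)):
            hits += 1
    return hits, len(sample)


def enough_matches(hits, tried, min_fraction):
    """True if the fraction of hits reaches the requested minimum."""
    return tried > 0 and hits / tried >= min_fraction


# Toy data: the same amplitudes taken as-is, and a copy rescaled by 10%.
processed = {(h, 0, 0): (100.0 + h, 2.0) for h in range(1, 201)}
rescaled = {hkl: (1.1 * f, 1.1 * s) for hkl, (f, s) in processed.items()}

for label, refined in (("as-is", processed), ("rescaled", rescaled)):
    hits, tried = count_matches(refined, processed)
    print(f"{label}: found {hits}/{tried}")
```

In this toy model, relaxing the minimum fraction accepts a partially matching data set, while increasing the tolerance makes individual pairs match more easily. Neither replaces checking that the right processing directory was given.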

Another potential problem can be so-called daisy-chaining of reflection data - e.g. taking the staraniso_alldata-unique.mtz from autoPROC into MR, then the output MTZ file from that MR step into refinement program A and the resulting output MTZ file into refinement with program B. That is always a recipe for confusion with potential data modification or rescaling happening.

In our hands, if e.g. the staraniso_alldata-unique.mtz file was taken as-is for refinement you should then see 100/100 reflections matching and the tool creating the final combined versions without any problem.

General notes

Make sure to deposit the "all" versions of Rmerge, Rmeas (=Rrim) and Rpim and not the "within" ones! Do not deposit R-values from XDS/XSCALE if run with the FRIEDEL'S_LAW=FALSE flag set: these correspond to the "within" version and don't describe the merged intensity (IMEAN) values themselves.
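The difference between the "all" and "within" variants comes down to how observations are grouped before the R-values are computed. The sketch below uses the usual textbook definitions of Rmerge, Rmeas and Rpim with an unweighted mean; the function names are hypothetical, the intensities are invented toy numbers, and skipping reflections with fewer than two observations is an assumption about conventions. Actual processing programs apply weighting and outlier handling, so their numbers will differ.

```python
from collections import defaultdict
from math import sqrt


def merging_r_values(groups):
    """Return (Rmerge, Rmeas, Rpim) for lists of repeated intensities."""
    num_merge = num_meas = num_pim = denom = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue  # ASSUMPTION: singly measured reflections are skipped
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_meas += sqrt(n / (n - 1)) * dev
        num_pim += sqrt(1 / (n - 1)) * dev
        denom += sum(obs)
    if denom == 0:
        raise ValueError("no reflections with multiple observations")
    return num_merge / denom, num_meas / denom, num_pim / denom


def group_observations(observations, separate_friedel):
    """Group intensities per unique reflection.

    separate_friedel=True keeps I+ and I- apart ("within"),
    separate_friedel=False merges them together ("all").
    """
    groups = defaultdict(list)
    for hkl, is_plus, intensity in observations:
        key = (hkl, is_plus) if separate_friedel else hkl
        groups[key].append(intensity)
    return list(groups.values())


# Toy observations: (unique hkl, is I+ ?, intensity)
observations = [
    ((1, 2, 3), True, 100.0), ((1, 2, 3), True, 104.0),
    ((1, 2, 3), False, 90.0), ((1, 2, 3), False, 94.0),
]

r_all = merging_r_values(group_observations(observations, separate_friedel=False))
r_within = merging_r_values(group_observations(observations, separate_friedel=True))
print("all:    Rmerge=%.4f Rmeas=%.4f Rpim=%.4f" % r_all)
print("within: Rmerge=%.4f Rmeas=%.4f Rpim=%.4f" % r_within)
```

In this toy case the "within" values come out lower because the difference between I+ and I- never enters the statistics, which is why they do not describe the merged (IMEAN) intensities.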

Remember that completeness is a measure of the number of actually observed reflections relative to the number of reflections that are expected to be theoretically observable.

For anisotropic (STARANISO) data, only the reflections within the ellipsoid fitted to the cut-off surface (as determined by STARANISO) would ever be observable. So as a measure of the quality of the experiment (how well was it designed to record all observable reflections), the "Completeness (ellipsoidal)" is the correct value to look at.

The "Completeness (spherical)" assumes that all reflections within a sphere (or spherical shell) could be observed. This is true for isotropic data or for the (lower resolution) enclosed sphere of anisotropic data - but falls short for anisotropic data. However, taken together with the "Completeness (ellipsoidal)" it gives a good idea about the extent of anisotropy the data shows.

It should be noted that these two definitions are actually very similar - just that the cut-off surface is defined differently:
The isotropic analysis assumes isotropy and therefore enforces a spherical cut-off surface.
The anisotropic analysis makes no assumption and therefore doesn't enforce a particular shape on the cut-off surface - and, contrary to popular belief, it neither assumes nor enforces anisotropy!
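A toy calculation can illustrate how the two completeness values relate. Everything in this sketch is a simplifying assumption: reflections are points on an integer grid in arbitrary reciprocal-space units, symmetry is ignored, the cut-off surface is an axis-aligned ellipsoid with invented semi-axes, and the "experiment" is taken to have recorded every reflection inside that ellipsoid. The function names are hypothetical.

```python
from itertools import product


def inside_ellipsoid(point, semi_axes):
    """True if the point lies inside or on the axis-aligned ellipsoid."""
    return sum((x / a) ** 2 for x, a in zip(point, semi_axes)) <= 1.0


def completeness(observed, observable):
    """Fraction of the theoretically observable reflections actually observed."""
    observable = set(observable)
    if not observable:
        raise ValueError("no observable reflections")
    return len(set(observed) & observable) / len(observable)


grid = list(product(range(-10, 11), repeat=3))
sphere = [p for p in grid if inside_ellipsoid(p, (10, 10, 10))]
ellipsoid = [p for p in grid if inside_ellipsoid(p, (10, 7, 5))]

# Toy "perfect" experiment: everything inside the ellipsoid was recorded.
observed = ellipsoid

print(f"Completeness (ellipsoidal): {completeness(observed, ellipsoid):.2f}")
print(f"Completeness (spherical):   {completeness(observed, sphere):.2f}")
```

Here the ellipsoidal completeness is 1 by construction, while the spherical one stays well below 1 purely because of the anisotropy, not because the experiment missed anything observable.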

Be careful when using other automatic data extraction tools, since they might use a different file/source for those metrics: always check the deposited values with the values clearly presented by the actually used data processing software (e.g. autoPROC output like summary.html or the PDF reports, the STARANISO server results page etc).

See also this Gemmi tool for help in preparing reflection data mmCIF files containing both merged and unmerged data blocks - in case you didn't use autoPROC (where those data blocks are already contained in the mmCIF files).

This is ongoing work together with the wwPDB PDBx/mmCIF Working Group - for more details see:

See also the more general deposition FAQ page.

*1: these are unmodified from the input data!