

Introduction

The information here (as of October 11th, 2022) is kept as up to date as possible - but be prepared for changes at external sites or in software systems beyond our control.

Our refinement program BUSTER has provided mmCIF files ready for deposition to the wwPDB since the 20190214 release: these are generated automatically at the end of each refinement. To stay aware of recent developments in this area, please:


Relevant files from refinement

A refinement program should always provide two PDBx/mmCIF files ready for deposition: one for the model and one for the reflection data. In general, these are just a different format/representation of the output PDB (model) and MTZ (reflection) files that a user might be more familiar with. If no deposition-ready PDBx/mmCIF files are generated automatically, various conversion tools are available to generate them - but it is always better to use the PDBx/mmCIF files generated natively by the refinement program itself!

From BUSTER

The two relevant files for deposition of BUSTER refinement results are

From other refinement packages

Please consult the documentation of the refinement program used to find out how to generate those two deposition-ready PDBx/mmCIF files directly.


Additional files from data processing

See also here for additional background information.

The reflection file (in deposition-ready PDBx/mmCIF format) from refinement often doesn't contain the full reflection data available from the original data processing - but only a subset relevant to the final stages of refinement. Therefore, we would like to combine that reflection data subset with the richer reflection data from the data processing step leading to the input reflection data used during refinement.

From autoPROC

From STARANISO

See here for details.

Combining deposition-ready files from autoPROC (or STARANISO) with data from various refinement programs using aB_deposition_combine

Let's discuss what the aB_deposition_combine tool tries to achieve: for each reflection data block within the multi-block mmCIF files produced by autoPROC (which it tries to find within the directory given by the -aP flag), it will try to match up reflections with those reported by BUSTER (in BUSTER_refln.cif). The assumption is that the autoPROC result files in the given output directory were used without any major modification as input to BUSTER - e.g. the MTZ file staraniso_alldata-unique.mtz.

However, that might not have been the case: if, for example, you used intermediate data (unscaled or even scaled intensities) with some other scaling program, or a different procedure to go from intensities to amplitudes, the amplitudes in BUSTER_refln.cif (describing the data as input into refinement) will be different from those in e.g. Data_1_autoPROC_STARANISO_all.cif.

You might then see in the log file something like

NOTE : found 32/100 in
       ./process_03//Data_1_autoPROC_STARANISO_all.cif
       (1_staraniso)

NOTE : too few matches found

telling you that some matching amplitude/sigma pairs were found - but not quite enough (only 32 out of 100). Maybe some other changes (re-indexing/scaling? SG assignment changed?) occurred between the autoPROC job and the final BUSTER refinement?

You can change some of the decision making e.g. by running

  aB_deposition_combine \
    autoBUSTER_DepositionCombine_FindProcessingCif_RandomHit=0.2 \
    ...

to allow for further analysis even if only 20% of the initial comparisons (between 100 random reflections) are successful. Alternatively, increase the autoBUSTER_DepositionCombine_FindProcessingCif_RandomFuz parameter (default = 0.02) to allow for more difference between amplitudes. However, you should also double-check whether ./process_03/ is the autoPROC result directory containing the reflection data that actually went into BUSTER.
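The initial check described above can be pictured with a minimal Python sketch. It is not the aB_deposition_combine implementation: the function names, the dictionary layout, sampling from the refinement side and reading the fuzz value as a relative tolerance are all assumptions made purely for illustration. The acceptance threshold is passed in explicitly, since only the 0.2 value from the example command is taken from the text.

```python
import random


def is_close(a, b, fuzz):
    """True if a and b differ by at most `fuzz` relative to the larger one.

    Treating the tolerance as relative is an assumption of this sketch.
    """
    return abs(a - b) <= fuzz * max(abs(a), abs(b))


def count_matches(processing, refinement, n_sample=100, fuzz=0.02, seed=0):
    """Compare amplitude/sigma pairs for a random sample of reflections.

    Both inputs map a Miller index (h, k, l) to a tuple (F, sigF).
    Returns (hits, tried).
    """
    rng = random.Random(seed)
    keys = sorted(refinement)
    sample = rng.sample(keys, min(n_sample, len(keys)))
    hits = 0
    for hkl in sample:
        if hkl not in processing:
            continue  # reflection absent from the processing file
        f_ref, sig_ref = refinement[hkl]
        f_proc, sig_proc = processing[hkl]
        if is_close(f_ref, f_proc, fuzz) and is_close(sig_ref, sig_proc, fuzz):
            hits += 1
    return hits, len(sample)


def enough_matches(hits, tried, min_fraction):
    """Decide whether the sampled agreement is good enough to continue."""
    return tried > 0 and hits >= min_fraction * tried


if __name__ == "__main__":
    # Hypothetical data: 200 reflections, taken as-is vs. rescaled by 1.5.
    processing = {(h, k, 0): (10.0 + h + k, 1.0)
                  for h in range(20) for k in range(10)}
    as_is = dict(processing)
    rescaled = {hkl: (1.5 * f, 1.5 * s) for hkl, (f, s) in processing.items()}

    for label, data in (("as-is", as_is), ("rescaled", rescaled)):
        hits, tried = count_matches(processing, data)
        print(f"{label}: found {hits}/{tried}, "
              f"accepted at 0.2: {enough_matches(hits, tried, 0.2)}")
```

In this toy setup, unmodified data matches for every sampled reflection, while a uniform rescaling by 1.5 exceeds the 0.02 tolerance everywhere. That is the kind of situation in which loosening the parameters would hide a real difference between the processing and refinement data rather than fix it.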

Another potential problem can be so-called daisy-chaining of reflection data - e.g. taking the staraniso_alldata-unique.mtz from autoPROC into MR, then the output MTZ file from that MR step into refinement program A and the resulting output MTZ file into refinement with program B. That is always a recipe for confusion with potential data modification or rescaling happening.

In our hands, if e.g. the staraniso_alldata-unique.mtz file was taken as-is for refinement you should then see 100/100 reflections matching and the tool creating the final combined versions without any problem.


General notes


Notes regarding wwPDB deposition/validation

The above information should provide adequate instructions when using both autoPROC+STARANISO and BUSTER for the data processing and refinement stages ... but what if different systems were used? A user might encounter a wwPDB deposition/validation problem similar to this:

So I tried to run a wwPDB validation job by submitting as mmCIF
structure factor file Data_1_autoPROC_STARANISO_all.cif, and as
mmCIF coordinate file the cif file output by phenix.refine. However,
I still get a "Structure factor file is missing freeR set" error.

We have to remember the distinction between the information given in the _refln.pdbx_r_free_flag column of a reflection mmCIF data block and the notion of a "freeR set" mentioned in the error message one gets.

The current situation can quickly give the following understandable impression

Thus it is not quite true that Data_1_autoPROC_STARANISO_all.cif is
"deposition-ready", because it appears to be so only if combined
with BUSTER-derived coordinates.

because of (1) the loss of information and provenance when going from data processing into subsequent use of reflection data and (2) the assumptions the deposition system makes about what a "typical" set of two mmCIF files should look like. What we provide in the autoPROC+BUSTER world is a way of combining the different deposition-ready files in order to create two files (model and reflection) from the following input:

We then need to do the following in order for the validation/deposition system to be happy:

So we would argue that all our mmCIF files are deposition-ready after all (they contain the complete and correct information) - it is just that the validation/deposition system makes certain assumptions that are tricky to meet: a reflection file from data processing will never contain a _refln.status flag, since this is a derived quantity computed within downstream processes.
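To see which of the two items a given reflection file actually carries, one can list the _refln tags declared in each data block. The following is a minimal sketch using only the Python standard library. It is a naive line scanner rather than a real mmCIF parser (it ignores multi-line text fields, for example), and the block names and values in the example text are made up for illustration.

```python
def refln_tags_by_block(cif_text):
    """Map each data block name to the set of _refln.* tags it declares.

    Tags are lower-cased so that lookups are case-insensitive.
    """
    tags = {}
    current = None
    for raw in cif_text.splitlines():
        line = raw.strip()
        if line.lower().startswith("data_"):
            current = line[len("data_"):]
            tags[current] = set()
        elif current is not None and line.lower().startswith("_refln."):
            tags[current].add(line.split()[0].lower())
    return tags


# Hypothetical two-block example: one block as it might come from data
# processing (free flag only), one as it might come from refinement.
EXAMPLE = """\
data_example_processing
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.pdbx_r_free_flag
_refln.F_meas_au
_refln.F_meas_sigma_au
1 0 0 0 12.3 0.4
data_example_refinement
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.status
_refln.F_meas_au
_refln.F_meas_sigma_au
1 0 0 o 12.3 0.4
"""

if __name__ == "__main__":
    for name, tags in refln_tags_by_block(EXAMPLE).items():
        print(name,
              "| pdbx_r_free_flag:", "_refln.pdbx_r_free_flag" in tags,
              "| status:", "_refln.status" in tags)
```

For real files, a dedicated mmCIF parser would be the safer choice; the point here is only that the presence of _refln.pdbx_r_free_flag does not imply the presence of _refln.status.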

One might then rightly ask:

Surely it would be desirable to allow users to deposit
autoPROC-derived data, even if for whatever reason they used for
refinement a package different from BUSTER?

Absolutely - other refinement packages/systems should probably provide a similar tool to our aB_deposition_combine to combine the often much richer reflection mmCIF from data processing with the more limited reflection mmCIF from refinement ... at least as long as the deposition system itself lacks the flexibility to allow for the same combination steps and checks outlined above.

Remember that one can always "just" deposit the reflection and model mmCIF files coming out of refinement (any package) - with certain significant limitations: