[buster-discuss] Erroneous OneDep message about "missing freeR set" during deposition

BUSTER Developers buster-develop at globalphasing.com
Mon Apr 20 12:42:18 CEST 2026


Dear BUSTER users,

The attached OneDep message is causing our users a lot of headaches. Here is
our assessment of it:

 (1) The message is triggered by a _refln loop within a data block not
     having a _refln.status [1] item.

 (2) The reflection data file we provide contains multiple datablocks -
     most are for merged reflection data (_refln category [2]), but also
     one for unmerged reflection data (_diffrn_refln category, [3]).

 (3) The message text is slightly misleading and unclear for users, since
     all our merged datablocks contain a test-set flag [4].

     ==> Remember that there is a difference between a "test-set flag"
         (assigning a reflection to one of N test-sets) and the "free-set"
         (setting one of those test-sets aside from refinement to compute
         e.g. R-free - in which case one could call it the
         "rfree-set"). Precise nomenclature and correct wording/usage
         matter.

 (4) Our provided multi-datablock reflection data file will contain data
     directly from the data processing step, plus the data that went into
     refinement and the data that came out of refinement (e.g. map
     coefficients).

     Data from data processing can be richer and more varied than what was
     finally used in refinement (e.g. F(early) and F(late) amplitudes for
     radiation damage detection maps).

 (5) The _refln.status [1] item is defined as

       Classification of a reflection so as to indicate its status with
       respect to inclusion in the refinement and the calculation of R
       factors.

 (6) As we see it, what is causing misunderstanding on the wwPDB OneDep
     side and causing problems to our users is as follows:

     (a) At the point of data-processing there is no knowledge of whether
         (i) a specific reflection will be used in the final refinement
         step (before deposition), or (ii) if it is included, how it will
         be used for the computation of R factors (R_work, R_free etc).

     (b) The test-set flags (usually integer numbers between 0 and N, where
         N is often 19, leading to 5% of reflections in each test-set) as
         assigned during data processing cannot necessarily be assumed to
         be the same as those actually used during final refinement. They
         might for instance have been auto-assigned during automatic
         processing at a synchrotron site and then replaced with a set of
         reference test-set flags (often computed to a very high resolution
         to cover any upcoming dataset collected e.g. during a large scale
         fragment screening campaign).
 
     (c) Even if they were identical, the selection of a particular
         test-set as the free set is done at the point of refinement
         (selecting e.g. test-set 0 as the free-set, or test-set 1, or
         another one - or even doing complete cross-validation by using
         each test-set in turn).
 
     (d) Refinement will often use a subset of reflections (low or high
         resolution limits, outlier rejection etc) so that the status flag
         as defined above can only be uniquely assigned to the output file
         of the last refinement step.
 
         ==> This is why BUSTER will place that data as the first datablock
             of the multi-datablock reflection file.
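
Points (6c) and (6d) can be illustrated with a minimal sketch. It assumes
a reflection is described only by its d-spacing and its test-set flag, and
it uses the status codes 'o', 'f', 'h' and 'l' as we read them from the
enumeration in [1]. The function name, the parameters and the sample
values are hypothetical and not taken from BUSTER or OneDep.

```python
def assign_status(reflections, free_set, d_min, d_max):
    """Assign a _refln.status-like code to each reflection.

    reflections: list of (d_spacing_in_angstrom, test_set_flag) tuples.
    free_set:    the test-set chosen as free set for THIS refinement.
    d_min/d_max: high/low resolution limits used in THIS refinement.
    """
    status = []
    for d_spacing, flag in reflections:
        if d_spacing < d_min:
            status.append("h")  # beyond the high-resolution limit
        elif d_spacing > d_max:
            status.append("l")  # beyond the low-resolution limit
        elif flag == free_set:
            status.append("f")  # set aside for R-free
        else:
            status.append("o")  # used as working-set reflection
    return status


# Hypothetical data-processing output: (d-spacing, test-set flag).
processed = [(50.0, 3), (3.2, 0), (2.1, 1), (1.6, 0), (1.4, 7)]

# Two hypothetical refinements run against the same processed data.
run_a = assign_status(processed, free_set=0, d_min=1.5, d_max=40.0)
run_b = assign_status(processed, free_set=1, d_min=2.0, d_max=60.0)

print(run_a)
print(run_b)
```

The same five reflections end up with two different status columns, which
is why such a column can only be attached to the output of one specific
refinement run and not to the data-processing output itself.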
 
The official wwPDB documentation at [5] has
 
  Structure Factors
   
   * The structure factor file can either be in mmCIF or mtz formats and
     should include h, k, l, F, SigmaF (and/or I and SigmaI) and test
     flags.
 
   * Definitions and format are available at
     http://wwpdb.org/documentation/procedure#toc_appendicsB
 
 
 ==> F+SigmaF or I+SigmaI are not "structure factors" [6], they are
     "structure factor amplitudes" or "intensities" (plus associated
     standard uncertainties). Therefore, the reflection data file
     associated with a PDB entry from a diffraction experiment can't be
     called (and should never have been called) a "structure factor file".
 
 ==> "test flags" are a way to associate a given merged reflection with a
     particular test set (often numbered 0 to 19 to have 5% of reflections
     in each test set) - without giving us information about what those
     test sets are used for. We need an additional "rfree flag" per
     reflection to tell us, for the sole purpose of computing Rwork and
     Rfree, which reflections were used for which. Most refinement programs
     will actually use the "test flag" and a parameter (e.g. "0" or "1")
     that indicates that all reflections with a test set flag of that value
     will be used for Rfree computation, while all others go into the Rwork
     computation.
 
 ==> The URL [7] doesn't contain any definitions or format descriptions -
     only a few examples.
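
The relation between "test flag", the selection parameter and the "rfree
flag" described above can be sketched as follows. This is only an
illustration under the assumption of 20 equally populated test sets
(0 to 19); the function name and the synthetic flags are hypothetical.

```python
def rfree_flags(test_flags, free_set):
    """Derive a per-reflection boolean "rfree flag" from test-set flags.

    True  -> reflection is used for the Rfree computation
    False -> reflection goes into the Rwork computation
    """
    return [flag == free_set for flag in test_flags]


# 1000 synthetic reflections distributed evenly over test sets 0..19.
test_flags = [i % 20 for i in range(1000)]

# The refinement parameter picks ONE test set as the free set.
free = rfree_flags(test_flags, free_set=0)

free_count = sum(free)
work_count = len(free) - free_count
free_fraction = free_count / len(free)

print(free_count, work_count, free_fraction)
```

With 20 test sets, each one holds 1/20 = 5% of the reflections, so
selecting a single test set as the free set leaves 95% for Rwork. The test
flags alone carry no information about which value was selected.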
 
Therefore:
 
 * It does not make sense to require that every merged datablock in a
   reflection data mmCIF file should carry a _refln.status item in a
   _refln loop.
 
 * Users should not be required to transfer refinement-specific information
   (like that _refln.status item) back onto original reflection data from
   data processing - especially because
 
   (1) this is nearly impossible to do correctly and completely (see
       above), and
 
   (2) the wwPDB sites are currently not doing anything with those
       additional datablocks anyway as far as we know.
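
To see which data blocks of a given file match the trigger condition from
point (1) above, a rough sketch like the following may help. It is a
deliberately naive line scanner and not a real mmCIF parser, and we do not
know how OneDep implements its check. The block names and the values in
the example text are made up.

```python
def blocks_missing_refln_status(cif_text):
    """Return names of data blocks that have _refln items but no
    _refln.status item (naive line-based scan, not a full CIF parser)."""
    blocks = {}
    current = None
    for raw in cif_text.splitlines():
        line = raw.strip()
        lowered = line.lower()
        if lowered.startswith("data_"):
            current = line[5:]
            blocks[current] = set()
        elif current is not None and lowered.startswith("_refln."):
            # Item names of the merged-reflection category only;
            # "_diffrn_refln." does not match this prefix.
            blocks[current].add(lowered.split()[0])
    return [name for name, items in blocks.items()
            if items and "_refln.status" not in items]


# Hypothetical three-block file: refinement output first, then data
# from processing, then unmerged data.
EXAMPLE = """\
data_refined
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.status
1 0 0 12.3 o
data_processed
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.intensity_meas
_refln.pdbx_r_free_flag
1 0 0 151.0 4
data_unmerged
loop_
_diffrn_refln.index_h
_diffrn_refln.index_k
_diffrn_refln.index_l
_diffrn_refln.intensity_net
1 0 0 150.2
"""

flagged = blocks_missing_refln_status(EXAMPLE)
print(flagged)
```

In this made-up file only the block holding data from processing is
reported, although it does carry a test-set flag [4]. The first block
(output of the last refinement step) has the status item, and the unmerged
block has no _refln loop at all.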
 
This incorrect warning/error message from OneDep has caused a lot of
problems over several years and we have tried to support users on a
case-by-case basis to get the data through the annotation process
unchanged. Unfortunately, this has not yet triggered a fix in the OneDep
system. We feel very strongly that the bogus check (_refln.status in
datablocks N>1) should be removed from OneDep as a matter of urgency: it is
preventing a lot of our users from depositing metadata-rich, multi-
datablock reflection data, whereas the prevailing Zeitgeist is that
depositions should be as "rich" as possible to help train Machine Learning
engines.

Kind regards

The Global Phasing Developers


PS: see also
    https://www.globalphasing.com/buster/wiki/admin/index.cgi?DepositionMmCifFaq
    https://www.globalphasing.com/buster/wiki/admin/index.cgi?DepositionMmCif


[1] https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_refln.status.html
[2] https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/refln.html
[3] https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/diffrn_refln.html
[4] https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_refln.pdbx_r_free_flag.html
[5] https://www.wwpdb.org/documentation/policy
    https://cdn.rcsb.org/wwpdb/docs/documentation/annotation/wwPDB-A-2025Mar-V5.5.pdf
[6] https://dictionary.iucr.org/Structure_factor
[7] http://wwpdb.org/documentation/procedure#toc_appendicsB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot_2026-04-15_15-53-12.png
Type: image/png
Size: 89307 bytes
Desc: not available
URL: <http://www.globalphasing.com/pipermail/buster-discuss/attachments/20260420/41249a2e/attachment-0001.png>

