WORK-IN-PROGRESS


Introduction

We will work in a separate directory containing two subdirectories, one for the deposited data and one as a work directory. You should be able to just cut-and-paste the commands given in these green code blocks.


Getting deposited (raw) data

With a series of shell commands

  mkdir Deposited
  cd Deposited

  # Raw diffraction data
  # use "curl -O" instead of "wget" if the latter is not available
  wget -q "https://data.proteindiffraction.org/other/8agq_zip_8AGQ.tar.bz2"
  tar -xf 8agq_zip_8AGQ.tar.bz2
  ln -s 8agq_zip_8AGQ/data Images

  # MR search model
  # use "curl -O" instead of "wget" if the latter is not available
  wget -q "https://files.rcsb.org/download/5F07.pdb"
  egrep "^LINK|^SSBOND|^CRYST1|^ATOM|^HETATM|^ANISOU|^TER" 5F07.pdb  > start.pdb

  # currently deposited model (for comparison):
  fetch_PDB_gemmi 8AGQ | tee fetch_PDB_gemmi.log
  ln -s r8agqsf.mtz deposited.mtz
  ln -s 8agq.pdb deposited.pdb

  cd ..

we should now have all relevant (deposited) data in the subdirectory Deposited.
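
If it is unclear whether wget or curl is installed, a small helper along the following lines could pick one. This is only a sketch: the function name pick_downloader is hypothetical (it is not part of any of the software used here), and the curl flags are chosen to mimic "wget -q" as closely as possible.

```shell
# Hypothetical helper (not part of autoPROC/BUSTER): print a download
# command that is available on this system.
pick_downloader() {
  if command -v wget >/dev/null 2>&1; then
    echo "wget -q"
  elif command -v curl >/dev/null 2>&1; then
    # -s: quiet, -S: still show errors, -L: follow redirects, -O: keep remote file name
    echo "curl -sSLO"
  else
    echo "neither wget nor curl found" >&2
    return 1
  fi
}

# Example use (left commented out so nothing is downloaded by accident):
# $(pick_downloader) "https://files.rcsb.org/download/5F07.pdb"
```

The unquoted command substitution in the example relies on the word splitting done by sh and bash.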


Setting up working directory

Running

  mkdir Work
  cd Work

  # create some symbolic links for files to be used here
  ln -s ../Deposited/Images .
  ln -s ../Deposited/start.pdb .
  ln -s ../Deposited/deposited.pdb .
  ln -s ../Deposited/deposited.mtz .

  # create pseudo APO model
  egrep "^LINK|^SSBOND|^CRYST1|^ATOM|^HETATM|^ANISOU|^TER" deposited.pdb | egrep -v "^HETATM.*M5O|^ANISOU.*M5O" > apo.pdb

  # getting some Grade2 restraint dictionaries
  # use "curl -O" instead of "wget" if the latter is not available
  wget https://www.globalphasing.com/buster/wiki/plugin/attachments/GPhLTutorials8AGQ/GSH.restraints.cif
  wget https://www.globalphasing.com/buster/wiki/plugin/attachments/GPhLTutorials8AGQ/M5O.restraints.cif

will get us all required files:

  • Images: directory with raw diffraction data
  • start.pdb: starting model (it is in the same crystal form as the new data, so no molecular replacement calculation is necessary)
  • deposited.mtz: deposited reflection data (to be used as a reference MTZ file in order to get the same test-set flags as the deposited model)
  • deposited.pdb: deposited model (for comparison)
  • GSH.restraints.cif and M5O.restraints.cif: restraint dictionaries
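
Before going any further, it can be worth confirming that every item in the list above is actually present (a broken symbolic link counts as missing). A minimal sketch, assuming we are still inside the Work directory; the function name check_files is hypothetical:

```shell
# Hypothetical helper: report which of the expected files are missing.
# Usage: check_files DIR FILE...
# Returns 0 if everything exists, 1 otherwise.
check_files() {
  dir=$1; shift
  missing=0
  for f in "$@"; do
    # -e follows symbolic links, so a dangling link is reported as missing
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return $missing
}

# File names taken from the list above
check_files . Images start.pdb deposited.mtz deposited.pdb \
  GSH.restraints.cif M5O.restraints.cif || echo "some files are missing"
```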

Computing a quick and simple set of BUSTER maps

To check if our software is correctly set up and configured, we can run

  buster_maponly -p deposited.pdb -m deposited.mtz -o maponly.mtz | tee maponly.log

to get a set of map coefficients in the maponly.mtz output file:

  • 2FOFCWT/PH2FOFCWT: 2mFo-DFc electron density map using all observations (FP/SIGFP columns)
  • 2FOFCWT_iso-fill/PH2FOFCWT_iso-fill: 2mFo-DFc electron density map using all reflections to the highest resolution limit of any observation (FP/SIGFP columns), i.e. any missing reflection within that isotropic cut-off surface (a sphere) will be set to DFc values (model-only).
  • FOFCWT/PHFOFCWT: difference density
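
As an even more basic check, we can ask the shell whether the programs used in this tutorial can be found at all. This sketch only looks at the PATH and says nothing about licences or configuration; the function name missing_tools is hypothetical, while the program names are those used on this page.

```shell
# Hypothetical helper: print the name of every program that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# No output means that all programs were found
missing_tools fetch_PDB_gemmi buster_maponly process check_indexing \
  add_freerflag.sh aB_autorefine refine grade2
```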

Re-processing raw diffraction data with autoPROC

This can be done in three different ways.

Without any prior information

 
  process -I Images -d process.01 | tee process.01.lis

This avoids biasing the processing towards a particular crystal form (cell/SG): sometimes the crystal form changes, e.g. due to co-crystallisation with a particular compound. However, if this dataset is in a previously observed crystal form, the associated cell/SG is not enforced. This can become relevant when not all screw axes are observed (so processing arrives at P21212 just because the last 2-fold screw was not measured) or when a particular setting is required.

If working within a larger project with several datasets, the correct SG/cell might have to be manually set/adjusted and a consistent set of test-set flags needs to be established afterwards.

With known cell and spacegroup information

 
  process -I Images cell="90 55 55 90 114 90" symm=C2 -d process.02 | tee process.02.lis

This will enforce a specific cell/SG already at the indexing stage, which can be problematic if e.g. the unit cell has doubled, since half of the (maybe weak) spots would then be discarded.

The test-set flags automatically created at the end of processing will not be consistent with any previously determined model/dataset (see use of check_indexing and add_freerflag.sh below).

With reference reflection data of known cell and spacegroup

 
  process -I Images -ref deposited.mtz -d process.03 | tee process.03.lis

Here we not only enforce a specific cell/SG (extracted from the MTZ file header: see e.g. mtzana deposited.mtz or gemmi mtz deposited.mtz output), but also ensure that the newly processed data is consistently indexed (for spacegroups that allow different equivalent indexing solutions). Finally, the same set of test-set flags will be used and (if necessary) extended to the full resolution limits of the newly processed data.

This mode of processing is recommended when working within the same project and crystal form on a large number of datasets.
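
For many datasets, the same kind of command can be generated in a loop. The sketch below only prints the commands so that they can be inspected first; the function name process_cmds, the reference file name and the image directory names are placeholders, and only the -I, -ref and -d options shown above are used.

```shell
# Hypothetical helper: print (rather than run) one autoPROC command per
# image directory, all using the same reference MTZ file.
# Usage: process_cmds REFERENCE_MTZ IMAGE_DIR...
process_cmds() {
  ref=$1; shift
  n=0
  for images in "$@"; do
    n=$((n + 1))
    out=$(printf 'process.%02d' "$n")   # process.01, process.02, ...
    echo "process -I $images -ref $ref -d $out | tee $out.lis"
  done
}

# Placeholder names for illustration only
process_cmds reference.mtz Images_xtal1 Images_xtal2
```

Once the printed commands look right, the output could be piped into sh to run them one after the other.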


Checking symmetry/indexing (if required)

If no reference MTZ file was provided during data processing, the newly processed reflection data might require some transformations in relation to existing reference data. This can be done e.g. via

  check_indexing -v deposited.mtz process.01/staraniso_alldata-unique.mtz
  check_indexing -v deposited.mtz process.02/staraniso_alldata-unique.mtz

In the case here (C2 symmetry) nothing very interesting will happen: there are no alternative indexing possibilities, potentially missed screw axes or different settings.

Of course, there are other programs available to perform similar checks (see also our own aP_select_pdb for a variant): the important point is that any newly processed data should be consistent with any previously available data of the same crystal form to avoid confusion solely because of some trivial re-indexing or difference in settings.


Consistency of test-set flags

Within a larger project, the test-set flags for a given crystal form should be the same across different datasets. This ensures that Rfree values computed even after a minimal amount of refinement are meaningful and not biased. We provide a simple tool (that internally runs the usual CCP4 programs), which should be run after any indexing/setting ambiguity has been resolved:

  add_freerflag.sh -f deposited.mtz -m process.01/staraniso_alldata-unique.mtz
  add_freerflag.sh -f deposited.mtz -m process.02/staraniso_alldata-unique.mtz

If the reference file was less complete (e.g. lower resolution) than the newly processed MTZ file, the test-set flags will be extended. This would happen afresh for every new dataset handled this way, which is why creating a highly optimistic reference MTZ file (with test-set flags to the highest resolution envisaged) would be a good idea.


Fully automatic refinement

The aB_autorefine interface to BUSTER will run a series of individual BUSTER refinement jobs with some automatic decision making in between (when to use TLS, ADP, solvent model update, outlier rejection etc). This is not a quick process, but it provides a very consistent and reliable way of getting a large number of structures to a reasonable state for subsequent analysis and manual adjustment.

  aB_autorefine -p start.pdb -m process.03/staraniso_alldata-unique.mtz -d refine.01 | tee refine.01.log

Single BUSTER refinement run

A single BUSTER refinement run can be started using

  refine -p start.pdb -m process.03/staraniso_alldata-unique.mtz -l GSH.restraints.cif -d refine.02 | tee refine.02.log

There are a lot of options available (see refine -h for details) for fine-tuning the behaviour, although we think that the defaults should be adequate for most situations. Please note that this job will internally run BUSTER several times (so-called "big cycles", with updates to the solvent mask and the X-ray weighting) and that before the last of those cycles a feature called "void correction" is activated (to account for cavities that should be excluded from the bulk solvent model).


Single BUSTER refinement run for ligand detection

This uses the so-called -L feature (Vonrhein, C. and Bricogne, G., 2005. Automated Structure Refinement for High-Throughput Ligand Detection with BUSTER-TNT. Acta Cryst. Sect A, 61, p.c248.) as described in more detail here. The "Polder maps" in Phenix follow a similar idea.

  refine -L -p start.pdb -m process.03/staraniso_alldata-unique.mtz -l GSH.restraints.cif -d refine.03 | tee refine.03.log

The result should hopefully be a clearer difference-density map for the bound ligand.


Ligand restraint generation using Grade2

We will need a description of the ligand in the form of an mmCIF restraints dictionary before we can try to fit the ligand into the (hopefully clear) difference density map that provides our evidence for a bound compound. There are several ways of generating this type of restraint file (and the different programs should create files that are compatible with each other), with Grade2 being the latest incarnation of our own approach. If you have the required CSD software from CCDC installed, you should be able to run

  grade2 -P M5O

(for a known identifier in the chemical components dictionary), or

  grade2 -r LIG 'Oc1cc(O)c2c(c1)O[C@@H](c1ccc(O)c(O)c1)[C@H](O)C2'

if using a SMILES string. Please also see the extensive Grade2 documentation and the Grade2 webserver.


Ligand fitting using RhoFit


Fully automatic data processing, model refinement and ligand fitting with Pipedream