This document is still work-in-progress - any info about errors etc highly appreciated!

Here we are going to look at data for

Nizi, Maria Giulia, et al. "Potent 2, 3-dihydrophthalazine-1, 4-dione derivatives as dual inhibitors for mono-ADP-ribosyltransferases PARP10 and PARP15." European Journal of Medicinal Chemistry (2022): 114362.

(see also here for crystallisation info) since the raw diffraction data has also been deposited at

Lehtiö, L., & Maksimainen, M. (2021). PARP15 crystal structures with OUL35 inhibitor analogs (Version 1). University of Oulu.

We concentrate on a few of those datasets, namely:

PDB   Ligand  Ligand SMILES                           Grade ligand restraints
7PWC  8AE     c1cc(ccc1C(=O)N)OCC2CCC2                8AE.grade_PDB_ligand.cif
7PWK  89Q     c1cc(ccc1C(=O)N)OCC2CC2                 89Q.grade_PDB_ligand.cif
7PWM  8AQ     COc1cc(ccc1C(=O)N)OCC2CCCCC2            8AQ.grade_PDB_ligand.cif
7PWQ  8A8     c1cc2c(cc1OCC3CCCCC3)C(=O)N=NC2=O       8A8.grade_PDB_ligand.cif
7PX6  8D2     c1ccc(c(c1)COc2ccc3c(c2)C(=O)N=NC3=O)F  8D2.grade_PDB_ligand.cif

In the explanations below we show commands for all of those datasets (so you could run them one after the other). During the actual workshop tutorials we should of course run only one of those datasets, so that we do not get confused and/or run out of time ...

Data processing with autoPROC/XDS+STARANISO

We can process the deposited diffraction data (already downloaded to the /data/rapidata2/rapidata2022/Tutorials/Friday/Buster directory on the SSRL computers) with autoPROC using commands like those below:

    process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWC -d 7PWC_autoPROC.01 | tee 7PWC_autoPROC.01.lis
    process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWK -d 7PWK_autoPROC.01 | tee 7PWK_autoPROC.01.lis
    process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWM -d 7PWM_autoPROC.01 | tee 7PWM_autoPROC.01.lis
    process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWQ -d 7PWQ_autoPROC.01 | tee 7PWQ_autoPROC.01.lis
    process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PX6 -d 7PX6_autoPROC.01 | tee 7PX6_autoPROC.01.lis
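
Since the five commands above differ only in the dataset ID, they can also be generated in a loop. The following is a minimal sketch, not part of autoPROC: the helper name print_process_cmds and the variable names are our own invention for illustration, and the helper only prints the commands rather than running them.

```shell
#!/bin/bash
# Dataset IDs and image location as used in the commands above.
DATASETS="7PWC 7PWK 7PWM 7PWQ 7PX6"
IMGROOT=/data/rapidata2/rapidata2022/Tutorials/Friday/Buster

# Hypothetical helper (not an autoPROC tool): print, but do not run,
# the autoPROC command for each dataset ID.
print_process_cmds() {
  local id
  for id in $DATASETS; do
    echo "process -I $IMGROOT/$id -d ${id}_autoPROC.01 | tee ${id}_autoPROC.01.lis"
  done
}

print_process_cmds
```

Once the printed lines look right, they could be executed e.g. via "print_process_cmds | bash" (one dataset after the other).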

We've already processed the data, so you could look at the results directly via

There are also files ready for refinement (against the APO reference structure 3BLJ) that have the correct set of test-set flags added. This was done via

    fetch_PDB 3BLJ -f 3BLJ/3blj.mtz -m 7PWC_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWC_autoPROC.01/staraniso_use.mtz -f 3BLJ/3blj.mtz -m 7PWK_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWK_autoPROC.01/staraniso_use.mtz -f 3BLJ/3blj.mtz -m 7PWM_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWM_autoPROC.01/staraniso_use.mtz -f 3BLJ/3blj.mtz -m 7PWQ_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWQ_autoPROC.01/staraniso_use.mtz -f 3BLJ/3blj.mtz -m 7PX6_autoPROC.01/staraniso_alldata-unique.mtz -o 7PX6_autoPROC.01/staraniso_use.mtz

We could have saved ourselves this step by running autoPROC with the 3BLJ/3blj.mtz reference file from the start (using the -ref flag). However, we wanted to give autoPROC free rein in deciding on the most likely SG/cell. Of course, if the crystal form is very reproducible it would make sense to use that reference MTZ file (see also below about Pipedream).

ID    Resolution (*1)  Diffraction limits STARANISO  MTZ file for refinement
7PWC  1.39 (1.50)      1.19 1.32 1.38                7PWC_autoPROC.01_staraniso_use.mtz
7PWK  1.57 (1.80)      1.40 1.43 1.56                7PWK_autoPROC.01_staraniso_use.mtz
7PWM  1.39 (1.35)      1.21 1.32 1.40                7PWM_autoPROC.01_staraniso_use.mtz
7PWQ  1.54 (1.70)      1.37 1.45 1.48                7PWQ_autoPROC.01_staraniso_use.mtz
7PX6  1.53 (1.65)      1.32 1.45 1.50                7PX6_autoPROC.01_staraniso_use.mtz

Refinement of APO structure against each dataset using BUSTER

There is no need for any "structure solution" here, since the APO structure 3BLJ is already in the same crystal form. We first want to remove any buffer molecules that might interfere with the binding site:

    mv 3BLJ/3blj.pdb 3BLJ/3blj.pdb_orig
    grep -v " [AB] 70[123] " 3BLJ/3blj.pdb_orig > 3BLJ/3blj.pdb
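
To see what this grep filter does, here is a small self-contained sketch. The PDB records below are made up for illustration (they are not taken from 3BLJ, and GOL is just a placeholder residue name); the point is that the pattern " [AB] 70[123] " matches chain A or B followed by residue number 701, 702 or 703, and -v drops every matching line.

```shell
#!/bin/bash
# Made-up PDB records (NOT from 3BLJ) to show which lines the filter removes.
sample_pdb() {
cat <<'EOF'
ATOM      1  N   ALA A 500      10.000  10.000  10.000  1.00 20.00           N
HETATM 3001  C1  GOL A 701      11.000  11.000  11.000  1.00 30.00           C
HETATM 3002  C1  GOL B 703      12.000  12.000  12.000  1.00 30.00           C
HETATM 3003  O   HOH A 801      13.000  13.000  13.000  1.00 25.00           O
EOF
}

# Same pattern as above: remove residues 701-703 of chains A and B,
# keep everything else (here: the ALA atom and the water).
sample_pdb | grep -v " [AB] 70[123] "
```

Note that the pattern is a plain text match on the whole line, so it is worth checking the filtered file (e.g. with diff against the _orig copy) before using it.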

Now we only need to start refinement with a rigid-body cycle and can use the following commands

    refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.01 | tee 7PWC_BUSTER.01.lis
    refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.01 | tee 7PWK_BUSTER.01.lis
    refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.01 | tee 7PWM_BUSTER.01.lis
    refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.01 | tee 7PWQ_BUSTER.01.lis
    refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.01 | tee 7PX6_BUSTER.01.lis
  • We always switch on the use of LSSR restraints for NCS and use -sim_swap_equiv to correct any potential mix-up with symmetrical side-chains (*2).

This results in the following R-values:

ID    Rwork,free (deposited)  Rwork,free (*3)
7PWC  0.2025 0.2220           0.2448 0.2622
7PWK  0.2162 0.2392           0.2285 0.2544
7PWM  0.1373 0.1694           0.2335 0.2529
7PWQ  0.1852 0.2177           0.2407 0.2578
7PX6  0.1839 0.1970           0.2384 0.2572

At that point we would probably do some manual model fixing, an automatic water update, decide on using hydrogens (or not), do ADP refinement etc., all assuming that we want to get the best possible model before we start interpreting the potential ligand binding. This could be done e.g. using the following set of commands:

    aB_hydrogenate -p 7PWC_BUSTER.01/refine.pdb -full -o 7PWC_BUSTER.01/refineH.pdb
    refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWC_BUSTER.01/refineH.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.02 | tee 7PWC_BUSTER.02.lis

    aB_hydrogenate -p 7PWK_BUSTER.01/refine.pdb -full -o 7PWK_BUSTER.01/refineH.pdb
    refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWK_BUSTER.01/refineH.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.02 | tee 7PWK_BUSTER.02.lis

    aB_hydrogenate -p 7PWM_BUSTER.01/refine.pdb -full -o 7PWM_BUSTER.01/refineH.pdb
    refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWM_BUSTER.01/refineH.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.02 | tee 7PWM_BUSTER.02.lis

    aB_hydrogenate -p 7PWQ_BUSTER.01/refine.pdb -full -o 7PWQ_BUSTER.01/refineH.pdb
    refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWQ_BUSTER.01/refineH.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.02 | tee 7PWQ_BUSTER.02.lis

    aB_hydrogenate -p 7PX6_BUSTER.01/refine.pdb -full -o 7PX6_BUSTER.01/refineH.pdb
    refine -autoncs -M ADP -M HydrogenHybridModel -p 7PX6_BUSTER.01/refineH.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.02 | tee 7PX6_BUSTER.02.lis

Let's see if those additional steps did any good:

ID    Rwork,free (deposited)  Rwork,free (1)  Rwork,free (2) (*4)
7PWC  0.2025 0.2220           0.2448 0.2622   0.2299 0.2546
7PWK  0.2162 0.2392           0.2285 0.2544   0.2180 0.2488
7PWM  0.1373 0.1694           0.2335 0.2529   0.2195 0.2377
7PWQ  0.1852 0.2177           0.2407 0.2578   0.2213 0.2467
7PX6  0.1839 0.1970           0.2384 0.2572   0.2238 0.2409
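
To make the comparison explicit, the drop in Rfree between refinement rounds (1) and (2) can be computed directly from the table. This is only a small arithmetic sketch using the Rfree values listed above; the function names are our own.

```shell
#!/bin/bash
# Rfree values copied from the table above:
#   ID, Rfree after round (1), Rfree after round (2)
rfree_table() {
cat <<'EOF'
7PWC 0.2622 0.2546
7PWK 0.2544 0.2488
7PWM 0.2529 0.2377
7PWQ 0.2578 0.2467
7PX6 0.2572 0.2409
EOF
}

# Print the drop in Rfree from round (1) to round (2) for each dataset.
rfree_drop() {
  rfree_table | awk '{ printf "%s %.4f\n", $1, $2 - $3 }'
}

rfree_drop
```

For these numbers Rfree drops for every dataset, by between 0.0056 (7PWK) and 0.0163 (7PX6).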

Alternative: refinement of pseudo-APO structures

We could also generate ligand-free pseudo-APO structures by removing the ligand from those input PDB files:

    wget -q
    grep -v " A 701 " 7PWC.pdb > 7PWC_noLIG.pdb

    wget -q
    grep -v " A 701 " 7PWK.pdb > 7PWK_noLIG.pdb

    wget -q
    grep -v " A 701 " 7PWM.pdb > 7PWM_noLIG.pdb

    wget -q
    grep -v " A 701 " 7PWQ.pdb > 7PWQ_noLIG.pdb

    wget -q
    grep -v " A 701 " 7PX6.pdb > 7PX6_noLIG.pdb

and then run BUSTER using

    refine -RB -autoncs -sim_swap_equiv -p 7PWC_noLIG.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.03 | tee 7PWC_BUSTER.03.lis
    refine -RB -autoncs -sim_swap_equiv -p 7PWK_noLIG.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.03 | tee 7PWK_BUSTER.03.lis
    refine -RB -autoncs -sim_swap_equiv -p 7PWM_noLIG.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.03 | tee 7PWM_BUSTER.03.lis
    refine -RB -autoncs -sim_swap_equiv -p 7PWQ_noLIG.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.03 | tee 7PWQ_BUSTER.03.lis
    refine -RB -autoncs -sim_swap_equiv -p 7PX6_noLIG.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.03 | tee 7PX6_BUSTER.03.lis

This gives us this set of R-values:

ID    Rwork,free (deposited)  Rwork,free
7PWC  0.2025 0.2220           0.2393 0.2557
7PWK  0.2162 0.2392           0.2215 0.2406
7PWM  0.1373 0.1694           0.2181 0.2288
7PWQ  0.1852 0.2177           0.2319 0.2486
7PX6  0.1839 0.1970           0.2252 0.2365

Fitting a ligand with Rhofit

Let's assume we now have a set of (reasonably) well refined models in which the ligand is still missing. We now want to fit each ligand into the difference density, which should describe its location and conformation correctly. What we need is an mmCIF restraint dictionary for each compound, which we can generate e.g. using the Grade webserver. We can use the SMILES string of each compound or (for this tutorial) the already assigned PDB ligand identifier (3-letter code).

We've already done those runs on the Grade webserver with the following results:

PDB   Ligand  Grade server run       SMILES                                  CIF dictionary (*5)
7PWC  8AE     1652436856njSvYnH7KJK  c1cc(ccc1C(=O)N)OCC2CCC2                8AE.grade_PDB_ligand.cif
7PWK  89Q     16524368762DrUWvUqp7E  c1cc(ccc1C(=O)N)OCC2CC2                 89Q.grade_PDB_ligand.cif
7PWM  8AQ     1652436896yiC4EHVU1OZ  COc1cc(ccc1C(=O)N)OCC2CCCCC2            8AQ.grade_PDB_ligand.cif
7PWQ  8A8     1652436918JJaUiDFijyV  c1cc2c(cc1OCC3CCCCC3)C(=O)N=NC2=O       8A8.grade_PDB_ligand.cif
7PX6  8D2     1652436942J5CtDj7LfZK  c1ccc(c(c1)COc2ccc3c(c2)C(=O)N=NC3=O)F  8D2.grade_PDB_ligand.cif

To run Rhofit (see also rhofit -h output) we need to give that dictionary and the results (model and reflection data) from refinement:

    rhofit -l 8AE.grade_PDB_ligand.cif -p 7PWC_BUSTER.03/refine.pdb -m 7PWC_BUSTER.03/refine.mtz -d 7PWC_BUSTER.03-rhofit.01 | tee 7PWC_BUSTER.03-rhofit.01.lis
    rhofit -l 89Q.grade_PDB_ligand.cif -p 7PWK_BUSTER.03/refine.pdb -m 7PWK_BUSTER.03/refine.mtz -d 7PWK_BUSTER.03-rhofit.01 | tee 7PWK_BUSTER.03-rhofit.01.lis
    rhofit -l 8AQ.grade_PDB_ligand.cif -p 7PWM_BUSTER.03/refine.pdb -m 7PWM_BUSTER.03/refine.mtz -d 7PWM_BUSTER.03-rhofit.01 | tee 7PWM_BUSTER.03-rhofit.01.lis
    rhofit -l 8A8.grade_PDB_ligand.cif -p 7PWQ_BUSTER.03/refine.pdb -m 7PWQ_BUSTER.03/refine.mtz -d 7PWQ_BUSTER.03-rhofit.01 | tee 7PWQ_BUSTER.03-rhofit.01.lis
    rhofit -l 8D2.grade_PDB_ligand.cif -p 7PX6_BUSTER.03/refine.pdb -m 7PX6_BUSTER.03/refine.mtz -d 7PX6_BUSTER.03-rhofit.01 | tee 7PX6_BUSTER.03-rhofit.01.lis

(You could add the option -redocc 0.99 to define the occupancy to be given to any fitted ligand - this can be useful for triggering occupancy refinement later on, using the pdb2occ tool and BUSTER refine).
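
Each Rhofit command above pairs a dataset ID with its ligand code, so a small lookup table keeps the two in sync. The following is a sketch under our own naming (the helpers pairs and print_rhofit_cmds are hypothetical, not part of Rhofit); it only prints the commands shown above rather than running them.

```shell
#!/bin/bash
# Dataset ID / ligand code pairs, taken from the table above.
pairs() {
cat <<'EOF'
7PWC 8AE
7PWK 89Q
7PWM 8AQ
7PWQ 8A8
7PX6 8D2
EOF
}

# Hypothetical helper: print (do not run) the Rhofit command for each pair.
print_rhofit_cmds() {
  local id lig
  pairs | while read -r id lig; do
    echo "rhofit -l ${lig}.grade_PDB_ligand.cif -p ${id}_BUSTER.03/refine.pdb -m ${id}_BUSTER.03/refine.mtz -d ${id}_BUSTER.03-rhofit.01 | tee ${id}_BUSTER.03-rhofit.01.lis"
  done
}

print_rhofit_cmds
```

Keeping the pairing in one place avoids accidentally fitting the wrong dictionary into a dataset when commands are copied and edited by hand.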

After such a run we would like to see the quality of the fit. For that we can run

    cd 7PWC_BUSTER.03-rhofit.01

    cd 7PWK_BUSTER.03-rhofit.01

    cd 7PWM_BUSTER.03-rhofit.01

    cd 7PWQ_BUSTER.03-rhofit.01

    cd 7PX6_BUSTER.03-rhofit.01

There are then tools available for selecting the best solution from a selection - but if one is happy with the top solution, a subsequent BUSTER refinement (including occupancy refinement) could look like this:

    pdb2occ -p 7PWC_BUSTER.03-rhofit.01/merged.pdb -o 7PWC_BUSTER.03-rhofit.01/merged.occ
    refine \
      -autoncs \
      -Gelly 7PWC_BUSTER.03-rhofit.01/merged.occ \
      -p 7PWC_BUSTER.03-rhofit.01/merged.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.04 | tee 7PWC_BUSTER.04.lis

    pdb2occ -p 7PWK_BUSTER.03-rhofit.01/merged.pdb -o 7PWK_BUSTER.03-rhofit.01/merged.occ
    refine \
      -autoncs \
      -Gelly 7PWK_BUSTER.03-rhofit.01/merged.occ \
      -p 7PWK_BUSTER.03-rhofit.01/merged.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.04 | tee 7PWK_BUSTER.04.lis

    pdb2occ -p 7PWM_BUSTER.03-rhofit.01/merged.pdb -o 7PWM_BUSTER.03-rhofit.01/merged.occ
    refine \
      -autoncs \
      -Gelly 7PWM_BUSTER.03-rhofit.01/merged.occ \
      -p 7PWM_BUSTER.03-rhofit.01/merged.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.04 | tee 7PWM_BUSTER.04.lis

    pdb2occ -p 7PWQ_BUSTER.03-rhofit.01/merged.pdb -o 7PWQ_BUSTER.03-rhofit.01/merged.occ
    refine \
      -autoncs \
      -Gelly 7PWQ_BUSTER.03-rhofit.01/merged.occ \
      -p 7PWQ_BUSTER.03-rhofit.01/merged.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.04 | tee 7PWQ_BUSTER.04.lis

    pdb2occ -p 7PX6_BUSTER.03-rhofit.01/merged.pdb -o 7PX6_BUSTER.03-rhofit.01/merged.occ
    refine \
      -autoncs \
      -Gelly 7PX6_BUSTER.03-rhofit.01/merged.occ \
      -p 7PX6_BUSTER.03-rhofit.01/merged.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.04 | tee 7PX6_BUSTER.04.lis

We are using our pdb2occ tool to create a text file with so-called Gelly commands that describe a particular parametrisation related to occupancy refinement.

Doing it all in one go via Pipedream

As one can see, there is a fair amount of system in the above steps: data processing, refinement of the APO structure, ligand detection and fitting. This is the reasoning behind our Pipedream package, which tries to automate and combine the typical steps encountered during ligand screening campaigns. For full details please see the

    pipedream -h

as well as the Pipedream manual and the Pipedream reference card.

We have two entry points into Pipedream: (1) starting from raw diffraction data or (2) starting from already processed compound data. Since we are not sure about the quality of the APO model we are starting with (and how isomorphous it is to the datasets we are handling here (*6)), this approach might not work very well here ... but for completeness and to show you the principle, let's start with the latter approach:

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -hklin 7PWC_autoPROC.01/staraniso_use.mtz \
              -remediate \
              -rhofit 8AE.grade_PDB_ligand.cif \
              -d 7PWC_pipedream.01 | tee 7PWC_pipedream.01.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -hklin 7PWK_autoPROC.01/staraniso_use.mtz \
              -remediate \
              -rhofit 89Q.grade_PDB_ligand.cif \
              -d 7PWK_pipedream.01 | tee 7PWK_pipedream.01.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -hklin 7PWM_autoPROC.01/staraniso_use.mtz \
              -remediate \
              -rhofit 8AQ.grade_PDB_ligand.cif \
              -d 7PWM_pipedream.01 | tee 7PWM_pipedream.01.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -hklin 7PWQ_autoPROC.01/staraniso_use.mtz \
              -remediate \
              -rhofit 8A8.grade_PDB_ligand.cif \
              -d 7PWQ_pipedream.01 | tee 7PWQ_pipedream.01.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -hklin 7PX6_autoPROC.01/staraniso_use.mtz \
              -remediate \
              -rhofit 8D2.grade_PDB_ligand.cif \
              -d 7PX6_pipedream.01 | tee 7PX6_pipedream.01.lis

Most of these input parameters should be self-explanatory (see also pipedream -h output):

  • -xyzin defines the starting (APO) model, which should belong to the same crystal form (*7)
  • -hklref defines the reflection file corresponding to the starting APO model (in order to transfer test-set flags and also to ensure consistent indexing if the SG allows for alternative indexing choices)
  • -hklin refers to the new, already processed reflection data for the compound
  • -remediate is an extra flag to allow for real-space refinement using PDB_REDO tools (*8)
  • -rhofit defines the compound dictionary/description
  • -d defines the output directory

If you are in a hurry: use -quick instead of -remediate (at the risk of it not working at all - it all depends on (1) the quality of your APO model, (2) the quality of your compound data and (3) the strength of the ligand binding).

A second way is to start directly from the raw diffraction images and let autoPROC do the integration directly:

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWC \
              -remediate \
              -rhofit 8AE.grade_PDB_ligand.cif \
              -d 7PWC_pipedream.02 | tee 7PWC_pipedream.02.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWK \
              -remediate \
              -rhofit 89Q.grade_PDB_ligand.cif \
              -d 7PWK_pipedream.02 | tee 7PWK_pipedream.02.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWM \
              -remediate \
              -rhofit 8AQ.grade_PDB_ligand.cif \
              -d 7PWM_pipedream.02 | tee 7PWM_pipedream.02.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWQ \
              -remediate \
              -rhofit 8A8.grade_PDB_ligand.cif \
              -d 7PWQ_pipedream.02 | tee 7PWQ_pipedream.02.lis

    pipedream -xyzin 3BLJ/3blj.pdb \
              -hklref 3BLJ/3blj.mtz \
              -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PX6 \
              -remediate \
              -rhofit 8D2.grade_PDB_ligand.cif \
              -d 7PX6_pipedream.02 | tee 7PX6_pipedream.02.lis

We only changed -hklin to -imagedir ... but it will obviously run longer (since the full integration part is also done).
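
The two entry points can be summarised in a single helper that switches between -hklin and -imagedir. This is a minimal sketch under our own assumptions: print_pipedream_cmd and its mode names "processed" and "images" are hypothetical, and only the Pipedream options already shown above are used. It prints the command instead of running it.

```shell
#!/bin/bash
# Hypothetical helper (not part of Pipedream): print a Pipedream command.
#   $1 = dataset ID, $2 = ligand code, $3 = "processed" or "images"
print_pipedream_cmd() {
  local id="$1" lig="$2" mode="$3" data run
  case "$mode" in
    processed)  # entry point (2): already processed reflection data
      data="-hklin ${id}_autoPROC.01/staraniso_use.mtz"; run=01 ;;
    images)     # entry point (1): raw diffraction images
      data="-imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/${id}"; run=02 ;;
    *)
      echo "mode must be 'processed' or 'images'" >&2; return 1 ;;
  esac
  echo "pipedream -xyzin 3BLJ/3blj.pdb -hklref 3BLJ/3blj.mtz $data -remediate -rhofit ${lig}.grade_PDB_ligand.cif -d ${id}_pipedream.${run} | tee ${id}_pipedream.${run}.lis"
}

print_pipedream_cmd 7PWC 8AE processed
print_pipedream_cmd 7PWC 8AE images
```

The output directory suffix (.01 or .02) follows the naming used in the commands above, so results from the two entry points do not overwrite each other.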

  • *1: using a traditional, isotropic resolution cut-off where I/sigI in the outer shells starts dropping below a value of 2 (high-resolution limit of the deposited structure given in parentheses)
  • *2: when e.g. using Coot, it will often offer to fix the nomenclature of side-chain conformations - but this can destroy the correct NCS relation between side-chains in different copies (which is based on atom naming, even if chemically these are identical)
  • *3: remember that we are using a different refinement program compared to the deposited structures, have re-processed the raw data and are using anisotropically analysed data - so R/Rfree values become difficult to compare between those different approaches!
  • *4: our model is still lacking at least the ligand compound - and if these are large(ish) molecules and they are good binders, the resulting error due to that missing part will push R-values up compared to the deposited full model
  • *5: other dictionary generators would work here as well - in general, mmCIF restraint dictionaries are fairly compatible between different refinement programs and also within Coot
  • *6: ideally one wants to collect a high-quality APO dataset using the identical protein sample and crystallisation conditions as used for the ligand soaking ... including e.g. a ligand-free DMSO "soaking"
  • *7: Pipedream does not run a full molecular replacement - only a so-called "limited MR" - which has the advantage that test-set flags and the location of final results will be consistent across all runs
  • *8: distributed with BUSTER - thanks to the PDB_REDO team!