This document is still work-in-progress - any info about errors etc highly appreciated!
Here we are going to look at data for
(see also here for crystallisation info) since the raw diffraction data has also been deposited at
Lehtiö, L., & Maksimainen, M. (2021). PARP15 crystal structures with OUL35 inhibitor analogs (Version 1). University of Oulu. https://doi.org/10.23729/ba2c02ed-340f-4047-94c5-81ccf11100f9
We concentrate on a few of those datasets, namely:
PDB | Ligand | Ligand smiles | Grade ligand restraints |
7PWC | 8AE | c1cc(ccc1C(=O)N)OCC2CCC2 | 8AE.grade_PDB_ligand.cif |
7PWK | 89Q | c1cc(ccc1C(=O)N)OCC2CC2 | 89Q.grade_PDB_ligand.cif |
7PWM | 8AQ | COc1cc(ccc1C(=O)N)OCC2CCCCC2 | 8AQ.grade_PDB_ligand.cif |
7PWQ | 8A8 | c1cc2c(cc1OCC3CCCCC3)C(=O)N=NC2=O | 8A8.grade_PDB_ligand.cif |
7PX6 | 8D2 | c1ccc(c(c1)COc2ccc3c(c2)C(=O)N=NC3=O)F | 8D2.grade_PDB_ligand.cif |
In the explanations below we show the commands for all of those datasets (so you could run them one after another). During the actual workshop tutorial we should of course run only one of those datasets, so that we don't get confused and/or run out of time ...
We can use the deposited diffraction data (already downloaded to /data/rapidata2/rapidata2022/Tutorials/Friday/Buster directory on the SSRL computers) with autoPROC using commands like the below:
process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWC -d 7PWC_autoPROC.01 | tee 7PWC_autoPROC.01.lis
process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWK -d 7PWK_autoPROC.01 | tee 7PWK_autoPROC.01.lis
process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWM -d 7PWM_autoPROC.01 | tee 7PWM_autoPROC.01.lis
process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWQ -d 7PWQ_autoPROC.01 | tee 7PWQ_autoPROC.01.lis
process -I /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PX6 -d 7PX6_autoPROC.01 | tee 7PX6_autoPROC.01.lis
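Since the five commands differ only in the dataset ID, they could also be generated with a small loop. The following is a minimal sketch and not part of the original tutorial: the function name `autoproc_cmd` and the variable `IMGROOT` are made up for illustration, and the loop only prints the commands (a dry run) so that they can be checked before anything is executed.

```shell
#!/bin/sh
# Dry-run sketch: print one autoPROC command per dataset.
# IMGROOT and autoproc_cmd are illustrative names, not autoPROC features.
IMGROOT=/data/rapidata2/rapidata2022/Tutorials/Friday/Buster

autoproc_cmd() {
  # $1 = dataset ID, e.g. 7PWC
  printf 'process -I %s/%s -d %s_autoPROC.01 | tee %s_autoPROC.01.lis\n' \
    "$IMGROOT" "$1" "$1" "$1"
}

for id in 7PWC 7PWK 7PWM 7PWQ 7PX6; do
  autoproc_cmd "$id"
done
```

Piping the printed lines into `sh` would then run the jobs one after another.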
We've already processed the data, so you could look at the results directly via
There are also files ready for refinement (against the APO reference structure 3BLJ) that have the correct set of test-set flags added. This was done via
fetch_PDB 3BLJ
add_freerflag.sh -f 3BLJ/3blj.mtz -m 7PWC_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWC_autoPROC.01/staraniso_use.mtz
add_freerflag.sh -f 3BLJ/3blj.mtz -m 7PWK_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWK_autoPROC.01/staraniso_use.mtz
add_freerflag.sh -f 3BLJ/3blj.mtz -m 7PWM_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWM_autoPROC.01/staraniso_use.mtz
add_freerflag.sh -f 3BLJ/3blj.mtz -m 7PWQ_autoPROC.01/staraniso_alldata-unique.mtz -o 7PWQ_autoPROC.01/staraniso_use.mtz
add_freerflag.sh -f 3BLJ/3blj.mtz -m 7PX6_autoPROC.01/staraniso_alldata-unique.mtz -o 7PX6_autoPROC.01/staraniso_use.mtz
We could have saved ourselves this step by running autoPROC with the 3BLJ/3blj.mtz reference file from the start (using the -ref flag). However, we wanted to give autoPROC free rein in deciding on the most likely space group and cell. Of course, if the crystal form is very reproducible it would make sense to use that reference MTZ file (see also below about Pipedream).
ID | Resolution 1 | Diffraction limits STARANISO | MTZ file for refinement |
7PWC | 1.39 (1.50) | 1.19 1.32 1.38 | 7PWC_autoPROC.01_staraniso_use.mtz |
7PWK | 1.57 (1.80) | 1.40 1.43 1.56 | 7PWK_autoPROC.01_staraniso_use.mtz |
7PWM | 1.39 (1.35) | 1.21 1.32 1.40 | 7PWM_autoPROC.01_staraniso_use.mtz |
7PWQ | 1.54 (1.70) | 1.37 1.45 1.48 | 7PWQ_autoPROC.01_staraniso_use.mtz |
7PX6 | 1.53 (1.65) | 1.32 1.45 1.50 | 7PX6_autoPROC.01_staraniso_use.mtz |
There is no need for any "structure solution" here, since the APO structure 3BLJ is already in the same crystal form. We first want to remove any buffer molecules that might interfere with the binding site:
mv 3BLJ/3blj.pdb 3BLJ/3blj.pdb_orig
grep -v " [AB] 70[123] " 3BLJ/3blj.pdb_orig > 3BLJ/3blj.pdb
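To see what this grep pattern does, it can help to try it on a few records first. The sketch below uses four invented PDB-style lines (they are not taken from 3BLJ; the residue name UNL and all coordinates are placeholders) and a hypothetical helper called `strip_sites`. The pattern relies on the fixed-column PDB format, where the chain ID is followed by the right-justified residue number, so " A 701 " matches any record of residue 701 in chain A.

```shell
#!/bin/sh
# strip_sites is an illustrative wrapper around the grep filter used above:
# it drops every line belonging to residues 701, 702 or 703 of chain A or B.
strip_sites() {
  grep -v " [AB] 70[123] "
}

# Invented sample records (placeholders, not real 3BLJ content).
SAMPLE=$(printf '%s\n' \
  'ATOM      1  N   ASN A 481      10.000  10.000  10.000  1.00 20.00           N' \
  'HETATM 3001  C1  UNL A 701      11.000  11.000  11.000  1.00 30.00           C' \
  'HETATM 3002  C1  UNL B 702      12.000  12.000  12.000  1.00 30.00           C' \
  'HETATM 3003  O   HOH A 801      13.000  13.000  13.000  1.00 25.00           O')

# Only the ATOM record and the water (residue 801) should survive.
printf '%s\n' "$SAMPLE" | strip_sites
```

Note that the filter works purely on text: it removes every line that contains the pattern, whatever the record type, so it is worth checking the result before using the edited file.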
Now we only need to start refinement with a rigid-body cycle and can use the following commands
refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.01 | tee 7PWC_BUSTER.01.lis
refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.01 | tee 7PWK_BUSTER.01.lis
refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.01 | tee 7PWM_BUSTER.01.lis
refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.01 | tee 7PWQ_BUSTER.01.lis
refine -RB -autoncs -sim_swap_equiv -p 3BLJ/3blj.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.01 | tee 7PX6_BUSTER.01.lis
This results in the following R-values:
ID | Rwork,free (deposited) | Rwork,free 3 |
7PWC | 0.2025 0.2220 | 0.2448 0.2622 |
7PWK | 0.2162 0.2392 | 0.2285 0.2544 |
7PWM | 0.1373 0.1694 | 0.2335 0.2529 |
7PWQ | 0.1852 0.2177 | 0.2407 0.2578 |
7PX6 | 0.1839 0.1970 | 0.2384 0.2572 |
At that point we would probably do some manual model fixing and automatic water updating, decide whether or not to use hydrogens, do ADP refinement etc., all on the assumption that we want the best possible model before we start interpreting the potential ligand binding. This could be done e.g. using the following set of commands:
aB_hydrogenate -p 7PWC_BUSTER.01/refine.pdb -full -o 7PWC_BUSTER.01/refineH.pdb
refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWC_BUSTER.01/refineH.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.02 | tee 7PWC_BUSTER.02.lis

aB_hydrogenate -p 7PWK_BUSTER.01/refine.pdb -full -o 7PWK_BUSTER.01/refineH.pdb
refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWK_BUSTER.01/refineH.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.02 | tee 7PWK_BUSTER.02.lis

aB_hydrogenate -p 7PWM_BUSTER.01/refine.pdb -full -o 7PWM_BUSTER.01/refineH.pdb
refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWM_BUSTER.01/refineH.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.02 | tee 7PWM_BUSTER.02.lis

aB_hydrogenate -p 7PWQ_BUSTER.01/refine.pdb -full -o 7PWQ_BUSTER.01/refineH.pdb
refine -autoncs -M ADP -M HydrogenHybridModel -p 7PWQ_BUSTER.01/refineH.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.02 | tee 7PWQ_BUSTER.02.lis

aB_hydrogenate -p 7PX6_BUSTER.01/refine.pdb -full -o 7PX6_BUSTER.01/refineH.pdb
refine -autoncs -M ADP -M HydrogenHybridModel -p 7PX6_BUSTER.01/refineH.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.02 | tee 7PX6_BUSTER.02.lis
Let's see if those additional steps did any good:
ID | Rwork,free (deposited) | Rwork,free (1) | Rwork,free (2) 4 |
7PWC | 0.2025 0.2220 | 0.2448 0.2622 | 0.2299 0.2546 |
7PWK | 0.2162 0.2392 | 0.2285 0.2544 | 0.2180 0.2488 |
7PWM | 0.1373 0.1694 | 0.2335 0.2529 | 0.2195 0.2377 |
7PWQ | 0.1852 0.2177 | 0.2407 0.2578 | 0.2213 0.2467 |
7PX6 | 0.1839 0.1970 | 0.2384 0.2572 | 0.2238 0.2409 |
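One quick way to answer that question is to subtract the round (1) values from the round (2) values. The sketch below does this with awk on the numbers copied from the table above; the helper name `rvalue_delta` is invented for illustration. A negative difference means the second round lowered the R-value.

```shell
#!/bin/sh
# Input columns: ID  Rwork(1) Rfree(1)  Rwork(2) Rfree(2)
# Output columns: ID  change in Rwork  change in Rfree  (round 2 minus round 1)
rvalue_delta() {
  awk '{ printf "%s %+.4f %+.4f\n", $1, $4 - $2, $5 - $3 }'
}

rvalue_delta <<'EOF'
7PWC 0.2448 0.2622 0.2299 0.2546
7PWK 0.2285 0.2544 0.2180 0.2488
7PWM 0.2335 0.2529 0.2195 0.2377
7PWQ 0.2407 0.2578 0.2213 0.2467
7PX6 0.2384 0.2572 0.2238 0.2409
EOF
```

For the values above, both differences come out negative for all five datasets, i.e. Rwork and Rfree both dropped after adding hydrogens and ADP refinement.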
We could also generate ligand-free pseudo-APO structures by removing the ligand from those input PDB files:
wget -q https://files.rcsb.org/download/7PWC.pdb
grep -v " A 701 " 7PWC.pdb > 7PWC_noLIG.pdb

wget -q https://files.rcsb.org/download/7PWK.pdb
grep -v " A 701 " 7PWK.pdb > 7PWK_noLIG.pdb

wget -q https://files.rcsb.org/download/7PWM.pdb
grep -v " A 701 " 7PWM.pdb > 7PWM_noLIG.pdb

wget -q https://files.rcsb.org/download/7PWQ.pdb
grep -v " A 701 " 7PWQ.pdb > 7PWQ_noLIG.pdb

wget -q https://files.rcsb.org/download/7PX6.pdb
grep -v " A 701 " 7PX6.pdb > 7PX6_noLIG.pdb
and then run BUSTER using
refine -RB -autoncs -sim_swap_equiv -p 7PWC_noLIG.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.03 | tee 7PWC_BUSTER.03.lis
refine -RB -autoncs -sim_swap_equiv -p 7PWK_noLIG.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.03 | tee 7PWK_BUSTER.03.lis
refine -RB -autoncs -sim_swap_equiv -p 7PWM_noLIG.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.03 | tee 7PWM_BUSTER.03.lis
refine -RB -autoncs -sim_swap_equiv -p 7PWQ_noLIG.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.03 | tee 7PWQ_BUSTER.03.lis
refine -RB -autoncs -sim_swap_equiv -p 7PX6_noLIG.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.03 | tee 7PX6_BUSTER.03.lis
This gives us this set of R-values:
ID | Rwork,free (deposited) | Rwork,free |
7PWC | 0.2025 0.2220 | 0.2393 0.2557 |
7PWK | 0.2162 0.2392 | 0.2215 0.2406 |
7PWM | 0.1373 0.1694 | 0.2181 0.2288 |
7PWQ | 0.1852 0.2177 | 0.2319 0.2486 |
7PX6 | 0.1839 0.1970 | 0.2252 0.2365 |
Let's assume we now have a set of (reasonably) well refined models in which the ligand is still missing. We now want to fit each ligand into the difference density, which should describe its location and conformation correctly. What we need is a mmCIF restraint dictionary for each compound, which we can generate e.g. using the Grade webserver. We can use the SMILES string of each compound or (for this tutorial) the already assigned PDB ligand identifier (3-letter code).
We've already done those runs on the Grade webserver with the following results:
PDB | Ligand | Grade server run | SMILES | CIF dictionary 5 |
7PWC | 8AE | 1652436856njSvYnH7KJK | c1cc(ccc1C(=O)N)OCC2CCC2 | 8AE.grade_PDB_ligand.cif |
7PWK | 89Q | 16524368762DrUWvUqp7E | c1cc(ccc1C(=O)N)OCC2CC2 | 89Q.grade_PDB_ligand.cif |
7PWM | 8AQ | 1652436896yiC4EHVU1OZ | COc1cc(ccc1C(=O)N)OCC2CCCCC2 | 8AQ.grade_PDB_ligand.cif |
7PWQ | 8A8 | 1652436918JJaUiDFijyV | c1cc2c(cc1OCC3CCCCC3)C(=O)N=NC2=O | 8A8.grade_PDB_ligand.cif |
7PX6 | 8D2 | 1652436942J5CtDj7LfZK | c1ccc(c(c1)COc2ccc3c(c2)C(=O)N=NC3=O)F | 8D2.grade_PDB_ligand.cif |
To run Rhofit (see also rhofit -h output) we need to give that dictionary and the results (model and reflection data) from refinement:
rhofit -l 8AE.grade_PDB_ligand.cif -p 7PWC_BUSTER.03/refine.pdb -m 7PWC_BUSTER.03/refine.mtz -d 7PWC_BUSTER.03-rhofit.01 | tee 7PWC_BUSTER.03-rhofit.01.lis
rhofit -l 89Q.grade_PDB_ligand.cif -p 7PWK_BUSTER.03/refine.pdb -m 7PWK_BUSTER.03/refine.mtz -d 7PWK_BUSTER.03-rhofit.01 | tee 7PWK_BUSTER.03-rhofit.01.lis
rhofit -l 8AQ.grade_PDB_ligand.cif -p 7PWM_BUSTER.03/refine.pdb -m 7PWM_BUSTER.03/refine.mtz -d 7PWM_BUSTER.03-rhofit.01 | tee 7PWM_BUSTER.03-rhofit.01.lis
rhofit -l 8A8.grade_PDB_ligand.cif -p 7PWQ_BUSTER.03/refine.pdb -m 7PWQ_BUSTER.03/refine.mtz -d 7PWQ_BUSTER.03-rhofit.01 | tee 7PWQ_BUSTER.03-rhofit.01.lis
rhofit -l 8D2.grade_PDB_ligand.cif -p 7PX6_BUSTER.03/refine.pdb -m 7PX6_BUSTER.03/refine.mtz -d 7PX6_BUSTER.03-rhofit.01 | tee 7PX6_BUSTER.03-rhofit.01.lis
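The only thing that changes between these commands is the pairing of PDB ID and ligand code, so it is easy to mix up a dictionary and a dataset when typing them by hand. As a rough sketch (the helper `rhofit_cmd` is a made-up name, and the loop only prints the commands rather than running Rhofit), the pairing from the table above could be kept in one place like this:

```shell
#!/bin/sh
# Dry-run sketch: build the Rhofit command for one dataset/ligand pair.
# Only the options already shown in this tutorial (-l, -p, -m, -d) are used.
rhofit_cmd() {
  # $1 = PDB ID, $2 = ligand 3-letter code
  printf 'rhofit -l %s.grade_PDB_ligand.cif -p %s_BUSTER.03/refine.pdb -m %s_BUSTER.03/refine.mtz -d %s_BUSTER.03-rhofit.01\n' \
    "$2" "$1" "$1" "$1"
}

# PDB ID and ligand code pairs, copied from the table above.
while read -r id lig; do
  rhofit_cmd "$id" "$lig"
done <<'EOF'
7PWC 8AE
7PWK 89Q
7PWM 8AQ
7PWQ 8A8
7PX6 8D2
EOF
```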
(You could add the option -redocc 0.99 to define the occupancy given to any fitted ligand. This can be useful for triggering occupancy refinement later on, using the pdb2occ tool and BUSTER refine.)
After such a run we would like to see the quality of the fit. For that we can run
(cd 7PWC_BUSTER.03-rhofit.01 && visualise-rhofit-coot)
(cd 7PWK_BUSTER.03-rhofit.01 && visualise-rhofit-coot)
(cd 7PWM_BUSTER.03-rhofit.01 && visualise-rhofit-coot)
(cd 7PWQ_BUSTER.03-rhofit.01 && visualise-rhofit-coot)
(cd 7PX6_BUSTER.03-rhofit.01 && visualise-rhofit-coot)
There are then tools available for selecting the best solution from a selection - but if one is happy with the top solution, a subsequent BUSTER refinement (including occupancy refinement) could look like this:
pdb2occ -p 7PWC_BUSTER.03-rhofit.01/merged.pdb -o 7PWC_BUSTER.03-rhofit.01/merged.occ
refine \
  -autoncs \
  -Gelly 7PWC_BUSTER.03-rhofit.01/merged.occ \
  -p 7PWC_BUSTER.03-rhofit.01/merged.pdb -m 7PWC_autoPROC.01/staraniso_use.mtz -d 7PWC_BUSTER.04 | tee 7PWC_BUSTER.04.lis

pdb2occ -p 7PWK_BUSTER.03-rhofit.01/merged.pdb -o 7PWK_BUSTER.03-rhofit.01/merged.occ
refine \
  -autoncs \
  -Gelly 7PWK_BUSTER.03-rhofit.01/merged.occ \
  -p 7PWK_BUSTER.03-rhofit.01/merged.pdb -m 7PWK_autoPROC.01/staraniso_use.mtz -d 7PWK_BUSTER.04 | tee 7PWK_BUSTER.04.lis

pdb2occ -p 7PWM_BUSTER.03-rhofit.01/merged.pdb -o 7PWM_BUSTER.03-rhofit.01/merged.occ
refine \
  -autoncs \
  -Gelly 7PWM_BUSTER.03-rhofit.01/merged.occ \
  -p 7PWM_BUSTER.03-rhofit.01/merged.pdb -m 7PWM_autoPROC.01/staraniso_use.mtz -d 7PWM_BUSTER.04 | tee 7PWM_BUSTER.04.lis

pdb2occ -p 7PWQ_BUSTER.03-rhofit.01/merged.pdb -o 7PWQ_BUSTER.03-rhofit.01/merged.occ
refine \
  -autoncs \
  -Gelly 7PWQ_BUSTER.03-rhofit.01/merged.occ \
  -p 7PWQ_BUSTER.03-rhofit.01/merged.pdb -m 7PWQ_autoPROC.01/staraniso_use.mtz -d 7PWQ_BUSTER.04 | tee 7PWQ_BUSTER.04.lis

pdb2occ -p 7PX6_BUSTER.03-rhofit.01/merged.pdb -o 7PX6_BUSTER.03-rhofit.01/merged.occ
refine \
  -autoncs \
  -Gelly 7PX6_BUSTER.03-rhofit.01/merged.occ \
  -p 7PX6_BUSTER.03-rhofit.01/merged.pdb -m 7PX6_autoPROC.01/staraniso_use.mtz -d 7PX6_BUSTER.04 | tee 7PX6_BUSTER.04.lis
We are using our pdb2occ tool to create a text file with so-called Gelly commands that describe a particular parametrisation related to occupancy refinement.
As one can see, the above steps follow a fairly systematic pattern: data processing, refinement of the APO structure, ligand detection and fitting. This is the reason behind our Pipedream package, which tries to automate and combine the typical steps encountered during ligand screening campaigns. For full details please see the output of
pipedream -h
as well as the Pipedream manual and the Pipedream reference card.
We have two entry points into Pipedream: (1) starting from raw diffraction data or (2) starting from already processed compound data. Since we are not sure about the quality of the APO model we're starting with (and how isomorphous it is to the datasets we are handling here 6 ), this approach might not work very well here ... but for completeness and to show you the principle, let's start with the latter approach:
pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -hklin 7PWC_autoPROC.01/staraniso_use.mtz \
  -remediate \
  -rhofit 8AE.grade_PDB_ligand.cif \
  -d 7PWC_pipedream.01 | tee 7PWC_pipedream.01.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -hklin 7PWK_autoPROC.01/staraniso_use.mtz \
  -remediate \
  -rhofit 89Q.grade_PDB_ligand.cif \
  -d 7PWK_pipedream.01 | tee 7PWK_pipedream.01.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -hklin 7PWM_autoPROC.01/staraniso_use.mtz \
  -remediate \
  -rhofit 8AQ.grade_PDB_ligand.cif \
  -d 7PWM_pipedream.01 | tee 7PWM_pipedream.01.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -hklin 7PWQ_autoPROC.01/staraniso_use.mtz \
  -remediate \
  -rhofit 8A8.grade_PDB_ligand.cif \
  -d 7PWQ_pipedream.01 | tee 7PWQ_pipedream.01.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -hklin 7PX6_autoPROC.01/staraniso_use.mtz \
  -remediate \
  -rhofit 8D2.grade_PDB_ligand.cif \
  -d 7PX6_pipedream.01 | tee 7PX6_pipedream.01.lis
Most of these input parameters should be self-explanatory (see also pipedream -h output):
If you are in a hurry: use -quick instead of -remediate (at the danger of it not working at all - it all depends on (1) the quality of your APO model, (2) the quality of your compound data and (3) the strength of the ligand binding).
A second way is to start directly from the raw diffraction images and let autoPROC do the integration directly:
pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWC \
  -remediate \
  -rhofit 8AE.grade_PDB_ligand.cif \
  -d 7PWC_pipedream.02 | tee 7PWC_pipedream.02.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWK \
  -remediate \
  -rhofit 89Q.grade_PDB_ligand.cif \
  -d 7PWK_pipedream.02 | tee 7PWK_pipedream.02.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWM \
  -remediate \
  -rhofit 8AQ.grade_PDB_ligand.cif \
  -d 7PWM_pipedream.02 | tee 7PWM_pipedream.02.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PWQ \
  -remediate \
  -rhofit 8A8.grade_PDB_ligand.cif \
  -d 7PWQ_pipedream.02 | tee 7PWQ_pipedream.02.lis

pipedream -xyzin 3BLJ/3blj.pdb \
  -hklref 3BLJ/3blj.mtz \
  -imagedir /data/rapidata2/rapidata2022/Tutorials/Friday/Buster/7PX6 \
  -remediate \
  -rhofit 8D2.grade_PDB_ligand.cif \
  -d 7PX6_pipedream.02 | tee 7PX6_pipedream.02.lis
We only changed -hklin to -imagedir ... but the run will obviously take longer (since the full integration part is also done).
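To make that single difference explicit, the two Pipedream entry points can be written as one command template with a switchable input option. This is only a sketch under stated assumptions: `pipedream_cmd` and `IMGROOT` are invented names, the function just prints the command (a dry run), and it uses nothing beyond the Pipedream options already shown above.

```shell
#!/bin/sh
# IMGROOT and pipedream_cmd are illustrative names, not Pipedream features.
IMGROOT=/data/rapidata2/rapidata2022/Tutorials/Friday/Buster

pipedream_cmd() {
  # $1 = PDB ID, $2 = ligand code, $3 = entry point: "hklin" or "imagedir"
  case "$3" in
    hklin)    input="-hklin ${1}_autoPROC.01/staraniso_use.mtz"; run=01 ;;
    imagedir) input="-imagedir ${IMGROOT}/${1}"; run=02 ;;
    *)        echo "unknown entry point: $3" >&2; return 1 ;;
  esac
  printf 'pipedream -xyzin 3BLJ/3blj.pdb -hklref 3BLJ/3blj.mtz %s -remediate -rhofit %s.grade_PDB_ligand.cif -d %s_pipedream.%s\n' \
    "$input" "$2" "$1" "$run"
}

# Print both variants for one dataset so they can be compared side by side.
pipedream_cmd 7PWC 8AE hklin
pipedream_cmd 7PWC 8AE imagedir
```

The two printed lines differ only in the input option and the output directory suffix, which mirrors the two sets of commands above.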