To set up the environment and copy some of the example data over, please run
source /data/rapidata2/gphl.csh
whenever you connect to one of the processing machines. If everything works as expected, you should then be placed automatically into a directory like
/data/rd2011/GPhL
which contains several subdirectories with example data (each named after its PDB identifier).
If you are interested in some of our work related to Covid-19: see here for data processing with autoPROC and our notes regarding refinement with BUSTER.
Several example datasets are available for running autoPROC:
1o22/Images => 90 degree, 1.0 deg/image, CCD
3get/Images => 90 degree, 1.0 deg/image, CCD
3isy/Images => 2 wavelengths (90 degree, 1.0 deg/image), CCD
4hpe/Images => 360 degree, 0.5 deg/image, Pilatus
4j8p/Images => 100 degree, 0.5 deg/image, Pilatus
4jm1/Images => 3 wavelengths, 0.5 deg/image, Pilatus
7jiw/Images => 999 images, 0.3 deg/image, Pilatus
Or have a look at the examples here. Of course, the most interesting option would be to use one of your own datasets, if you have any available and can transfer them to the SSRL computers. Below are some suggestions on how to run full data processing on those datasets, but also see
process -h
and
process -M list
The simplest way is to run
process -I /where/ever/image/directory -d out.01 | tee out.01.lis
All output will be written into subdirectory out.01 and standard output is saved into out.01.lis (but also written to the terminal - the tee command does this little trick). The most important output file is out.01/summary.html, so you could also run
process -I /where/ever/image/directory -d out.01 > out.01.lis & firefox out.01/summary.html
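With the background form of the command, the browser may start before out.01/summary.html has been written. A minimal sketch of one way to wait for the file first, assuming a POSIX shell such as sh or bash (the setup script above is a csh script, so this would live in a separate script file). The name wait_for_file is a hypothetical helper, not part of autoPROC:

```shell
#!/bin/sh
# wait_for_file PATH [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until PATH exists.
# Returns 0 when the file appears, 1 if the timeout (default 600 s) expires.
wait_for_file() {
  path=$1
  timeout=${2:-600}
  waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Intended usage (commented out here because it needs autoPROC and firefox):
#   process -I /where/ever/image/directory -d out.01 > out.01.lis &
#   wait_for_file out.01/summary.html && firefox out.01/summary.html
```

The timeout keeps the script from waiting forever if the job fails before producing a summary.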
A few commonly used options are (... denotes rest of the arguments as described above):
process -M LowResOrTricky ...       # difficult data
process -M HighResCutOnCChalf ...   # isotropic high-resolution limit based
                                    # on CC1/2 (instead of I/sig(I))
process -M ScalingX ...             # use XSCALE instead of AIMLESS scaling
These arguments (macros invoked via the -M flag) can also be combined.
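As an illustration of combining macros, the sketch below builds the argument list by giving one -M flag per macro. This repeated-flag form is an assumption on our part (check process -h for the exact syntax), and macro_args is a hypothetical helper written for a POSIX shell such as sh or bash:

```shell
#!/bin/sh
# Hypothetical helper: print "-M <name> " for every macro name given.
# Assumes that macros are combined by repeating the -M flag.
macro_args() {
  for name in "$@"; do
    printf -- '-M %s ' "$name"
  done
}

# Print the resulting command line without running it;
# "..." stands for the rest of the arguments as described above.
echo "process $(macro_args LowResOrTricky ScalingX)..."
```

Only the command line is printed here, so it can be inspected before anything is run.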
You should be able to run the following commands for various SAD examples:
process -ANO -M HighResCutOnCChalf -I 1o22/Images -d 1o22_process.01 | tee 1o22_process.01.lis
process -ANO -M HighResCutOnCChalf -I 4hpe/Images -d 4hpe_process.01 | tee 4hpe_process.01.lis
process -ANO -M HighResCutOnCChalf -I 4j8p/Images -d 4j8p_process.01 | tee 4j8p_process.01.lis
process -ANO -M HighResCutOnCChalf -I 7jiw/Images -d 7jiw_process.01 | tee 7jiw_process.01.lis
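Since the four commands above differ only in the PDB identifier, they can be generated in a loop. A minimal sketch, assuming a POSIX shell such as sh or bash and that it is run from the directory containing the example subdirectories. The function name run_sad_examples and the RUN variable are hypothetical; by default the commands are only printed:

```shell
#!/bin/sh
# Hypothetical wrapper around the four SAD examples listed above.
# Dry run by default: the commands are printed. Set RUN=1 to execute them.
run_sad_examples() {
  for id in 1o22 4hpe 4j8p 7jiw; do
    out="${id}_process.01"
    if [ "${RUN:-0}" = "1" ]; then
      process -ANO -M HighResCutOnCChalf -I "${id}/Images" -d "$out" | tee "$out.lis"
    else
      echo "process -ANO -M HighResCutOnCChalf -I ${id}/Images -d $out | tee $out.lis"
    fi
  done
}

run_sad_examples
```

The jobs run one after another in this form, which keeps each job's terminal output separate.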
Instead of waiting for the program to finish, you can open a browser (firefox - the globe icon at the bottom of your desktop) and go to the summary.html file of a particular job, e.g.
/data/rd2099/GPhL/1o22_process.01/summary.html
(substitute the correct rd20NN number etc).
As you will see, you have several subdirectories available: one for each of the examples. You can then look at a whole list of examples and run each of those with the command-line shown - after changing your directory. E.g.
cd 1o22
run_autoSHARP.sh \
  -seq 1o22.pir -ha "Se" \
  -wvl 0.9778 peak -7 5 -sca 1o22_peak.sca \
  -d autoSHARP_SAD-1 | tee autoSHARP_SAD-1.lis
Remember: look at the autoSHARP reference card (PDF) for more help. Or run
run_autoSHARP.sh -h
for online help.
We can also run all of those examples with two extra flags to go for speed:
run_autoSHARP.sh -fast -nowarp ...
On the fast 72-thread processing machines (pxproc01 to pxproc12, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz) we get the following results (bold is for the deposited PDB model and italic for what you get out of autoSHARP):
PDB | Phasing type | sequence | chains/ASU | #residues | #built | #chains | #sequenced | time |
1o22 | Se-SAD | 169 | 1 | 149 | 153 | 1 | 153 | 5 minutes 47 seconds |
4j8p | Se-SAD | 158 | 1 | 156 | 156 | 1 | 155 | 6 minutes 19 seconds |
4hpe | Se-SAD | 307 | 6 | 1735 | 1827 | 13 | 1765 | 33 minutes 31 seconds |
3isy | Se-MAD, 2 wvls | 119 | 1 | 117 | 118 | 1 | 118 | 7 minutes 49 seconds |
4jm1 | Se-MAD, 2 wvls | 83 | 1 | 84 | 77 | 2 | 73 | 5 minutes 39 seconds |
4is3 | Se-MAD, 3 wvls | 267 | 4 | 997 | 1009 | 5 | 1000 | 30 minutes 17 seconds |
4me8 | Se-MAD, 3 wvls | 150 | 1 | 117 | 105 | 4 | 80 | 8 minutes 19 seconds |
3get | MR-SAD (Se) | 364 | 2 | 726 | | | | |
1gxt | SIRAS (Hg) | 90 | 1 | 88 | 88 | 3 | 85 | 10 minutes 19 seconds |
3zft | MIRAS (Hg, Ir) | 147 | 1 | 148 | 142 | 5 | 123 | 7 minutes 33 seconds |
The MR-SAD example did not work here, and the 4me8 result also looks like it could have been better. But as you can see, many of those jobs worked fine in a very short time: ideal for a tutorial and for trying multiple examples.
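One rough way to compare rows of the table is the number of sequenced residues relative to the number of residues in the deposited model. The sketch below assumes that the fifth column (#residues) refers to the deposited model and the eighth (#sequenced) to the autoSHARP result. The function name sequenced_fraction is hypothetical, and a POSIX shell with awk is assumed:

```shell
#!/bin/sh
# Hypothetical helper: read pipe-delimited table rows on standard input and
# print "<PDB id> <percentage>" with percentage = 100 * #sequenced / #residues.
# Rows without a #sequenced value (such as the failed 3get job) are skipped.
sequenced_fraction() {
  awk -F'|' 'NF >= 8 && $5 + 0 > 0 && $8 + 0 > 0 {
    gsub(/ /, "", $1)                       # strip padding from the PDB id
    printf "%s %.1f\n", $1, 100 * $8 / $5
  }'
}

# Three rows copied from the table above.
sequenced_fraction <<'EOF'
1o22 | Se-SAD | 169 | 1 | 149 | 153 | 1 | 153 | 5 minutes 47 seconds |
4me8 | Se-MAD, 3 wvls | 150 | 1 | 117 | 105 | 4 | 80 | 8 minutes 19 seconds |
3get | MR-SAD (Se) | 364 | 2 | 726 | ||||
EOF
```

A value well below 100 for a row, as for 4me8, points to an incomplete model. Values slightly above 100 can occur when autoSHARP places more residues than the deposited model contains.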
It might be interesting to use one of the autoPROC examples during the tutorials: to see the combination of data processing and experimental phasing together. For this you could run one of the following commands:
run_autoSHARP.sh \
  -fast -nowarp \
  -seq 1o22/1o22.pir -ha "Se" \
  -wvl 0.9778 peak -7 5 -sca 1o22_process.01/staraniso_alldata.sca \
  -d 1o22_autoSHARP.01 | tee 1o22_autoSHARP.01.lis

run_autoSHARP.sh \
  -nowarp \
  -seq 4hpe/4HPE.pir -ha "Se" \
  -wvl 0.9794 peak -8 5.6 -sca 4hpe_process.01/staraniso_alldata.sca \
  -d 4hpe_autoSHARP.01 | tee 4hpe_autoSHARP.01.lis

run_autoSHARP.sh \
  -fast -nowarp \
  -seq 4j8p/4J8P.pir -ha "Se" \
  -wvl 0.97858 peak -8 6 -sca 4j8p_process.01/staraniso_alldata.sca \
  -d 4j8p_autoSHARP.01 | tee 4j8p_autoSHARP.01.lis

run_autoSHARP.sh \
  -fast -nowarp \
  -seq 7jiw/7JIW.seq -ha "Zn" -nsit 2 \
  -wvl 0.9778 hrem -sca 7jiw_process.01/staraniso_alldata.sca \
  -d 7jiw_autoSHARP.01 | tee 7jiw_autoSHARP.01.lis
We could also cover refinement, restraint dictionaries, ligand fitting, screening campaigns etc. if needed. In the meantime, check out the BUSTER wiki for details and examples.