Introduction

To set up the environment and copy some of the example data over, please run

source /data/rapidata2/gphl.csh

whenever you connect to one of the processing machines. If everything works as expected, you should then be placed automatically into a directory like

/data/rd2011/GPhL

which contains several subdirectories with example data (each named after its PDB identifier).

If you are interested in some of our work related to Covid-19: see here for data processing with autoPROC and our notes regarding refinement with BUSTER.

autoPROC/STARANISO

Several example datasets are available for running autoPROC:

1o22/Images   => 90 degree, 1.0 deg/image, CCD
3get/Images   => 90 degree, 1.0 deg/image, CCD
3isy/Images   => 2 wavelengths (90 degree, 1.0 deg/image), CCD
4hpe/Images   => 360 degree, 0.5 deg/image, Pilatus
4j8p/Images   => 100 degree, 0.5 deg/image, Pilatus
4jm1/Images   => 3 wavelengths, 0.5 deg/image, Pilatus
7jiw/Images   => 999 images, 0.3 deg/image, Pilatus

Or have a look at the examples here. Of course, the most interesting option is to use one of your own datasets, if you have any available and can transfer them to the SSRL computers. Below are some suggestions on how to run full data processing on those datasets, but also see

  process -h

and

  process -M list

The simplest way is to run

  process -I /where/ever/image/directory -d out.01 | tee out.01.lis

All output will be written into subdirectory out.01 and standard output is saved into out.01.lis (but also written to the terminal - the tee command does this little trick). The most important output file is out.01/summary.html, so you could also run

  process -I /where/ever/image/directory -d out.01 > out.01.lis &
  firefox out.01/summary.html
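
When repeating a run with different options, it helps to give each run its own output directory and log file (out.01, out.02, ...). The following is a minimal sketch of that bookkeeping, assuming a POSIX shell such as bash (the setup script above is for csh, so this would live in a separate script file). The function name next_outdir is hypothetical and not part of autoPROC.

```shell
#!/bin/sh
# Hypothetical helper (not part of autoPROC): print the first PREFIX.NN
# (NN = 01..99) for which neither the directory nor the matching .lis
# log file exists yet in the current directory.
next_outdir() {
    prefix=$1
    n=1
    while [ "$n" -le 99 ]; do
        candidate=$(printf '%s.%02d' "$prefix" "$n")
        if [ ! -e "$candidate" ] && [ ! -e "$candidate.lis" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
    echo "no free run number left for $prefix" >&2
    return 1
}

# Usage (commented out so that sourcing this sketch starts no job):
#   out=$(next_outdir out)
#   process -I /where/ever/image/directory -d "$out" | tee "$out.lis"
```

The helper only picks a name; the process command line itself is unchanged from the one shown above.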

A few commonly used options are (... denotes the rest of the arguments as described above):

  process -M LowResOrTricky ...                       # difficult data

  process -M HighResCutOnCChalf ...                   # isotropic high-resolution limit based
                                                      # on CC1/2 (instead of I/sig(I))

  process -M ScalingX ...                             # use XSCALE instead of AIMLESS scaling

These macros (invoked via the -M flag) can also be combined.
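
As a small illustration of combining macros, the sketch below turns a list of macro names into command-line arguments. It assumes that macros are combined by repeating the -M flag once per macro (check process -h to confirm), and that a POSIX shell such as bash is used. The function name macro_args is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of macro names into "-M <name>" arguments.
# Assumption of this sketch: macros are combined by repeating the -M flag.
macro_args() {
    args=""
    for m in "$@"; do
        args="$args -M $m"
    done
    # Strip the single leading blank before printing.
    printf '%s\n' "${args# }"
}

# Usage (commented out so that sourcing this sketch starts no job):
#   process $(macro_args LowResOrTricky HighResCutOnCChalf) \
#       -I /where/ever/image/directory -d out.02 | tee out.02.lis
```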

Examples

You should be able to run the following commands for various SAD examples:

      process -ANO -M HighResCutOnCChalf -I 1o22/Images -d 1o22_process.01 | tee 1o22_process.01.lis
      process -ANO -M HighResCutOnCChalf -I 4hpe/Images -d 4hpe_process.01 | tee 4hpe_process.01.lis
      process -ANO -M HighResCutOnCChalf -I 4j8p/Images -d 4j8p_process.01 | tee 4j8p_process.01.lis
      process -ANO -M HighResCutOnCChalf -I 7jiw/Images -d 7jiw_process.01 | tee 7jiw_process.01.lis
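
The four commands above differ only in the PDB identifier, so they could also be generated by a loop. The following is a sketch under a few assumptions: a POSIX shell such as bash, the current directory being the one that holds the example subdirectories, and a hypothetical DRY_RUN variable (part of this sketch, not an autoPROC setting) that defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Print (dry run, the default) or execute the autoPROC command for one
# example dataset. DRY_RUN is a hypothetical variable of this sketch.
run_sad_example() {
    id=$1
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "process -ANO -M HighResCutOnCChalf -I $id/Images -d ${id}_process.01 | tee ${id}_process.01.lis"
    else
        process -ANO -M HighResCutOnCChalf -I "$id/Images" \
            -d "${id}_process.01" | tee "${id}_process.01.lis"
    fi
}

# Loop over the four SAD examples listed in this section, one after another.
run_all_sad_examples() {
    for id in 1o22 4hpe 4j8p 7jiw; do
        run_sad_example "$id"
    done
}

# Usage (commented out so that sourcing this sketch starts no job):
#   run_all_sad_examples             # only prints the four commands
#   DRY_RUN=0 run_all_sad_examples   # actually runs them sequentially
```

Running the jobs sequentially keeps each log file readable; whether running several in parallel is sensible depends on the machine load.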

Instead of waiting for the program to finish, you can open a browser (firefox - the globe icon at the bottom of your desktop) and go to the summary.html file of a particular job, e.g.

/data/rd2099/GPhL/1o22_process.01/summary.html

(substitute the correct rd20NN number etc.).
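
If the job has only just been started, summary.html may not exist yet. A minimal sketch for waiting until the file appears before opening it, assuming a POSIX shell such as bash; the function name wait_for_file and its default timeout are choices of this sketch, not part of autoPROC.

```shell
#!/bin/sh
# Hypothetical helper: wait until FILE exists and is non-empty, checking once
# per second, and give up after TIMEOUT seconds (default 600).
wait_for_file() {
    file=$1
    timeout=${2:-600}
    waited=0
    while [ ! -s "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "gave up waiting for $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    printf '%s\n' "$file"
}

# Usage (commented out so that sourcing this sketch opens no browser):
#   wait_for_file 1o22_process.01/summary.html && firefox 1o22_process.01/summary.html
```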

SHARP/autoSHARP

There is one subdirectory for each of the examples. You can look at the whole list of examples and run each of them with the command line shown, after changing into the corresponding directory. E.g.

  cd 1o22
  run_autoSHARP.sh \
    -seq 1o22.pir -ha "Se" \
    -wvl 0.9778 peak -7 5 -sca 1o22_peak.sca \
    -d autoSHARP_SAD-1 | tee autoSHARP_SAD-1.lis

Remember: look at the autoSHARP reference card (PDF) for more help. Or run

  run_autoSHARP.sh -h

for online help.

We can also run all of those examples with two extra flags to go for speed:

  run_autoSHARP.sh -fast -nowarp ...

On the fast 72-thread processing machines (pxproc01 to pxproc12, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz) we get the following results (chains/ASU and #residues refer to the deposited PDB model; #built, #chains and #sequenced are what you get out of autoSHARP):

PDB   Phasing type      sequence  chains/ASU  #residues  #built  #chains  #sequenced  time
1o22  Se-SAD                 169           1        149     153        1         153  5 minutes 47 seconds
4j8p  Se-SAD                 158           1        156     156        1         155  6 minutes 19 seconds
4hpe  Se-SAD                 307           6       1735    1827       13        1765  33 minutes 31 seconds
3isy  Se-MAD, 2 wvls         119           1        117     118        1         118  7 minutes 49 seconds
4jm1  Se-MAD, 2 wvls          83           1         84      77        2          73  5 minutes 39 seconds
4is3  Se-MAD, 3 wvls         267           4        997    1009        5        1000  30 minutes 17 seconds
4me8  Se-MAD, 3 wvls         150           1        117     105        4          80  8 minutes 19 seconds
3get  MR-SAD (Se)            364           2        726       -        -           -  -
1gxt  SIRAS (Hg)              90           1         88      88        3          85  10 minutes 19 seconds
3zft  MIRAS (Hg, Ir)         147           1        148     142        5         123  7 minutes 33 seconds

The MR-SAD example didn't work here, and the 4ME8 data also look like they could have done better. But as you can see, a lot of those jobs worked fine in a very short time: ideal for a tutorial and if you want to try multiple examples.

autoSHARP examples

It might be interesting to use one of the autoPROC examples during the tutorials, to see data processing and experimental phasing working together. For this you could run one of the following commands:

  • 1O22
          run_autoSHARP.sh \
              -fast -nowarp \
              -seq 1o22/1o22.pir -ha "Se" \
              -wvl 0.9778 peak -7 5 -sca 1o22_process.01/staraniso_alldata.sca \
              -d 1o22_autoSHARP.01 | tee 1o22_autoSHARP.01.lis
  • 4HPE (will run for quite some time: ~2h)
    • weak low-resolution anomalous signal
    • initially solved by molecular replacement (even though it is a Se-MET protein)
          run_autoSHARP.sh \
              -nowarp \
              -seq 4hpe/4HPE.pir -ha "Se" \
              -wvl 0.9794 peak -8 5.6 -sca 4hpe_process.01/staraniso_alldata.sca \
              -d 4hpe_autoSHARP.01 | tee 4hpe_autoSHARP.01.lis
  • 4J8P (~7 min)
    • 1 Met in 159 residues
    • originally solved with autoSHARP (SHELXC/D and SHARP)
          run_autoSHARP.sh \
              -fast -nowarp \
              -seq 4j8p/4J8P.pir -ha "Se" \
              -wvl 0.97858 peak -8 6 -sca 4j8p_process.01/staraniso_alldata.sca \
              -d 4j8p_autoSHARP.01 | tee 4j8p_autoSHARP.01.lis
  • 7JIW (~ 14min)
    • this was initially not solved by Zn-SAD, but rather via molecular replacement
    • but the Zn signal should be strong enough to be used for experimental phasing
    • we'll have to guess the number of Zn sites per chain to some extent ...
          run_autoSHARP.sh \
              -fast -nowarp \
              -seq 7jiw/7JIW.seq -ha "Zn" -nsit 2 \
              -wvl 0.9778 hrem  -sca 7jiw_process.01/staraniso_alldata.sca \
              -d 7jiw_autoSHARP.01 | tee 7jiw_autoSHARP.01.lis
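
Each of the commands above reads staraniso_alldata.sca from the corresponding autoPROC output directory, so it is worth checking that this file is in place before starting autoSHARP. A minimal sketch of such a check, assuming a POSIX shell such as bash; the function name find_scaled_data is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the scaled data file inside an
# autoPROC output directory, or fail with a message if it is missing or empty.
find_scaled_data() {
    sca="$1/staraniso_alldata.sca"
    if [ -s "$sca" ]; then
        printf '%s\n' "$sca"
    else
        echo "missing or empty: $sca (has the autoPROC job finished?)" >&2
        return 1
    fi
}

# Usage (commented out so that sourcing this sketch starts no job):
#   sca=$(find_scaled_data 1o22_process.01) && \
#   run_autoSHARP.sh \
#       -fast -nowarp \
#       -seq 1o22/1o22.pir -ha "Se" \
#       -wvl 0.9778 peak -7 5 -sca "$sca" \
#       -d 1o22_autoSHARP.01 | tee 1o22_autoSHARP.01.lis
```

With this guard, autoSHARP is only launched once the data-processing step has produced its scaled output.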

BUSTER

We could do something in relation to refinement, restraint dictionaries, ligand fitting, screening campaigns etc. if needed. In the meantime, check out the BUSTER wiki for details and examples.