TutorialRapiData2021

Content:

Introduction
autoPROC/STARANISO

Examples

SHARP/autoSHARP

autoSHARP examples

BUSTER

Introduction

To set up the environment and copy some of the example data over, please run

source /data/rapidata2/gphl.csh

whenever you connect to one of the processing machines. If everything works as expected, you should then be placed automatically into a directory like

/data/rd2011/GPhL

which contains several subdirectories with example data (each named after its PDB identifier).

If you are interested in some of our work related to Covid-19: see here for data processing with autoPROC and our notes regarding refinement with BUSTER.

autoPROC/STARANISO

Several example datasets are available for running autoPROC:

1o22/Images   => 90 degree, 1.0 deg/image, CCD
3get/Images   => 90 degree, 1.0 deg/image, CCD
3isy/Images   => 2 wavelengths (90 degree, 1.0 deg/image), CCD
4hpe/Images   => 360 degree, 0.5 deg/image, Pilatus
4j8p/Images   => 100 degree, 0.5 deg/image, Pilatus
4jm1/Images   => 3 wavelengths, 0.5 deg/image, Pilatus
7jiw/Images   => 999 images, 0.3 deg/image, Pilatus

Or have a look at the examples here. Of course, the most interesting option would be to use one of your own datasets, if you have any available and can transfer them to the SSRL computers. Below are some suggestions on how to run full data processing on those datasets, but also see

  process -h

and

  process -M list

The simplest way is to run

  process -I /where/ever/image/directory -d out.01 | tee out.01.lis

All output will be written into subdirectory out.01 and standard output is saved into out.01.lis (but also written to the terminal - the tee command does this little trick). The most important output file is out.01/summary.html, so you could also run

  process -I /where/ever/image/directory -d out.01 > out.01.lis &
  firefox out.01/summary.html
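
To see what tee does in isolation, here is a small sketch that uses a stand-in function (fake_job, hypothetical and not part of autoPROC) instead of a real process run. It is written in POSIX sh syntax, so in a csh/tcsh session it would need to be saved to a file and run with sh. It also illustrates that a plain pipe only captures standard output; whether process writes anything important to standard error is not assumed here.

```shell
# Stand-in for a long-running job (hypothetical, not part of autoPROC):
# one line goes to standard output, one to standard error.
fake_job() {
  echo "normal output"
  echo "a warning" >&2
}

# Temporary log file, playing the role of out.01.lis
log=$(mktemp)

# Only standard output goes through the pipe into the log;
# the warning still appears on the terminal but is not saved.
fake_job | tee "$log"

# Adding 2>&1 merges standard error into the pipe, so both lines
# are shown on the terminal and saved (tee overwrites the log).
fake_job 2>&1 | tee "$log"
```

In csh/tcsh, the counterpart of "2>&1 |" is "|&".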

A few commonly used options are (... denotes rest of the arguments as described above):

  process -M LowResOrTricky ...                       # difficult data

  process -M HighResCutOnCChalf ...                   # isotropic high-resolution limit based
                                                      # on CC1/2 (instead of I/sig(I))

  process -M ScalingX ...                             # use XSCALE instead of AIMLESS scaling

These arguments (macros invoked via the -M flag) can also be combined.
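
As an illustration of combining macros, the sketch below assembles a process command line with several -M flags and prints it rather than running it. It assumes that macros are combined by repeating -M once per macro (please confirm with process -h); build_process_cmd is a hypothetical helper and out.02 is just an example output directory name. The syntax is POSIX sh, so in a csh/tcsh session it would need to be saved to a file and run with sh.

```shell
# Print a "process" command line that combines several macros.
# Assumption: macros are combined by repeating -M (check "process -h").
# usage: build_process_cmd <image_dir> <out_dir> <macro>...
build_process_cmd() {
  img=$1
  out=$2
  shift 2
  cmd="process"
  for m in "$@"; do
    cmd="$cmd -M $m"
  done
  echo "$cmd -I $img -d $out"
}

# Difficult data plus a CC1/2-based high-resolution cut-off:
build_process_cmd /where/ever/image/directory out.02 LowResOrTricky HighResCutOnCChalf
```

The printed line can then be pasted into the terminal, with "| tee out.02.lis" appended as in the earlier examples.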

Examples

You should be able to run the following commands for various SAD examples:

1O22 (~5 min)

      process -ANO -M HighResCutOnCChalf -I 1o22/Images -d 1o22_process.01 | tee 1o22_process.01.lis

4HPE (~13 min)

      process -ANO -M HighResCutOnCChalf -I 4hpe/Images -d 4hpe_process.01 | tee 4hpe_process.01.lis

4J8P (~8 min)

      process -ANO -M HighResCutOnCChalf -I 4j8p/Images -d 4j8p_process.01 | tee 4j8p_process.01.lis

7JIW (~17 min)

      process -ANO -M HighResCutOnCChalf -I 7jiw/Images -d 7jiw_process.01 | tee 7jiw_process.01.lis

Instead of waiting for the program to finish, you can open a browser (firefox - the globe icon at the bottom of your desktop) and go to the summary.html file of a particular job, e.g.

/data/rd2099/GPhL/1o22_process.01/summary.html

(substitute the correct rd20NN number etc).
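
The four SAD example commands above can also be run one after another from a small script. The following is a sketch only: sad_cmd and DRY_RUN are hypothetical names introduced here, and the script is in POSIX sh syntax (in a csh/tcsh session, save it to a file and run it with sh from the directory that contains the example subdirectories). By default it only prints the commands; running all four in sequence would take roughly the sum of the times quoted above.

```shell
# DRY_RUN=1 (default): only print the commands; DRY_RUN=0: actually run them.
DRY_RUN=${DRY_RUN:-1}

# Build the processing command for one example.
# $1 = lower-case PDB identifier, e.g. 1o22
sad_cmd() {
  echo "process -ANO -M HighResCutOnCChalf -I ${1}/Images -d ${1}_process.01"
}

for id in 1o22 4hpe 4j8p 7jiw; do
  cmd=$(sad_cmd "$id")
  if [ "$DRY_RUN" = 1 ]; then
    echo "$cmd | tee ${id}_process.01.lis"
  else
    # Unquoted on purpose: the string is split into command and arguments.
    $cmd | tee "${id}_process.01.lis"
  fi
done
```

Setting DRY_RUN=0 in the environment before starting the script switches from printing to running.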

SHARP/autoSHARP

As you will see, there are several subdirectories available, one for each example. You can look through the whole list of examples and run each one with the command line shown, after changing into its directory. E.g.

  cd 1o22
  run_autoSHARP.sh \
    -seq 1o22.pir -ha "Se" \
    -wvl 0.9778 peak -7 5 -sca 1o22_peak.sca \
    -d autoSHARP_SAD-1 | tee autoSHARP_SAD-1.lis

Remember: look at the autoSHARP reference card (PDF) for more help. Or run

  run_autoSHARP.sh -h

for online help.

We can also run all of those examples with two extra flags to go for speed:

  run_autoSHARP.sh -fast -nowarp ...

On the fast 72-thread processing machines (pxproc01 to pxproc12, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz) we get the following results (chains/ASU and #residues describe the deposited PDB model; #built, #chains and #sequenced are what you get out of autoSHARP):

PDB   Phasing type    sequence  chains/ASU  #residues  #built  #chains  #sequenced  time
1o22  Se-SAD          169       1           149        153     1        153         5 minutes 47 seconds
4j8p  Se-SAD          158       1           156        156     1        155         6 minutes 19 seconds
4hpe  Se-SAD          307       6           1735       1827    13       1765        33 minutes 31 seconds
3isy  Se-MAD, 2 wvls  119       1           117        118     1        118         7 minutes 49 seconds
4jm1  Se-MAD, 2 wvls  83        1           84         77      2        73          5 minutes 39 seconds
4is3  Se-MAD, 3 wvls  267       4           997        1009    5        1000        30 minutes 17 seconds
4me8  Se-MAD, 3 wvls  150       1           117        105     4        80          8 minutes 19 seconds
3get  MR-SAD (Se)     364       2           726        -       -        -           -
1gxt  SIRAS (Hg)      90        1           88         88      3        85          10 minutes 19 seconds
3zft  MIRAS (Hg, Ir)  147       1           148        142     5        123         7 minutes 33 seconds

The MR-SAD example didn't work here, and the 4ME8 data also look like they could have done better. But as you can see, many of those jobs worked fine in a very short time: ideal for a tutorial and if you want to try multiple examples.

autoSHARP examples

It might be interesting to use one of the autoPROC examples during the tutorials, to see data processing and experimental phasing working together. For this you could run one of the following commands:

1O22 (~7 min)

          run_autoSHARP.sh \
              -fast -nowarp \
              -seq 1o22/1o22.pir -ha "Se" \
              -wvl 0.9778 peak -7 5 -sca 1o22_process.01/staraniso_alldata.sca \
              -d 1o22_autoSHARP.01 | tee 1o22_autoSHARP.01.lis

4HPE (will run for quite some time: ~2h)

weak low-resolution anomalous signal
initially solved by molecular replacement (even though it is a Se-MET protein)

          run_autoSHARP.sh \
              -nowarp \
              -seq 4hpe/4HPE.pir -ha "Se" \
              -wvl 0.9794 peak -8 5.6 -sca 4hpe_process.01/staraniso_alldata.sca \
              -d 4hpe_autoSHARP.01 | tee 4hpe_autoSHARP.01.lis

4J8P (~7 min)

1 Met in 159 residues
originally solved with autoSHARP (SHELXC/D and SHARP)

          run_autoSHARP.sh \
              -fast -nowarp \
              -seq 4j8p/4J8P.pir -ha "Se" \
              -wvl 0.97858 peak -8 6 -sca 4j8p_process.01/staraniso_alldata.sca \
              -d 4j8p_autoSHARP.01 | tee 4j8p_autoSHARP.01.lis

7JIW (~14 min)

this was initially not solved by Zn-SAD, but rather via molecular replacement
but the Zn signal should be strong enough to be used for experimental phasing
we'll have to guess the number of Zn sites per chain to some extent ...

          run_autoSHARP.sh \
              -fast -nowarp \
              -seq 7jiw/7JIW.seq -ha "Zn" -nsit 2 \
              -wvl 0.9778 hrem  -sca 7jiw_process.01/staraniso_alldata.sca \
              -d 7jiw_autoSHARP.01 | tee 7jiw_autoSHARP.01.lis

BUSTER

We could do something in relation to refinement, restraint dictionaries, ligand fitting, screening campaigns etc if needed. In the meantime, check out the BUSTER wiki for details and examples.