AutoProcTutorial4U4H

Content:

Introduction
Getting all files
Running autoPROC data processing
Running autoSHARP for structure solution and model building

Introduction

4U4H is a small protein structure determined in the Heldwein lab using a mercury soak for SIRAS. It was originally processed with HKL, solved using SHARP and refined with Phenix to a resolution of 2.05 Å.

The commands shown below should be run in a terminal, within an empty directory. It can be useful to have two terminals (or terminal tabs) open at the same time - with one being used to run the programs and the other to look at intermediate results (while jobs are still running).

Getting all files

First we need to download the images for the native and derivative datasets from the SBGrid data server (data.sbgrid.org) using

   rsync -av rsync://data.sbgrid.org/10.15785/SBGRID/140 .
   rsync -av rsync://data.sbgrid.org/10.15785/SBGRID/141 .
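Before moving on, it can be worth confirming that both transfers actually produced files. The following is only a minimal sketch: it assumes that rsync created local directories named 140 and 141 (the names passed to -I later in this tutorial), and count_files is a hypothetical helper, not part of rsync or autoPROC.

```shell
# Hypothetical helper: print the number of regular files below a directory.
# Returns non-zero if the directory does not exist.
count_files() {
  if [ ! -d "$1" ]; then
    echo "no such directory: $1" >&2
    return 1
  fi
  # tr removes the padding that some wc implementations add
  find "$1" -type f | wc -l | tr -d ' '
}

# Usage (assumed directory names):
#   count_files 140
#   count_files 141
```

A count of 0, or an error, suggests that the transfer should be repeated before starting any processing.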

Finally, we should fetch the sequence for this protein using e.g. one of

  wget -O 4u4h.seq "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=fastachain&compression=NO&structureId=4U4H&chainId=A"

  curl -o 4u4h.seq "http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=fastachain&compression=NO&structureId=4U4H&chainId=A"

(depending on which command is available).
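The choice between the two download commands, and a quick check of the result, could be sketched as follows. Both pick_downloader and looks_like_fasta are hypothetical helpers introduced here for illustration. The check only assumes that a FASTA file starts with a '>' header line, so it will flag e.g. an HTML error page saved under the name 4u4h.seq.

```shell
# Hypothetical helper: print the name of the first downloader found on PATH.
pick_downloader() {
  if command -v wget >/dev/null 2>&1; then
    echo wget
  elif command -v curl >/dev/null 2>&1; then
    echo curl
  else
    echo "neither wget nor curl found" >&2
    return 1
  fi
}

# Hypothetical helper: succeed only if the file is non-empty and its
# first line starts with '>' (a FASTA header).
looks_like_fasta() {
  [ -s "$1" ] && head -n 1 "$1" | grep -q '^>'
}

# Usage:
#   pick_downloader
#   looks_like_fasta 4u4h.seq || echo "4u4h.seq does not look like FASTA"
```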

Running autoPROC data processing

In its simplest form, the processing could be run using

  process -I 140 -d nat-autoPROC.01 | tee nat-autoPROC.01.lis
  process -I 141 -d Hg-autoPROC.01 | tee Hg-autoPROC.01.lis

A summary of processing steps can be seen in files nat-autoPROC.01/summary.html and Hg-autoPROC.01/summary.html - which can be loaded into your browser e.g. via

  firefox `pwd`/nat-autoPROC.01/summary.html
  firefox `pwd`/Hg-autoPROC.01/summary.html

(remember to reload from time to time to see new content appended to the file while autoPROC is still running). For more help in interpreting the output see also the relevant autoPROC manual.
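In the second terminal, progress can also be checked without a browser. This is a minimal sketch using the output directory and log file names from the commands above; run_snapshot is a hypothetical helper that only reports whether summary.html exists yet and prints the last few lines of the log written by tee.

```shell
# Hypothetical helper: one-off progress snapshot for a running job.
#   $1 = output directory given to -d
#   $2 = log file written via "| tee"
run_snapshot() {
  if [ -f "$1/summary.html" ]; then
    echo "summary: $1/summary.html present"
  else
    echo "summary: not written yet"
  fi
  if [ -f "$2" ]; then
    tail -n 5 "$2"
  else
    echo "log: $2 not found" >&2
  fi
}

# Usage:
#   run_snapshot nat-autoPROC.01 nat-autoPROC.01.lis
#   run_snapshot Hg-autoPROC.01 Hg-autoPROC.01.lis
```

To follow a log continuously instead of taking a snapshot, the standard "tail -f nat-autoPROC.01.lis" can be used (stop with Ctrl-C).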

A better processing option would be to

mask the beamstop and beamstop holder explicitly
treat anomalous data specially only for the Hg derivative (i.e. use -noANO for the native)
restrict to a more sensible (but still optimistic) resolution range

which would give the following commands:

  process -I 140 \
          -noANO \
          autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE="1302 1411 1254 1366" \
          autoPROC_XdsKeyword_UNTRUSTED_QUADRILATERAL="0 1280 1338 1299 1346 1326 0 1316" \
          -R 100.0 1.6 \
          -d nat-autoPROC.02 | tee nat-autoPROC.02.lis
 
  process -I 141 \
          autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE="1202 1355 1227 1385" \
          autoPROC_XdsKeyword_UNTRUSTED_QUADRILATERAL="0 1276 1245 1287 1255 1329 0 1327" \
          -R 100.0 2.0 \
          -d Hg-autoPROC.02 | tee Hg-autoPROC.02.lis

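If you prefer the high-resolution limit to be determined only by the CC(1/2) statistic, adding "-M HighResCutOnCChalf" to the command line achieves this with a high-resolution criterion of CC(1/2) >= 30%. Note that the commands below are an alternative to the previous pair and use the same output directory names, so run only one of the two variants (or change the -d and tee names):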

  process -I 140 \
          -noANO \
          autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE="1302 1411 1254 1366" \
          autoPROC_XdsKeyword_UNTRUSTED_QUADRILATERAL="0 1280 1338 1299 1346 1326 0 1316" \
          -R 100.0 1.6 \
          -M HighResCutOnCChalf \
          -d nat-autoPROC.02 | tee nat-autoPROC.02.lis
 
  process -I 141 \
          autoPROC_XdsKeyword_UNTRUSTED_ELLIPSE="1202 1355 1227 1385" \
          autoPROC_XdsKeyword_UNTRUSTED_QUADRILATERAL="0 1276 1245 1287 1255 1329 0 1327" \
          -R 100.0 2.0 \
          -M HighResCutOnCChalf \
          -d Hg-autoPROC.02 | tee Hg-autoPROC.02.lis
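The runs in this tutorial use numbered output directories (nat-autoPROC.01, nat-autoPROC.02, ...). When trying several variants, a small helper can pick the next unused number so that an earlier run is never reused by accident. This is a sketch only: next_run_dir is a hypothetical helper, and how autoPROC itself reacts to an already existing output directory is not covered here.

```shell
# Hypothetical helper: print the first "<prefix>.NN" (NN = 01, 02, ...)
# that does not yet exist in the current directory.
next_run_dir() {
  n=1
  while :; do
    candidate=$(printf '%s.%02d' "$1" "$n")
    if [ ! -e "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
    n=$((n + 1))
  done
}

# Usage (flags as in the commands above):
#   d=$(next_run_dir nat-autoPROC)
#   process -I 140 -noANO -R 100.0 1.6 -d "$d" | tee "$d.lis"
```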

Running autoSHARP for structure solution and model building

The results from autoPROC can be used directly in autoSHARP via

  ln -s nat-autoPROC.02/aimless.sca nat.sca
  ln -s Hg-autoPROC.02/aimless.sca Hg.sca
  run_autoSHARP.sh -seq 4u4h.seq \
    -nat -sca nat.sca \
    -ha Hg -nsit 3 -wvl 1.0 -sca Hg.sca \
    -id autoSHARP.01 | tee autoSHARP.01.lis
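The symbolic links above will be dangling if a processing run did not finish or wrote its results elsewhere, and that would only show up once autoSHARP tries to read the files. A hedged sketch of a safer variant follows; link_checked is a hypothetical helper that refuses to link a missing or empty file, and the aimless.sca paths are the ones used above.

```shell
# Hypothetical helper: create a symbolic link only if the source file
# exists and is non-empty.
#   $1 = source file, $2 = link name
link_checked() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  ln -s "$1" "$2"
}

# Usage:
#   link_checked nat-autoPROC.02/aimless.sca nat.sca
#   link_checked Hg-autoPROC.02/aimless.sca Hg.sca
```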

The full output of this autoSHARP run can be seen by loading autoSHARP.01/LISTautoSHARP.html into your browser, e.g. via

  firefox `pwd`/autoSHARP.01/LISTautoSHARP.html

In the end you should have a nearly complete model of about 200 built and sequenced residues.