TutorialRapiData2022

Content:

Introduction
Example 1 - 1O22
Example 2 - Lysozyme

Introduction

The pages for RapiData2021 can still be used. In addition, this page describes running autoSHARP on data processed as part of this workshop.

Please remember to run

source ~rapidata2/setup.csh

whenever you open a new terminal/shell - in order to have a correct setup for our software and these tutorials.

Example 1 - 1O22

A quick example of running autoSHARP on an already processed dataset would be

run_autoSHARP.sh \
    -seq /data/rapidata2/SampleData2022/1o22/1o22.pir \
    -ha "Se" \
    -wvl 0.9778 peak -7 5 \
    -sca /data/rapidata2/SampleData2022/1o22/1o22_peak.sca \
    -d 1o22-aS.01 | tee 1o22-aS.01.lis

Note that the above notation (separate lines with a backslash at the end) is only used for better readability: a trailing backslash marks a line continuation, so the above is actually just a single, rather long command line. You could also run this as

run_autoSHARP.sh -seq /data/rapidata2/SampleData2022/1o22/1o22.pir -ha "Se" -wvl 0.9778 peak -7 5 -sca /data/rapidata2/SampleData2022/1o22/1o22_peak.sca -d 1o22-aS.01 | tee 1o22-aS.01.lis
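The equivalence of the two forms can be checked with a small stand-in. This is a minimal sketch assuming a Bourne-style shell (sh or bash; the workshop setup file itself is written for csh, where a trailing backslash continues a line in the same way). The `count_args` helper is hypothetical and only reports how many arguments it receives.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of arguments it was given.
count_args() {
    echo "$#"
}

# Multi-line form: each trailing backslash continues the command.
multi=$(count_args \
    -ha "Se" \
    -wvl 0.9778 peak -7 5)

# Single-line form of the same command.
single=$(count_args -ha "Se" -wvl 0.9778 peak -7 5)

echo "multi-line: $multi arguments, single-line: $single arguments"
```

Both calls see the same argument list. The backslash must be the very last character on its line: a trailing space after it ends the continuation.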

Have a look at the output of

run_autoSHARP.sh -h

to understand the various options and syntax. See also the autoSHARP reference card for additional, basic information (the full manual is here and the software itself can be found here).

We use a meaningful naming convention for output files/directories ("1o22-aS" denotes the project and the fact that this is an autoSHARP run) with a numerical extension (".01"). This way we can run the same (or very similar) commands multiple times, incrementing the numerical extension as we go along.
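Picking the next numerical extension can be automated. The following is a minimal sketch, assuming a Bourne-style shell (sh or bash, not the csh used by the setup file); `next_run_dir` is a hypothetical helper and not part of autoSHARP.

```shell
#!/bin/sh
# Hypothetical helper: print the first unused name of the form
# <prefix>.NN (NN = 01, 02, ...) in the current directory.
next_run_dir() {
    prefix=$1
    n=1
    while [ "$n" -le 99 ]; do
        candidate=$(printf '%s.%02d' "$prefix" "$n")
        if [ ! -e "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
    echo "no free run number for $prefix" >&2
    return 1
}

# Example: suggests 1o22-aS.01 if no earlier run exists here.
next_run_dir 1o22-aS
```

The result could then be stored in a variable and used for both the output directory and the name of the saved log file, so that the two always carry the same run number.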

If you followed the autoPROC tutorial for 1o22, you could also use the corresponding reflection data from that (e.g. 1o22.01/staraniso_alldata.sca).
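As a sketch of that variant, the script below reuses the options of the first command and changes only the reflection file and the run number. It assumes a Bourne-style shell and that the autoPROC output sits in ./1o22.01. The guard around the actual call is an addition for illustration: if the setup has not been sourced or the file is missing, the command is only printed.

```shell
#!/bin/sh
# Assumption: the autoPROC tutorial output is in ./1o22.01.
sca=1o22.01/staraniso_alldata.sca
out=1o22-aS.02   # incremented run number; adjust to your own naming

# Same options as the first command; only -sca and -d differ.
set -- \
    -seq /data/rapidata2/SampleData2022/1o22/1o22.pir \
    -ha "Se" \
    -wvl 0.9778 peak -7 5 \
    -sca "$sca" \
    -d "$out"

cmdline="run_autoSHARP.sh $*"

if command -v run_autoSHARP.sh >/dev/null 2>&1 && [ -f "$sca" ]; then
    run_autoSHARP.sh "$@" | tee "$out.lis"
else
    # Setup not sourced or file missing: just show what would be run.
    echo "would run: $cmdline"
fi
```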

Example 2 - Lysozyme

As part of the autoPROC tutorials you probably processed the Co-containing lysozyme data. If you did, you can run autoSHARP via the following command:

    run_autoSHARP.sh \
      -seq /data/rapidata2/SampleData2022/CoLyso_SAD/lyso.seq \
      -ha Co -nsit 1 \
      -wvl 0.97946 peak \
      -sca CoLyso_SAD.01/staraniso_alldata.sca \
      -nowarp \
      -R 100 1.5 \
      -d CoLyso_SAD.01-aS.02 | tee CoLyso_SAD.01-aS.02.lis

Let's look at the various command-line arguments:

-seq /data/rapidata2/SampleData2022/CoLyso_SAD/lyso.seq points to a text file with the monomer sequence (one-letter code)

-ha Co -nsit 1 tells autoSHARP what type of heavy atom and how many to expect

-wvl 0.97946 peak defines the wavelength (in Å) and gives an identifier

the latter is not really relevant for SAD phasing, but becomes important when doing MAD, since SHELXC expects such names in preference to actual wavelength values.
a typical naming convention would be "peak", "infl", "lrem" and "hrem" for the various MAD energies

-sca CoLyso_SAD.01/staraniso_alldata.sca picks the reflection file - here in SCALEPACK format

one can also use MTZ files
here we use the anisotropically analysed/processed data from STARANISO
the file based on traditional, isotropic analysis would be aimless.sca

-nowarp switches off the use of ARP/wARP

this can be a very powerful model building step, giving well refined models
but it can be slow (and is therefore not ideal for workshop tutorials)

-R 100 1.5 restricts the resolution to 1.5 Å - there is no need to use even higher resolution to see whether we can solve the structure (and things run slower with more reflection data to consider)

-d CoLyso_SAD.01-aS.02 defines the output directory for results

| tee CoLyso_SAD.01-aS.02.lis pipes standard output into the tee command (to show and save that information)
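The effect of the final tee step can be tried out without running autoSHARP. This is a minimal sketch assuming a Bourne-style shell; `fake_job` is a hypothetical stand-in for a long-running program and the log file is a temporary file created just for the demonstration.

```shell
#!/bin/sh
# Stand-in for a long-running program: prints two lines to standard output.
fake_job() {
    echo "step 1 done"
    echo "step 2 done"
}

log=$(mktemp)            # temporary log file for this demo
fake_job | tee "$log"    # lines appear on screen AND are written to $log

echo "lines saved: $(wc -l < "$log")"
```

Only standard output passes through the pipe: anything a program writes to standard error still appears on screen, but is not saved in the file unless it is redirected as well.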