

Check

You can always check if the software is correctly set up via

  which process
  process -h
 

This assumes you have already run the

  module load ccp4-workshop
  clusterme

commands (on the NX/NoMachine server) and then

  module load ccp4-workshop

again on the compute/cluster node.
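
Putting it all together, a typical check from a fresh NoMachine session might look like the following sketch (only commands already mentioned above; the order matters):

  # on the NX/NoMachine server
  module load ccp4-workshop
  clusterme

  # now on your reserved compute/cluster node
  module load ccp4-workshop
  which process
  process -h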

Note: if in your terminal you see a prompt like

[FEDID@cs04r-sc-com99-07 ~]$ 
                ^^^

you are already on the compute node reserved for you. If on the other hand you see something like

[FEDID@cs05r-sc-serv-04 ~]$ 
                ^^^^

then you are still on the main NoMachine server.
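
If in doubt, the standard hostname command prints the name of the machine you are currently logged into:

  hostname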


Tutorial/example data

Here are some example datasets - all from recently collected and deposited PDB entries for which the raw diffraction images are also available (PD = proteindiffraction.org). The images are already placed on the DLS computers: see the full paths to those images below.

For each dataset: PDB code (with PDBpeep and Table-1 links), location of the images on the DLS computers, archived data (PD) and notes.

  6ORC  (PDBpeep, Table-1, archived: PD)
        /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images
        very quick to run, Se-MET for phasing

  6YNQ  (PDBpeep, Table-1, archived: PD)
        /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6YNQ/Images
        very quick to run, small ligand

  7KRX  (PDBpeep, Table-1, archived: PD)
        /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/7KRX/Images
        fast to run - and contains anomalous signal, interesting ligand

  7K1L  (PDBpeep, Table-1, archived: PD)
        /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/7K1L/Images
        twinning, anisotropy, interesting ligand

  6VWW  (PDBpeep, Table-1, archived: PD)
        /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6VWW/Images
        twinning

  6W9C  (PDBpeep, Table-1, archived: PD)
        /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6W9C/Images
        two sweeps, low completeness, anisotropy, multiple lattices?, radiation damage?, might contain some anomalous signal (Zn)
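
To check that you can actually see a given dataset, just list a few of the image files, e.g. for 6ORC (the file names will depend on the detector/beamline):

  ls /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images | head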

Caveat

Remember that we don't provide a graphical interface to start an autoPROC run (there is plenty of graphical output, though). You won't need much experience with the terminal/shell and command line, but a little is necessary after all. See also here for some hopefully helpful pointers.


Simple run

You can run autoPROC interactively using e.g.

  process -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images -d 6ORC.01 | tee 6ORC.01.lis
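
To keep things tidy it can help to run each job from its own working directory; the directory name below is only an example:

  mkdir -p ~/autoPROC-tutorial
  cd ~/autoPROC-tutorial
  process -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images -d 6ORC.01 | tee 6ORC.01.lis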

Detailed explanation of command

Let's have a closer look at that command (you can skip this if you are already a Linux guru):

  process              the main autoPROC command
  -I <directory>       points it at the directory containing the diffraction images
  -d 6ORC.01           writes all output into a (new) subdirectory called 6ORC.01
  | tee 6ORC.01.lis    shows the output in the terminal and at the same time saves a copy into the file 6ORC.01.lis

Main reporting and analysis

The full report of processing (including all analysis about indexing, processing, space-group decisions, scaling and the final set of fully processed data ready for deposition) can be found in the summary.html file within the output directory. So browse to your current directory (in a file browser) and look for the file 6ORC.01/summary.html. Its location is also reported on standard output (and you might be able to just click on it there to open it). You could also cd into the relevant directory and just start firefox summary.html.
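
For example (firefox is just one of the browsers available on the DLS machines):

  cd 6ORC.01
  firefox summary.html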

Sources of information regarding autoPROC and XDS

The built-in help is always a good starting point:

  process -h

The autoPROC manual, reference card and wiki (see below) contain much more detail.

Additional tools and files for inspection

Although the summary.html file is the first stop for checking processing progress (and final results), there are additional resources for you. For example, if you run

  grep gpx.sh 6ORC.01.lis

you will see something like

running 6ORC.01/status/01_run_idxref_01/gpx.sh
running 6ORC.01/status/03_index/gpx.sh
running 6ORC.01/status/04_integ/gpx.sh
running 6ORC.01/status/05_postref/gpx.sh
running 6ORC.01/status/06_process/gpx.sh

that provides you with tools to visualise predictions at different stages. Usually only those related to indexing (idxref) and the final one are of interest - especially if you suspect multiple lattices (or multiple indexing solutions), since this allows you to see the predictions for each of the (significant) indexing solutions.
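
To look at one of these, simply execute the corresponding script, e.g. (here for the indexing stage):

  sh 6ORC.01/status/03_index/gpx.sh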

If you are "only" interested in the final data quality, have a look at the following files (within the result directory and also referenced from summary.html):


Some ideas about "advanced" processing

After having done data processing with all options at their default values, a careful analysis of the summary.html report might already give you some ideas about possible changes. One could set an explicit space group (SG) via

  process symm="P212121" ...

(... stands for any additional arguments as discussed above). Or an SG/cell combination:

  process symm="P212121" cell="43 67 112 90 90 90" ...

There are also a variety of so-called macros: these are predefined collections of parameter settings for typical tasks. One of the most commonly used macros is

  process -M LowResOrTricky ...

You might want to have a closer look at the manual or the reference card (PDF) for other suggestions: autoPROC has a very large number of potential parameter settings that can be used to modify its behaviour. The wiki contains a large set of worked-through examples too.
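
As a purely illustrative combination (the space group shown is not a recommendation for any particular dataset here), a re-run of the 6ORC example with a macro and an explicit space group could look like:

  process -M LowResOrTricky symm="P212121" \
      -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images \
      -d 6ORC.02 | tee 6ORC.02.lis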


Processing your own data

There shouldn't be anything different from the notes above when it comes to processing your own data. One note though: in our experience during these workshops a fair number of "skeletons" see the light of day, i.e. problematic datasets that have been sitting on disk for a long time at students' home institutions and have caused all kinds of problems. These might come from unusual beamlines/instruments or non-standard settings. So be prepared to provide as much background information as possible about the actual instrument and data collection (back in the day): the automatic detection of accurate beamline/instrument parameters might not recognise some of those "interesting" datasets right away.

If you are processing so-called mini-cbf files (file ending *.cbf) that are already compressed (i.e. *.cbf.gz), you might need to tell XDS to use a so-called plugin to avoid an unnecessarily large number of file accesses and conversion steps. For that, add

  autoPROC_XdsKeyword_LIB="/dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/xds-zcbf/build/xds-zcbf.so"

to your command (somewhere after the initial process ... - and remember to separate it from other command-line flags/arguments with spaces!)
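
Put together, such a run might look like this sketch (the image directory is a placeholder for wherever your compressed *.cbf.gz files live):

  process \
      autoPROC_XdsKeyword_LIB="/dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/xds-zcbf/build/xds-zcbf.so" \
      -I /path/to/your/images -d mydata.01 | tee mydata.01.lis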

Submitting jobs to the cluster queues

It can also be useful to submit any processing job to the DLS computer clusters. For that you will need to write a little shell script (e.g. called run.sh) that could contain something like

#!/bin/sh

module load ccp4-workshop

process \
    -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images \
    -d 6ORC.01 > 6ORC.01.lis 2>&1

What does that file contain? The first line (#!/bin/sh) marks it as a shell script. The module load ccp4-workshop line sets up the workshop software environment again, because the job will run on a freshly allocated cluster node. The process command is the same as in the interactive example above, except that all output is redirected into the file 6ORC.01.lis (the 2>&1 also sends error messages there), since there is no terminal to watch while the job runs on the cluster.

We can then submit that job via

  chmod +x run.sh
  qsub -pe smp 16 -cwd run.sh

(here -pe smp 16 requests 16 cores for the job and -cwd runs it from the current working directory, so the output directory and log file end up next to run.sh) and see it in the cluster queues via

  qstat
 

Note: we have seen that a submitted job can sometimes seem to hang for quite some time, especially when handling compressed (*.gz or *.bz2) files - but your mileage may vary.
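
While the job is running (or if it seems to hang), you can always follow the log file it writes with standard tools, e.g.

  tail -f 6ORC.01.lis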

Setup - optional (work-in-progress)

To make your life easier later, you can run the commands below as-is on the Diamond/DLS computers (after having connected via NoMachine): this needs to be done only once!

  echo "alias x ='module load ccp4-workshop'" >> ~/.bashrc_user
  echo "alias c ='x; clusterme'" >> ~/.bashrc_user
  mkdir ~/.ssh
  chmod 0700 ~/.ssh
  ssh-keygen -t rsa -b 4096 -f  ~/.ssh/rsa_clusterme

(hitting just Enter/Return for an empty passphrase). Then

  cat ~/.ssh/rsa_clusterme.pub >> ~/.ssh/authorized_keys

to add it to your authorized SSH keys.

Whenever connecting to a fresh Nomachine session (only once in each session):

  ssh-add ~/.ssh/rsa_clusterme

After that the command

  c

should work fully automatically in any terminal (or terminal tab) you open: it should connect to your compute/cluster node without asking for a password. Afterwards, the CeBEM-CCP4 workshop environment (on the cluster/compute node) can be set up by just typing

  x
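
So, once the one-off setup above is done, a fresh NoMachine session boils down to:

  # once per NoMachine session
  ssh-add ~/.ssh/rsa_clusterme

  # in any terminal: connect to your compute/cluster node ...
  c

  # ... and, on that node, set up the workshop environment
  x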