You can always check if the software is correctly set up via
which process
process -h
This is after having run the
module load ccp4-workshop
clusterme
commands (on the NX/NoMachine server) and then
module load ccp4-workshop
again on the compute/cluster node.
Note: if your terminal shows a prompt like

[FEDID@cs04r-sc-com99-07 ~]$

you are already on the compute node reserved for you. If, on the other hand, you see something like

[FEDID@cs05r-sc-serv-04 ~]$

then you are still on the main NoMachine server.
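If in doubt, you can also print the name of the machine a terminal is running on with the standard hostname command and compare it against the prompts above:

```shell
# Print the name of the machine this terminal is running on;
# compare it against the compute-node/NoMachine-server prompts above.
hostname
```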
Here are some example datasets, all from recently collected and deposited PDB entries for which the raw diffraction images are also available (PD = proteindiffraction.org). The images are already in place on the DLS computers: see the full paths below.
| PDB | PDBpeep | Table-1 | data on DLS computers | archived data | Notes |
| --- | --- | --- | --- | --- | --- |
| 6ORC | PDBpeep | Table-1 | /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images | PD | very quick to run, Se-MET for phasing |
| 6YNQ | PDBpeep | Table-1 | /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6YNQ/Images | PD | very quick to run, small ligand |
| 7KRX | PDBpeep | Table-1 | /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/7KRX/Images | PD | fast to run - and contains anomalous signal, interesting ligand |
| 7K1L | PDBpeep | Table-1 | /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/7K1L/Images | PD | twinning, anisotropy, interesting ligand |
| 6VWW | PDBpeep | Table-1 | /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6VWW/Images | PD | twinning |
| 6W9C | PDBpeep | Table-1 | /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6W9C/Images | PD | two sweeps, low completeness, anisotropy, multiple lattices?, radiation damage?, might contain some anomalous signal (Zn) |
Remember that we don't provide a graphical interface for starting an autoPROC run (there is a lot of graphical output, though). You won't need much experience with the terminal/shell and command line, but a little is necessary. See also here for some hopefully helpful pointers.
You can run interactively using e.g.
process -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images -d 6ORC.01 | tee 6ORC.01.lis
Let's have a closer look at that command (you can skip this if you are already a Linux guru):

- process is the main autoPROC command.
- -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images points it at the directory containing the diffraction images.
- -d 6ORC.01 names the output directory (which should not yet exist).
- | tee 6ORC.01.lis shows all output in the terminal while also saving a copy to the file 6ORC.01.lis.
The full report of the processing (including all analysis of indexing, integration, space-group decisions, scaling and the final set of fully processed data ready for deposition) can be found in the summary.html file within the output directory. So browse to your current directory in a file browser and look for the file 6ORC.01/summary.html. Its location is also reported on standard output (and you might be able to just click on it there to open it). Alternatively, cd into the relevant directory and start firefox summary.html.
A full list of command-line options and flags is available via

process -h
Although the summary.html file is the first stop to see processing progress (and final results), there are additional resources for you:
If you run

grep gpx.sh 6ORC.01.lis

you will see something like
running 6ORC.01/status/01_run_idxref_01/gpx.sh
running 6ORC.01/status/03_index/gpx.sh
running 6ORC.01/status/04_integ/gpx.sh
running 6ORC.01/status/05_postref/gpx.sh
running 6ORC.01/status/06_process/gpx.sh
These scripts provide you with tools to visualise predictions at different stages. Usually only those related to indexing (idxref) and the final one are of interest, especially if you suspect multiple lattices (or multiple indexing solutions): they allow you to see the predictions for each of the (significant) indexing solutions.
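For example, to view the predictions from the final processing stage you would run the corresponding script from the list above (the exact status subdirectory names can differ between runs, so check your own grep output first):

```shell
# Start the prediction viewer for the final processing stage
# (script path taken from the "grep gpx.sh" output above).
./6ORC.01/status/06_process/gpx.sh
```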
If you are "only" interested in the final data quality, have a look at the following files (within the result directory and also referenced from summary.html):
After having done data processing with all options at their default values, a careful analysis of the summary.html report might already give you some ideas about possible changes. One could set an explicit space group (SG) via
process symm="P212121" ...
(... symbolises any additional arguments as discussed above). Or an SG/cell combination:
process symm="P212121" cell="43 67 112 90 90 90" ...
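Combined with the interactive example from above, a full command could then look like this (a sketch only: 6ORC.02 is just a hypothetical name for a fresh output directory, and the SG/cell values are the placeholders from above, not values determined for the 6ORC data):

```shell
# Re-run processing with an explicit space group and cell; the
# symm/cell values here are illustrative placeholders, not 6ORC values.
process symm="P212121" cell="43 67 112 90 90 90" \
  -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images \
  -d 6ORC.02 | tee 6ORC.02.lis
```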
There are also a variety of so-called macros: these are predefined collections of parameter settings for typical tasks. One of the most commonly used macros is
process -M LowResOrTricky ...
You might want to have a closer look at the manual or the reference card (PDF) for other suggestions: autoPROC has a very large number of parameter settings that can be used to modify its behaviour. The wiki also contains a large set of worked-through examples.
There shouldn't be anything different from the notes above when it comes to processing your own data. One note though: in our experience during these workshops, a fair number of "skeletons" see the light of day, i.e. problematic datasets that have been sitting on disk for a long time at students' home institutions and have caused all kinds of problems. These might come from unusual beamlines/instruments or non-standard settings. So be prepared to provide as much background information as possible about the actual instrument and data collection (back in the day): the automatic detection of accurate beamline/instrument parameters might not recognize some of those "interesting" datasets right away.
If you are processing so-called mini-cbf files (file ending *.cbf) that are additionally compressed (i.e. *.cbf.gz), you might need to tell XDS to use a so-called plugin to avoid an unnecessarily large number of file-access and conversion steps. For that, add
autoPROC_XdsKeyword_LIB="/dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/xds-zcbf/build/xds-zcbf.so"
to your command (somewhere after the initial process, and remember the spaces separating it from other command-line flags/arguments!)
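A complete command using this keyword could then look like the sketch below (shown with the 6ORC directory names from above purely for illustration; for the tutorial datasets this plugin is normally not needed, only for gzip-compressed mini-cbf images):

```shell
# Interactive run with the XDS plugin for compressed mini-cbf files;
# the keyword is just another space-separated argument to process.
process \
  -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images \
  autoPROC_XdsKeyword_LIB="/dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/xds-zcbf/build/xds-zcbf.so" \
  -d 6ORC.01 | tee 6ORC.01.lis
```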
It can also be useful to submit a processing job to the DLS compute cluster. For that you will need to write a little shell script (e.g. called run.sh) that could contain something like
#!/bin/sh
module load ccp4-workshop
process \
  -I /dls/i04-1/data/2021/mx29507-1/processing/ClemensVonrhein/Tutorials/6ORC/Images \
  -d 6ORC.01 > 6ORC.01.lis 2>&1
What does that file contain? The first line (#!/bin/sh) marks it as a shell script; module load ccp4-workshop sets up the workshop software environment on the cluster node; and the process command is the same as in the interactive example above, except that standard output and standard error are redirected (> ... 2>&1) into the file 6ORC.01.lis instead of going through tee.
We can then submit that job via
chmod +x run.sh
qsub -pe smp 16 -cwd run.sh
and see it in the cluster queues via
qstat
Note: we have seen that a submitted job can sometimes seem to hang for quite some time, especially when handling compressed (*.gz or *.bz2) files - but your mileage may vary.
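While a submitted job is running, you can keep an eye on its progress by following the log file it writes (here the 6ORC.01.lis from the run.sh example above):

```shell
# Follow the job's log file as it grows; Ctrl-C stops the display
# without affecting the running job itself.
tail -f 6ORC.01.lis
```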
To make your life easier later, you can run the commands below as-is on the Diamond/DLS computers (after having connected via NoMachine); this needs to be done only once.
echo "alias x='module load ccp4-workshop'" >> ~/.bashrc_user
echo "alias c='x; clusterme'" >> ~/.bashrc_user
mkdir -p ~/.ssh
chmod 0700 ~/.ssh
ssh-keygen -t rsa -b 4096 -f ~/.ssh/rsa_clusterme
(just hit Enter/Return for an empty passphrase). Then
cat ~/.ssh/rsa_clusterme.pub >> ~/.ssh/authorized_keys
to add it to your authorized SSH keys.
Whenever you connect to a fresh NoMachine session, run (only once per session):
ssh-add ~/.ssh/rsa_clusterme
After that the command
c
should work fully automatically in any terminal (or terminal tab) you open: it should connect to your compute/cluster node without asking for a password. Afterwards, the CeBEM-CCP4 workshop environment (on the cluster/compute node) can be set up by just typing
x