[sharp-discuss] Sharp on a cluster

Clemens Vonrhein vonrhein@globalphasing.com
Fri, 27 Feb 2004 10:12:34 +0000


Dear Evan,

some initial remarks (before getting into details about your
problems):

  1. any kind of queuing system should be supported if it can be
     adapted from the examples we provide (for DQS and LSF). The way
     to do this is:

    % cd /where/ever/sharp
    % cp -r submit submit.local
    % rm sushi/submit
    % ln -s ../submit.local sushi/submit

    % cd submit.local
    % cp -r dqs MySubmit

    % vi start.dat       # and restart.dat, resume.dat
      ==> see example line for 'dqs' and substitute this with
      'MySubmit'

    You then need to adapt the files in the new subdirectory MySubmit.

    (see also detailed description in the installation manual).

  2. ideally, your cluster should have identical machines (in terms of
     software): so the same set of packages installed, the same CCP4
     (and ARP/wARP etc) installation, same paths and mount-points etc.

  3. when configuring each MASTER (i.e. each machine that should be
     running a SHARP/autoSHARP job) it is then recommended to 'clone'
     the configuration of the master node (_IF_ everything is
     identical, that is!)

> Some jobs that are submitted via rsh crash before the first round of
> sharp.  They crash at the "collecting and analyzing all data" stage with
> the complaint:
> 
> unable to get resolution limits for file
> /home/software/packages/sharp/users/ehbursey/None.sharp/datafiles/16-if3-c.data.mtz

Either CCP4 isn't properly installed (or working) on that particular
machine, or 'awk', 'grep' or similar UNIX tools aren't properly
installed. Is there a file

  ...ehbursey/None.sharp/datafiles/16-if3-c.data.mtzlog

which looks like a normal 'MTZDUMP' output? Does it contain any error
messages? Did you have a look into the $BDG_home/sushi/logs/error_log
file?
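
A minimal sketch of how these two checks could be scripted (plain sh;
the function names check_tools and show_log_errors are made up for
this example and are not part of SHARP or CCP4). The first reports
which of the given commands are missing from the PATH on the machine
where it is run, the second lists the lines of a log file that mention
"error":

```shell
#!/bin/sh
# Hypothetical helpers, not part of the SHARP distribution.

# Report any required command that is not on PATH on this machine.
# Returns 1 if at least one command is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Print the lines of a log file that mention "error" (any case),
# prefixed with their line numbers. Returns non-zero if none match.
show_log_errors() {
    grep -in 'error' "$1"
}
```

On a node where jobs crash you would run something like
"check_tools mtzdump awk grep" and then "show_log_errors" on the
.mtzlog file mentioned above.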

> Still other jobs crash at the "collecting and analyzing all data" stage
> with a different complaint (from CAD2_w2.log):
> 
> ***  Error
>  From LWASSN : Duplicate column labels in output file, columns   8 and 
> 12 both have the label FMIDw2
>  CAD:  *** Program Terminated

This could be related to earlier problems ...

> Other jobs complain about an invalid licence.  The nodes that make this
> complaint make it at the "Finding sites" stage.  The error message is
> in  PKMAPS_w1_ano_set1.log.  I'm assuming that the licence really is
> invalid for these four nodes, although I wonder why the job makes it
> this far without complaining.   

This is the first time the licence key is checked. You can check the
validity of your .licence file by doing:

  % cd /where/ever/sharp
  % source ./setup.csh
  % bin/linux_exe/sharp

ON EACH node! This should only complain about a missing parameter file
(and _not_ about the licence key).
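
To avoid logging into every node by hand, the same three commands
could be wrapped in a loop. This is only a sketch: the function name
check_licence_on_nodes and the variables SHARP_DIR and REMOTE_SHELL
are invented for this example, the node names are whatever your
cluster uses, and /where/ever/sharp is the same placeholder as above.
It simply prints what the sharp binary says on each node so you can
compare the messages yourself:

```shell
#!/bin/sh
# Hypothetical defaults: adjust both to your cluster.
SHARP_DIR="${SHARP_DIR:-/where/ever/sharp}"   # placeholder installation path
REMOTE_SHELL="${REMOTE_SHELL:-rsh}"           # ssh should work the same way

# Run the sharp binary on each node given as an argument and print
# its output under a per-node header.
check_licence_on_nodes() {
    for node in "$@"; do
        echo "=== $node ==="
        # csh -c is used because setup.csh is a csh script.
        # stdin comes from /dev/null so the remote shell cannot
        # swallow input meant for the calling script.
        "$REMOTE_SHELL" "$node" \
            "csh -c 'cd $SHARP_DIR && source ./setup.csh && bin/linux_exe/sharp'" \
            </dev/null 2>&1
    done
}
```

Usage would be along the lines of "check_licence_on_nodes node1 node2"
(hypothetical node names), run from a machine that can reach all
nodes.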

> Running checkBDG.sh reports that none of the nodes have a valid licence,
> although I did receive one and it is in $BDG_home/.licence.  Only a
> couple of nodes complain about licencing, but I wonder, if these flakey
> rsh problems I'm seeing might be related to the .licence file?  Would
> the jobs even start without a valid licence, or would they stop partway
> through the process, as I'm seeing here?  Would the job always give a
> licence-related error or is it possible that the error would appear at a
> later stage.

Jobs will always start, even if you haven't got a valid licence. If
you get a message about 'invalid licence' it means you haven't
requested all licence keys: see 

  http://www.globalphasing.com/sharp/

    and

  http://www.globalphasing.com/sharp/restricted/request.html

on how to request additional licence keys.


I think the first thing to do is to make sure all machines have the
same set of packages installed and all of these are at the same
version. Then make sure that crystallographic software (CCP4,
ARP/wARP) is visible in the same way on all nodes.
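
One rough way to check the "visible in the same way" part is to ask
every node where it finds a given program and compare the answers. The
sketch below assumes that 'which' on your nodes prints only the path
when the program is found (some older versions print a "no ... in"
message to stdout instead); the function names where_is and
same_everywhere and the variable REMOTE_SHELL are made up for this
example:

```shell
#!/bin/sh
# Hypothetical default: rsh as used in this thread; ssh also works.
REMOTE_SHELL="${REMOTE_SHELL:-rsh}"

# Print "<node> <path>" for one command on every node, so that
# differing installations stand out.
# Usage: where_is mtzdump node1 node2 ...
where_is() {
    tool=$1
    shift
    for node in "$@"; do
        found=$("$REMOTE_SHELL" "$node" "which $tool" </dev/null 2>/dev/null)
        echo "$node ${found:-NOT FOUND}"
    done
}

# Succeed only if every node reported the same answer. Note that
# this is also true when the command is missing on all nodes.
same_everywhere() {
    where_is "$@" | awk '{print $2}' | sort -u | awk 'END { exit (NR != 1) }'
}
```

For example "where_is mtzdump node1 node2 node3" (hypothetical node
names) lists the location of MTZDUMP per node.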

Cheers

Clemens


-- 

***************************************************************
* Clemens Vonrhein, Ph.D.     vonrhein AT GlobalPhasing DOT com
*
*  Global Phasing Ltd.
*  Sheraton House, Castle Park 
*  Cambridge CB3 0AX, UK
*--------------------------------------------------------------
* BUSTER Development Group      (http://www.globalphasing.com)
***************************************************************