Issues with Mogul from Cambridge Structural Database System 2014

  • There are problems using mogul from the initial 2014 release of the Cambridge Structural Database System (CSDS) if CSDS is used from an NFS-mounted file system. These problems affect running standalone mogul as well as its use by grade, grade_PDB_ligand and buster-report.

Summary of issues and recommendations

  • mogul from the CSDS 2014 initial release behaves differently from previous versions, in particular:
    • If CSDS is used from a filesystem that is mounted using NFSv3, mogul can stall (see "Freeze problem with mogul used from NFSv3-mounted CSDS2014" below). We strongly recommend updating to the latest BUSTER or ALL release (dated 20140109 or after) before running grade, grade_PDB_ligand and buster-report using mogul from CSDS 2014 provided on an NFSv3-mounted file system, so that any such stall will be detected and result in a clean termination.
    • If CSDS is used from a filesystem that is mounted using NFSv4, we have found that mogul is fully functional but can run rather slowly, sometimes taking more than 10 minutes of cpu for jobs that previously took 3 minutes from an NFS-mounted CSDS (see "Performance issue with mogul used from NFSv4-mounted CSDS2014" below).
    • If you are sensitive to cpu timings we recommend:
      • either using a wrapper that uses local disk for the CSDS csd and data directories (see SoftwareMogulWrapperLocal for detailed instructions),
      • or installing CSDS on each machine's local disk.
    • Note that 2014 mogul is slower than previous versions (but has new features such as producing analysis for fused rings). Stop press! There is now an update that has much better speed; to download it, visit http://www.ccdc.cam.ac.uk/products/csd_system/updates

Identification of the problem release

  • The problem release is Mogul 1.6 (RC5) and possibly Mogul 1.6.1 (DEV7).
  • Recent versions of buster-report and grade directly report the version of mogul used.
  • Alternatively, run mogul interactively and select "Help", then "About Mogul". This should bring up an information window like:
    • about_mogul_problem.png

Freeze problem with mogul used from NFSv3-mounted CSDS2014

  • NFS version 3 is still commonly used on older systems (see http://en.wikipedia.org/wiki/Network_File_System) but it is increasingly being replaced by NFS version 4 which offers significant benefits.
  • To identify the version of NFS a disk is mounted with, simply issue the mount command. For instance:
mount
fs:/mnt/public on /mnt/public type nfs4 (rw,bg,hard,intr,clientaddr=192.168.131.132,addr=192.168.131.146)
fs:/home/osmart on /home/osmart type nfs (rw,addr=192.168.131.146)
    • Here /mnt/public is reported as mounted with type nfs4, which means NFSv4 is used.
    • But /home/osmart is mounted with type nfs, which means NFSv3 is used.
  • With previous releases of CSDS we found no problems using a single installation on an NFS filesystem mounted using either NFSv3 or NFSv4.
  • But with CSDS 2014 we found that grade and buster-report jobs would occasionally stall when running mogul, using cpu but never completing. When this happened, messages would appear on the system console (or in the output of dmesg):
lockd: server 192.168.1.146 not responding, timed out
    • Here 192.168.1.146 is the IP address of the fileserver providing CSDS.
    • Running mogul interactively while in the stall state eventually produced error messages:
WARNING: SQLite connection is currently in autocommit mode - no user transaction to rollback...
WARNING: prepare_statement failed (code 3850) - disk I/O error (SQLITE_IOERR_LOCK) - /public/xtal/CCDC/Linux/CSD_System_2014/data/indexdb/indexDb.sqlite - /public/xtal/CCDC/Linux/CSD_System_2014/data/indexdb/indexDb.sqlite
    • Eventually (after hours) the stall state would clear and mogul would then run fine.
    • Switching to mounting the relevant volume using NFSv4 solved the stall problem (but there are still performance issues).
  • The current snapshot release (dated 20140109 or after) includes alterations to grade, grade_PDB_ligand and buster-report to quickly detect any mogul stall problem and exit cleanly with a clear error message, for instance:
grade -checkdeps
... (many lines of output edited out) ...
 Setting tool 'mogul' to '/public/xtal/ccdc2013/bin/mogul' from environment variable $BDG_TOOL_MOGUL
	test mogul by asking for bond angle of CO2 ..
ERROR mogul was killed because it hit the CPU limit of 60 seconds.
ERROR this should not happen for this simple test!
ERROR We have found this problem if CSDS 2014 is used from a filesystem mounted
ERROR using NFSv3 because of locking problems (lockd errors reported by dmesg).
ERROR Please contact buster-develop@globalphasing.com if you need advice.
  • Recommendation: update Global Phasing Software BUSTER or ALL to the latest snapshot before use with mogul from CSDS2014 provided on an NFSv3-mounted file system. Furthermore, make sure that you keep your CSDS up to date by visiting http://www.ccdc.cam.ac.uk/products/csd_system/updates
  • If you get the ERROR report please:
    • visit GradeErrorMessageMogulKilledCPULimit for instructions on how to report it.
    • to continue working:
      • either use a wrapper that uses local disk for the CSDS csd and data directories (see SoftwareMogulWrapperLocal for detailed instructions),
      • or install CSDS on a local disk on the machines you run grade or buster-report on,
      • or contact CCDC (support@ccdc.cam.ac.uk), who can issue licence keys/database to enable you to continue to use 2013 CSDS.
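
The idea behind the stall detection described in this section (run a simple test under a CPU limit and treat hitting the limit as a stall) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not how grade implements its check: the function names run_with_cpu_limit and looks_like_nfs_lock_stall are hypothetical, the real mogul test command is not shown on this page so a placeholder command is used, and a Unix-like system with Python's resource module is assumed.

```python
import resource
import signal
import subprocess
import sys

# Message fragments quoted on this page that accompany the NFSv3 stall.
STALL_SIGNATURES = ("lockd: server", "SQLITE_IOERR_LOCK")


def looks_like_nfs_lock_stall(text):
    """Return True if text contains one of the stall messages quoted above."""
    return any(sig in text for sig in STALL_SIGNATURES)


def run_with_cpu_limit(cmd, cpu_seconds=60):
    """Run cmd; the kernel signals it once it has used cpu_seconds of CPU time.

    Returns (hit_limit, completed_process). Hypothetical helper for illustration.
    """
    def set_limit():
        # Runs in the child just before exec: lower only the soft CPU limit.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        soft = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            soft = min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    proc = subprocess.run(cmd, capture_output=True, text=True, preexec_fn=set_limit)
    # A child terminated by a signal has returncode == -signal_number.
    hit_limit = proc.returncode == -signal.SIGXCPU
    return hit_limit, proc


if __name__ == "__main__":
    # Placeholder command; substitute the quick test you want to guard.
    hit, proc = run_with_cpu_limit([sys.executable, "-c", "print('ok')"], cpu_seconds=60)
    if hit:
        print("ERROR command was killed because it hit the CPU limit")
    elif looks_like_nfs_lock_stall(proc.stderr):
        print("ERROR output contains NFS locking errors")
    else:
        print("command completed")
```

A CPU limit (rather than a wall-clock timeout) matches the symptom reported here, where the stalled mogul process keeps using cpu but never completes.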

Performance issue with mogul used from NFSv4-mounted CSDS2014

  • Tests of CSDS2014 from an NFSv4-mounted file system using grade, grade_PDB_ligand and buster-report showed that mogul was fully functional.
  • However, there are performance issues, with slow runs for some ligands (including drug-like molecules). Stop press! There is now an update that has much better speed; to download it, visit http://www.ccdc.cam.ac.uk/products/csd_system/updates
  • For full details see SoftwareMogulRelease2014NFSissuesTest
  • Given these test results we recommend using mogul from a local filesystem if this is possible.
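
To decide which of the situations described on this page applies to a given CSDS installation, the output of the mount command can be parsed to find the filesystem type for the installation directory. The sketch below is a minimal illustration: the function names are hypothetical, it assumes mount prints lines in the format shown earlier on this page, and it follows this page's reading that type nfs means NFSv3 and type nfs4 means NFSv4 (other systems may report the version differently).

```python
import re

# Matches lines in the format shown on this page, for example:
#   fs:/mnt/public on /mnt/public type nfs4 (rw,bg,hard,intr,...)
# Mount points containing spaces are not handled by this simple pattern.
MOUNT_LINE = re.compile(
    r"^(?P<source>\S+) on (?P<mountpoint>\S+) type (?P<fstype>\S+) \((?P<options>[^)]*)\)"
)


def parse_mounts(mount_output):
    """Return a list of (mountpoint, fstype) tuples parsed from mount output."""
    mounts = []
    for line in mount_output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m:
            mounts.append((m.group("mountpoint"), m.group("fstype")))
    return mounts


def fstype_for_path(path, mounts):
    """Return the fstype of the longest mount point containing path, or None."""
    best = None
    for mountpoint, fstype in mounts:
        prefix = mountpoint.rstrip("/") + "/"
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) > len(best[0]):
                best = (mountpoint, fstype)
    return best[1] if best else None


def advice(fstype):
    """Map a filesystem type to the guidance given on this page."""
    if fstype == "nfs":
        return "NFSv3: mogul from CSDS 2014 may stall"
    if fstype == "nfs4":
        return "NFSv4: mogul is functional but may run slowly"
    return "not reported as NFS-mounted"
```

For example, feeding the two mount lines shown earlier to parse_mounts and then calling fstype_for_path with a path under /mnt/public gives "nfs4", so advice reports the NFSv4 case.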

For further help on this issue


Page by Oliver Smart and Andrew Sharff. Original version 7th January 2014, updated 1st May 2014. Address problems, corrections and clarifications to buster-develop@globalphasing.com