[buster-discuss] PdbChk_RecordsStartingWithSpace

Fri Feb 22 11:05:17 CET 2019

Hi Ed,

On Thu, Feb 21, 2019 at 08:41:49PM -0500, Edwin Pozharski wrote:
> Is there a rational reason for this check?

Yes: according to

  http://www.wwpdb.org/documentation/file-format-content/format33/sect1.html#Record

a PDB file would have

  The first six columns of every line contains a record name, that is
  left-justified and separated by a blank. The record name must be an
  exact match to one of the stated record names in this format guide.

Since 'pdbchk' checks for conformance with the PDB format, lines
starting with spaces are flagged as invalid.

There are two reasons for us running such a detailed check on the
input PDB file: (1) to ensure we understand the user input correctly
and (2) so that programs further down the line can be sure to have a
standard-conforming PDB file.

> I have a file that is output by phenix and it has some lines in the
> header that start with spaces.

Maybe you can also check with the Phenix developers and ask why they
are writing non-conforming PDB files? Maybe that is an oversight on
their side and it could trigger fixes in Phenix that everyone else
might benefit from too. A fix at the source would be much more
satisfying than providing additional patch-up functionality further
downstream.

> Can they be just ignored if they occur (as these do) prior to the
> first ATOM line?  Most of my downtime when running buster jobs is
> usually fixing minor issues with the input model format, and this
> one just adds to the list and seems unnecessary.

These must be lines with additional text (since we do check for empty
lines before this check). So it could be a case of

  REMARK some additional info:
   run on Thursday
  CRYST1 ...
  ATOM ...

(non-standard PDB file written by a program) or

  REMARK some additional info:
   CRYST1 ...
  ATOM ...

(non-standard PDB file with error introduced e.g. by user editing
mistake).
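
To tell these two cases apart in a given file, something along the
following lines might help. This is only a rough sketch and not part
of pdbchk: the function name classify_space_lines is made up for
illustration, and the record-name list is a small subset of the names
in the format guide, so extend it as needed.

```shell
# Hypothetical helper (not part of pdbchk or BUSTER): report every line
# that starts with a blank and guess which of the two cases it is.
classify_space_lines() {
    awk '
        /^[ ]*$/ { next }                  # skip empty or blank-only lines
        /^ / {
            stripped = $0
            sub(/^ +/, "", stripped)       # drop the leading blanks
            name = substr(stripped, 1, 6)  # record names occupy six columns
            sub(/ +$/, "", name)           # trim padding, e.g. "ATOM  " -> "ATOM"
            # Illustrative subset of record names only (assumption).
            if (name ~ /^(HEADER|TITLE|REMARK|CRYST1|ATOM|HETATM)$/)
                printf "%d: shifted record? %s\n", NR, stripped
            else
                printf "%d: free text: %s\n", NR, stripped
        }
    ' "$1"
}
```

Running "classify_space_lines original.pdb" prints one line per
offending record, prefixed with its line number in the file.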

What other "minor issues" are you dealing with? Maybe a list would be
useful to other users as well.

The best solution seems to be to ensure that programs (e.g. Phenix in
your case) write standard-conforming PDB files - but if you can tell
us exactly which types of records you have problems with, we might be
able to add a fix-up function to the next release.

Since you know that these records can be ignored, you can always do

  grep -E -v "^[ ]*$|^ " original.pdb > fixed.pdb
  refine -p fixed.pdb ...
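
If you would rather see what gets dropped before running the
refinement, the same filter could be wrapped up roughly as below.
Again this is just a sketch under assumptions: strip_space_lines is a
made-up name, and the file names are placeholders.

```shell
# Hypothetical wrapper around the filter above: list the lines that will
# be dropped (on stderr, with line numbers), then write the filtered copy.
strip_space_lines() {
    src=$1
    dst=$2
    # Empty/blank-only lines and lines starting with a blank.
    # "|| true" because grep exits non-zero when nothing matches.
    grep -n -E '^[ ]*$|^ ' "$src" >&2 || true
    # Keep every other line unchanged.
    grep -v -E '^[ ]*$|^ ' "$src" > "$dst" || true
}
```

For example "strip_space_lines original.pdb fixed.pdb" followed by
the usual "refine -p fixed.pdb ...".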

Cheers

Clemens, Claus & Gerard