[buster-discuss] cif file for a chromophore

Mon Feb 8 10:01:19 CET 2021

Hi Clemens, 

thanks for your answer.

I reply inline in the main text, with "###" before each of my answers, to keep the discussion easy to read. 

----- Original message -----
From: "Clemens Vonrhein" <vonrhein at globalphasing.com>
To: "Nicolas Foos" <nicolas.foos at ibs.fr>
Cc: "buster-discuss" <buster-discuss at globalphasing.com>
Sent: Saturday, 6 February 2021 12:12:43
Subject: Re: [buster-discuss] cif file for a chromophore

Hi Nicolas,

On top of Guenther's suggestion to check the dictionary source
(SMILES), here are some additional things to consider and/or check.

On Fri, Feb 05, 2021 at 12:24:35PM +0100, Nicolas Foos wrote:
> I am struggling in the refinement process of a fluorescent
> protein. This protein is rsEGFP2. pdb code : 5O8A for example.

Do you have a similar situation to 5O8A, namely XFEL data and multiple
models (with different occupancies of 0.9/0.1) in a single entry?

### No, in my case I don't have this, or at least I have no argument for building the model this way. My situation is much more conventional: one model and a few alternate conformations (some residues and PIA, the chromophore). 

> In my case, the chromophore (aka PIA) has a slightly unexpected
> conformation.

Is it unusual in its linkages to the protein main-chain or only in
the "side-chain" part? The former could have an impact on the way
restraints are understood, set up and/or used.

### It depends on which cif file I am using. I tried several ways to generate it; sometimes the linkages are quite OK, sometimes not. Sometimes the "side-chain" is broken or, conversely, new connections between atoms appear. 

> The problem is I can't refine it properly using buster.

Not good.

### Exactly, and it's my favorite software for refinement. I have used it for years in all my projects (from structural biology to methodology).

> I generated a cif file using several methods (phenix elbow, ccp4
> and your servers), no matter the way. Some of these cif files
> can be used by buster,

In principle, any restraint dictionary can be used with BUSTER as long
as it follows the "normal" mmCIF syntax (the problem might be that
when it comes to restraints dictionaries there is no completely
defined "standard" and slight variances could have an impact -
especially for covalently bound compounds). We know that Grade
dictionaries work with BUSTER, REFMAC and Phenix.refine - and that
users have used AceDRG, Elbow, and other mmCIF dictionaries with
BUSTER: so maybe not something directly related to the dictionary
itself but rather how that compound interacts (via covalent linkages)
with the rest of the structure?

### I agree, the covalently linked compound complicates the situation.

> but I face two problems: depending on the file, the PIA suffers from
> severe distortion or, conversely, is not refined at all.

Does that distortion include mainly the linkages to the protein
main-chain or everything?

### It could be both. 

Do you have adequate LINK records in your input PDB file?

### I have to have one; if not, the pdb check reminds me. Or the link will be broken if you force the run without the appropriate LINK records.
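
A quick way to check this before launching a run is to list the LINK records that involve the chromophore. The following is only a minimal sketch: the function name list_links and the file name model.pdb are hypothetical, and it assumes the standard fixed-column PDB layout, in which the two residue names of a LINK record sit in columns 18-20 and 48-50.

```shell
# Minimal sketch (hypothetical helper, not part of BUSTER):
# print the LINK records of a PDB file that involve a given residue name.
#   $1 = PDB file
#   $2 = three-letter residue name (defaults to PIA)
list_links() {
  awk -v res="${2:-PIA}" '
    # Assumes fixed-column PDB format: resName1 in columns 18-20,
    # resName2 in columns 48-50 of each LINK record.
    /^LINK / && (substr($0, 18, 3) == res || substr($0, 48, 3) == res)
  ' "$1"
}

# Usage (model.pdb is a placeholder name):
#   list_links model.pdb PIA
```

For a chromophore like the one in the 6S68 example discussed further down, this should show one record per covalent linkage to the protein main-chain and side-chain; a missing line points to a LINK record that still has to be added.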

Do you get some messages about PIA (and/or its connections to the
protein main-chain) during the sanity check in BUSTER? This can often
highlight where the issue is (e.g. missing LINK record etc).

### When I get one, I use the message to fix the problem. Thanks to you and all the BUSTER developers, the messages are in general pretty informative. 

> To be honest, I tried other refinement programs (phenix.refine or
> refmac) and both seem to interpret the cif file in a different
> way. It means that the PIA conformation ranges from "fully exploded"
> to not quite the theoretical geometry.

All three refinement packages should be able to refine your model
correctly - maybe with some small variations due to different
implementations, but PIA as it is e.g. in 5O8A is not that unusual
when it comes to chemistry. So if you have different types of trouble
with all of them I would first double-check your input PDB file (LINK
records, occupancy pattern, PIA already in a reasonably sensible
position relative to protein main-chain etc).

### I agree, it's not an exotic compound.

> If somebody could help me with this cif file generation, I will be more than happy. 

Using another PDB entry (6S68) as an example, there are a few things
one has to take care of when it comes to this PIA compound - if (1)
you want to use more accurate restraints across the PIA-protein
linkages and (2) because it has three different covalent linkages to
the protein. So let's walk through this one:

### I did something a bit different, which also seems to work. 

I followed the suggestion from Guenter Fritz and generated a cif file from SMILES, instead of using a pdb file to generate the restraints. The fun starts just afterwards,
because the atom nomenclature in that cif file differs from the one used in the pdb file. So after running acedrg, I fixed all the names using vi and removed the H atoms and all the parameters linked to them (I am not using H in my current situation). 
The main subtlety is to think carefully before using search and replace, because the atom names set several traps. Typically, you replace the C3 written in the cif file by the corresponding C1 (to accommodate the pdb nomenclature), then you want to replace the C1 from the cif file, and you end up replacing that one as expected plus the one you wrote a second earlier. In the end I found a way to do it using some "fake names" as intermediates.
This method is maybe a bit Shadok-style, but it seems to work. 
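
To illustrate the trap, here is a minimal sketch of a one-pass alternative to the "fake names" trick. It is not what I actually ran: the function name rename_atoms, the mapping file and the example names are all hypothetical. Every whitespace-separated token is looked up once against the original names, so a C3 renamed to C1 can never be picked up again by a later C1 to C2 rule.

```shell
# Minimal sketch (hypothetical helper): rename atom names in one pass.
#   $1 = mapping file with one "old_name new_name" pair per line
#   $2 = restraint dictionary (cif) to rewrite; result goes to stdout
rename_atoms() {
  awk '
    NR == FNR { map[$1] = $2; next }   # first file: load the mapping
    {
      out = ""; rest = $0
      # Walk the line token by token, keeping the original whitespace.
      while (match(rest, /[^ \t]+/)) {
        tok = substr(rest, RSTART, RLENGTH)
        new = (tok in map) ? map[tok] : tok
        out = out substr(rest, 1, RSTART - 1) new
        rest = substr(rest, RSTART + RLENGTH)
      }
      print out rest
    }
  ' "$1" "$2"
}

# Usage (file names are placeholders):
#   printf "C3 C1\nC1 C2\n" > names.map
#   rename_atoms names.map PIA_acedrg.cif > PIA_renamed.cif
```

Note that this replaces any token equal to an old name, wherever it occurs on a line, and that the column alignment shifts if the old and new names differ in length, so the result still needs a look by eye. The mapping file must not be empty.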
The only weak point may be this: buster apparently understands everything and does its (great) job, but coot is not really happy with the description of the link between the main chain and the chromophore. I have to check that more carefully to see what the problem could be (it only shows up when I try to use the tools for real-space refinement). But this point is no longer on the buster side (which understands the link and takes it into account). I guess your protocol below will do the trick, because it takes that point into consideration from the beginning. 

Thank you very much for your answer and for the time you spent trying to reproduce my difficulties. 

All the best, 

Nicolas

 (1) get the PDB model and reflection data, e.g. with

       fetch_PDB 6S68

     to give you

       6S68/6s68.pdb
       6S68/6s68.mtz

 (2) generate PIA restraint dictionary via

       grade_PDB_ligand PIA

     or (if you don't have CSD installed) using the Grade webserver
     [1].

 (3) generate covalent linkage descriptions for the three PIA-protein
     bonds by running

       aB_covalent_ligand 6S68/6s68.pdb
       aB_covalent_ligand 6S68/6s68.pdb
       aB_covalent_ligand 6S68/6s68.pdb

     three times (selecting LINK record 1, 2 and 3). Again, if you
     don't have the CSD software installed locally you can use the
     Grade server by adding a "-server" flag to the above
     command. Also, if your PDB file doesn't yet contain the correct
     LINK records you need to add those first (e.g. within Coot or
     using an editor).

     >>>> The next steps in this section (3) are a bit more
     >>>> complicated since we haven't yet automated the generation of
     >>>> covalent linkages when there are several different ones in a
     >>>> PDB file: this is work-in-progress.

     Afterwards, we need to combine those three covalent linkage
     descriptions (basically extending the explanations for single
     covalent linkages given at the end of each aB_covalent_ligand
     run):

       cat *.dat > combine.dat

       cat <<e > combine.macro
       __args="`ls -1 *.dic | sed "s%^%-l $PWD/%g" | tr '\n' ' '`"
       MakeLINK_LinkagesFile=`pwd`/combine.dat
       RunBusterDuplicatesOverride="`grep "^RunBusterDuplicatesOverride" [A-Z]*macro | sed "s/^.*=//g" | tr '\n' ' '`"
       e

       NOTE: Running through this example has shown a few small
             problems in aB_covalent_ligand related to the residue
             names (a compound called "PIA" and the PIA-ALA connection
             creating a "PIA" linkage by default ... resulting in
             naming overlap). We'll fix that for the next release. For
             the moment you then need to do

               sed -i "s/ PIA / PIA1 /g" combine.macro
               sed -i "s/ N PIA[ ]*$/ N PIA1/g" combine.dat
               sed -i "s/ PIA / PIA1 /g" PIA-C3_ALA-N.dic

     These commands should produce a file combine.macro that looks a
     bit like this:

       __args="-l /some/where/CYS-SG_PIA-CB2.dic -l /some/where/MET-C_PIA-N1.dic -l /some/where/PIA-C3_ALA-N.dic"
       MakeLINK_LinkagesFile=/some/where/combine.dat
       RunBusterDuplicatesOverride="CYP MEP PIA"

     which tells BUSTER to (1) read linkage restraint dictionaries,
     (2) understand LINK records in the PDB file as those specific
     linkages and (3) to handle restraints for those linkages that
     might occur in both the free compound dictionary as well as
     those linkage restraints.

 (4) (optional) add hydrogens to model

       aB_hydrogenate \
         -p 6S68/6s68.pdb \
         -ecloud -full \
         -l PIA.grade_PDB_ligand.cif \
         -o 6S68/6s68-H.pdb

     and remove those hydrogens that are present in the free PIA
     compound (i.e. defined in the restraint dictionary) but not
     present in the bound version here:

       egrep -v " HN11.PIA | HB2.PIA " 6S68/6s68-H.pdb > 6S68/6s68-H-clean.pdb

     (can obviously also be done in Coot or editor). This is necessary
     because REDUCE (which is what is doing the actual hydrogenation)
     will try and add all hydrogens from the free compound
     description.

 (5) run BUSTER refinement using e.g. (in its simplest form):

       refine \
         -p 6S68/6s68-H-clean.pdb \
         -m 6S68/6s68-unique.mtz \
         -l PIA.grade_PDB_ligand.cif \
         -M combine.macro \
         -d 01 | tee 01.lis

     Or use a slightly more advanced parametrisation (adding water
     update, TLS, occupancy refinement, active hydrogen handling and
     ensuring convergence):

       pdb2occ -p 6S68/6s68-H-clean.pdb -o 6S68/6s68-H-clean.occ

       refine \
         -M Ecloud \
         -WAT WaterPickingHydrogenPartnerAll="yes" \
         -M TLSbasic \
         -Gelly 6S68/6s68-H-clean.occ \
         -p 6S68/6s68-H-clean.pdb \
         -m 6S68/6s68-unique.mtz \
         -l PIA.grade_PDB_ligand.cif \
         -M combine.macro \
         -nsmall 500 -autoncs \
         -d 02 | tee 02.lis

This approach works for us here (and for that particular example)
resulting in a geometry that looks correct and plausible ... can you
reproduce the correct behaviour for this 6S68 example and/or does it
give some pointers for your particular problem?

Cheers

Clemens, Gerard & Andrew

[1] grade.globalphasing.org