| Chapter 1 |
Copyright © 1995-1998 by Eric de La Fortelle
Copyright © 2001-2003 by Clemens Vonrhein
and the Buster Development Group.
All rights reserved.
Two peaks were visible on the anomalous Patterson maps, corresponding to the two methionine residues in the structure. Therefore, two selenium atoms placed at the coordinates of these sites, with occupancy 1. and temperature factor 30., are used as a starting point for this refinement.
Since we assume that this is the first time you're running SHARP on this data, you should
Start SHARP based on project ID None
This will open a new window (running mainly JavaScript code): the SHARP Input Editor.
Setting up the parameter hierarchy
Your browser is now divided into four frames. On the left, you can see
the Table of Contents (ToC), that represents the hierarchical
organisation of parameters. You can interact with this structure, using
the seven buttons in the top frame:
The last frame, in the upper right of the screen, contains the interface logo. The interface is called SUSHI (Simple to Use SHarp Interface) and these are the Japanese characters for it, kindly provided by Atsushi Nakagawa (Hokkaido University).
This tutorial will now take you through all hierarchical levels in the ToC, and tell you in detail what has to be done in each of them.
All the hyperlinks in this page (Identification, Calculation Options, Datafile, Symmetry and Cell and Other information), will take you to the on-line documentation. You are welcome to read it. If you want immediate action, you can start filling or modifying the form.
The field after Project Name : is the name that will be given to all files and directories pertaining to this project. Once you have decided on a name, do not change it ! Documents produced by successive runs with the same project name 'TOTO' will be stored in directories called TOTO.1, TOTO.2, etc.
Suggested name : IF3-C
You are free to choose a title. But make the first ~30 characters in the title as informative as possible : they will be appended to the project name in file listings, to help you remember which run did what.
For example : Two isotropic Se atoms ; initial refinement.
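The naming convention described above can be sketched in a few lines of Python. This is only an illustration: `run_directory` and `listing_label` are hypothetical helper names, and the exact layout SHARP uses in its file listings is an assumption here; only the `NAME.1`, `NAME.2` directory pattern and the ~30-character limit come from the text.

```python
def run_directory(project: str, run: int) -> str:
    """Directory name for a given run, following the TOTO.1, TOTO.2 pattern."""
    if run < 1:
        raise ValueError("run numbers start at 1")
    return f"{project}.{run}"


def listing_label(project: str, run: int, title: str, width: int = 30) -> str:
    """Hypothetical listing line: run directory followed by the start of the title.

    Only the first `width` characters of the title are kept, which is why
    they should be as informative as possible.
    """
    return f"{run_directory(project, run)}  {title[:width]}"


if __name__ == "__main__":
    title = "Two isotropic Se atoms ; initial refinement."
    for n in (1, 2):
        print(listing_label("IF3-C", n, title))
```

Whatever the real layout is, the point stands: anything after the first ~30 characters of the title will not help you tell runs apart.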
NOTE : The meaning of 'appended to the project name in the file listings' may be somewhat obscure. It will gradually become clear once you see what listings we are talking about.
Under Calculation Options, activate all four tick boxes for Outlier rejection using likelihood histogram , ML parameter refinement , Residual (LLG Gradient) maps , and Centroid electron density map . The residual maps will show you if the model is incomplete, i.e. if there are any minor sites, or characteristics for the known sites that have not been modelled. Electron-density maps are not compulsory at this stage ; they should only be traced at the very end of the refinement.
If you have ticked ML parameter refinement, you will be asked to make a strategic choice between three options : (1) start by refining scale, Lack-Of-Isomorphism (LOI) and occupancy parameters only ; (2) start by refining these and the coordinates ; (3) start by refining all parameters that have been marked for refinement. You should select choice 1 if you are starting a refinement from values you are not very sure of, and choice 3 if your starting values already come from a previous SHARP refinement. See the on-line documentation for more details.
Recommended choice : scale, LOI, occ (1)
Under Datafile, Symmetry and Cell, you should start by selecting a datafile in the list.
Compulsory datafile : IF3-C.data.mtz
If you change the datafile, you will then be prompted with a JavaScript window asking you if you really want to change the crystallographic attributes attached to the datafile, i.e. the spacegroup and cell parameters as read from the header of the MTZ file. If you answer yes, these will be changed to be compatible with the MTZ file you just selected.
The Chemical Composition is not compulsory, but is useful to anchor the scale of your reference dataset to some quasi-absolute standard. As a consequence, the refined values for occupancies will then be quasi-absolute.
Recommended chemical composition :
C 471
N 143
O 229
S 2
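To see why a composition can anchor the scale, recall that Wilson-type scaling compares the mean observed intensity with the sum of squared atomic scattering factors, which at zero scattering angle reduces to the sum of squared atomic numbers. The sketch below only totals those quantities for the recommended composition; it is not SHARP's scaling procedure, and the function names are illustrative.

```python
# Atomic numbers for the elements in the recommended composition.
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8, "S": 16}

# Composition exactly as recommended in this tutorial (hydrogens are not listed).
COMPOSITION = {"C": 471, "N": 143, "O": 229, "S": 2}


def total_electrons(composition):
    """Sum of Z over all listed atoms (roughly F(000) for one molecule)."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in composition.items())


def sum_z_squared(composition):
    """Sum of Z**2 over all listed atoms: the zero-angle limit of sum(f**2)."""
    return sum(ATOMIC_NUMBER[el] ** 2 * n for el, n in composition.items())


if __name__ == "__main__":
    print("total electrons :", total_electrons(COMPOSITION))
    print("sum of Z squared:", sum_z_squared(COMPOSITION))
```

If the composition is badly wrong, these totals are wrong by the same factor, and so are the "quasi-absolute" occupancies derived from the scale.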
Because we do not have external phase information for IF3-C , encoded in
Hendrickson-Lattman coefficients, indicate None in front of File of external phase
information.
Now click on the next level in the ToC G-Sites to define a list of coordinates for all sites.
Depending on which template file you are using, there may be no G-site defined (in which case you click on the button Create), or there may be already one site (in which case you just modify its coordinates if needed), or you may find two or more sites, in which case you will have to mark them for deletion and press Delete. The Add facility is designed so that you can add a large number of sites from a file. See documentation for details.
Compulsory starting coordinates , site 1 : ( 0.051 ; 0. ; 0.243 )
Compulsory starting coordinates , site 2 : ( 0.009 ; 0.086 ; 0.294 )
Once this is done, you will enter the hierarchical part of the ToC, which consists of four nested levels :
If the documentation is clear enough, you will have understood that a compound is either a native or a derivative. In this case of a MAD experiment, we directly specify the particulars for the selenium substitution.
Note : The idea of a COMPOUND is a new feature of SHARP ! There is no need to define a 'pseudo-native' (and this ugly term is to be banned without mercy). In the case of MAD phasing, there is just one COMPOUND, and the hierarchy of parametrisation is completely 'physical' : you define various 'wavelengths' as needed by your experiment. But since scale and lack-of-isomorphism are in essence relative quantities, a reference still has to be defined. More explanations about the special role of COMPOUND 1 / WAVELENGTH 1 as a reference can be found if you click the documentation hyperlink "Reference" in the ToC.
If there are more than two C-sites in the template, mark the extra one(s) for deletion and press Delete.
If there are fewer than two C-sites, create the number required, and assign
a Chemical name (in the 'Atom' column) to a G-site
(in the G-site column).
Beware ! This is case-sensitive : the first
letter should be uppercase, the other(s) should be lowercase. In any case, if
you assign a name that is not present in the list of possible chemical types,
the interface will refuse it, and you will be prompted with an Alert
window.
Compulsory action ,
C-site 1 : write Se in the field
and select G-SITE-01 in the scrolling menu
Compulsory action ,
C-site 2 : write Se in the field
and select G-SITE-02
in the scrolling menu.
Now we move down one hierarchical level, to the first crystal of the first compound : please activate X-1 just under C-1.
NOTE : You could object that we ought not to refine the value of the occupancy in this case, because we know it : it is very likely to be 1., or very close to it... in physical terms. But in this refinement, the optimal value of the occupancy also depends on the accuracy of the absolute scaling of the data. Do not be surprised, then, if the occupancies for the Se atoms refine to values 20% or 30% away from 1. : it just reflects the lack of precision of the absolute scaling.
Please move to the next hierarchical level, and activate W-1 under C-1.
Recommended resolution limits : 20. to 2.
Please move to the next hierarchical level, and activate B-1 under W-1.
The button Select Columns will take you to our MTZ hyper-editor. The present version of this program does not enable you to select columns by clicking on them, but is a useful alternative to using 'mtzdump' in the Unix window if you have forgotten the names of the relevant columns in the MTZ datafile.
To know what FMID, SMID etc. mean, please click on the hyperlink Assign columns from file. It will take you to the relevant part of the documentation.
Compulsory assignments for wavelength 1 :
FMID = FL3 ; SMID = SIGFL3 ; DANO = DL3 ; SANO = SIGDL3
Embarrassing NOTE : In the documentation , we strongly recommend that the ISYM column be present if anomalous differences are measured. We apologise for it not being the case here.
Why choose the third wavelength as the first in the SHARP data structure ? In this MTZ file, as is often the case, the remote wavelength is called number 3. By convention, SHARP takes as a reference for scale and lack-of-isomorphism the first dataset in the list. We recommend (see documentation) that the most 'reliable' wavelength (i.e. no shifts in f' and f'') is taken as a reference. In most cases however, the outcome of the phasing procedure will not be significantly changed if another wavelength is taken as a reference.
Scaling parameters. In the absence of information about these parameters, we will start from (for example) K = 1. and B = 0. . Because the first dataset is a reference for the scale, these values should NOT be refined.
Because we take the dataset for the first wavelength, by convention, as a reference for the scale, if you activate the Estimate first? tick box for the multiplier scale factor (K), SHARP will perform a pseudo-absolute scaling based on the chemical composition given in the first page 'Global Information Editor'. In case you request this, you should have previously made sure the chemical composition makes sense. Do not activate the Estimate first? tick box for the B scale factor : it would deny the protein atoms the right to thermal disorder, and result in unphysical values for the heavy-atom B-factors.
Because we have to choose a reference dataset for 'isomorphous' lack of isomorphism, Global non-isomorphism parameters and Model imperfection parameters for isomorphous differences should all be set to zero (0.) and neither Refined nor Estimated. Conversely, the same parameters for anomalous differences should be both Estimated and Refined.
The anomalous scattering parameters f' and f" for the remote wavelength must be specified. f' will be kept constant (unrefined) because one of the f' serves as a reference for the others ; the one for the remote wavelength being the most precisely known is then kept as a reference.
Recommended values for the anomalous scattering factors, wvl 1 :
f' = -2.19 ; norefine
f" = 3.46 ; refine
Recommended resolution limits : 20. to 2.
Please activate B-1 under W-2.
Compulsory assignments for wavelength 2 :
FMID = FL2 ; SMID = SIGFL2 ; DANO = DL2 ; SANO = SIGDL2
Scaling parameters : this time all scaling parameters have to be both estimated and refined, because this scaling is relative to the first dataset (here 'first dataset' means wavelength 1).
LOI parameters : similarly, they are now all estimated and refined.
Anomalous scattering parameters : both refined from the values in the Sasaki tables (Sasaki, 1989)
Recommended values for the anomalous scattering factors, wvl 2 :
f' = -7.35 ; refine
f" = 5.921 ; refine
Recommended resolution limits : 20. to 2.
Please activate B-1 under W-3.
Compulsory assignments for wavelength 3 :
FMID = FL1 ; SMID = SIGFL1 ; DANO = DL1 ; SANO = SIGDL1
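Because the SHARP wavelength numbering runs opposite to the MTZ labels (wavelength 1 uses the L3 columns, wavelength 3 the L1 columns), it is easy to mistype an assignment. The following sketch records the three compulsory assignments and checks them against a list of column labels. `missing_columns` is a hypothetical helper, and the list of available labels is something you would supply yourself (for instance copied from the 'mtzdump' output); nothing here reads an MTZ file.

```python
# SHARP wavelength number -> MTZ column labels, as assigned in this tutorial.
COLUMN_ASSIGNMENTS = {
    1: {"FMID": "FL3", "SMID": "SIGFL3", "DANO": "DL3", "SANO": "SIGDL3"},
    2: {"FMID": "FL2", "SMID": "SIGFL2", "DANO": "DL2", "SANO": "SIGDL2"},
    3: {"FMID": "FL1", "SMID": "SIGFL1", "DANO": "DL1", "SANO": "SIGDL1"},
}


def missing_columns(assignments, available):
    """Return the assigned labels that are absent from the available MTZ columns."""
    available = set(available)
    return sorted(
        label
        for columns in assignments.values()
        for label in columns.values()
        if label not in available
    )


if __name__ == "__main__":
    # Labels typed in by hand for illustration; replace with your own listing.
    labels = [
        "FL1", "SIGFL1", "DL1", "SIGDL1",
        "FL2", "SIGFL2", "DL2", "SIGDL2",
        "FL3", "SIGFL3", "DL3", "SIGDL3",
    ]
    print(missing_columns(COLUMN_ASSIGNMENTS, labels))
```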
Scaling parameters : this time all scaling parameters have to be both estimated and refined, because this scaling is relative to the first dataset (here 'first dataset' means wavelength 1).
LOI parameters : similarly, they are now all estimated and refined.
Anomalous scattering parameters : both refined from the values in the Sasaki tables (Sasaki, 1989)
Recommended values for the anomalous scattering factors, wvl 3 :
f' = -9.52 ; refine
f" = 3.15 ; refine
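The three sets of starting values can be compared at a glance with a short script. Anomalous (Bijvoet) differences grow with f", and dispersive differences between two wavelengths grow with the difference in f'. The sketch below uses only the recommended starting values from this tutorial; the function names are illustrative and this is not part of SHARP.

```python
# Recommended starting values, indexed by SHARP wavelength number.
ANOMALOUS_FACTORS = {
    1: {"fp": -2.19, "fpp": 3.46},   # remote, taken as reference
    2: {"fp": -7.35, "fpp": 5.921},
    3: {"fp": -9.52, "fpp": 3.15},
}


def largest_anomalous_signal(factors):
    """Wavelength number with the largest f'' (strongest anomalous differences)."""
    return max(factors, key=lambda w: factors[w]["fpp"])


def dispersive_difference(factors, w_a, w_b):
    """Difference in f' between two wavelengths (drives dispersive differences)."""
    return factors[w_a]["fp"] - factors[w_b]["fp"]


if __name__ == "__main__":
    print("largest f'' at wavelength", largest_anomalous_signal(ANOMALOUS_FACTORS))
    for pair in ((1, 2), (1, 3), (2, 3)):
        print(pair, round(dispersive_difference(ANOMALOUS_FACTORS, *pair), 2))
```

With these starting values, wavelength 2 carries the largest f", and the largest f' contrast is between wavelengths 1 and 3.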
If you notice anything wrong, or if you want to exercise going through the pages again, click on Cancel and choose in the Table of Contents which page to edit anew. If you want to have a coffee, and submit the job when you come back, click on Save only ; you will have to go through the restart mechanism (see end of this document) to get the refinement going from these saved parameters.
In any other case, just click on Submit to start the refinement.
You will be prompted by a confirmation message, and a possibility to go directly to the directory where you will find the logfile of the calculation.
Click on Go to output. This takes you to the output directory for this run of SHARP.
We will comment on the logfile that you obtain if you followed exactly the recommended values in the input pages. If you have deviated from ideality (which is not advisable if you are a first-time user of this tutorial, but could be interesting for further tests), do not be surprised if the results are different from those mentioned here !
To access the main logfile (from which all other files in the directory can be consulted), click on the filename 'LIST.html', described as SHARP OUTPUT in the rightmost column.
While the lack-of-isomorphism parameters slowly converge, the occupancies for the two selenium atoms refine to very different values. That may be due to a wrong 'guess' for the initial value of the temperature factors. These will be refined later, at BIG CYCLE 3.
After ten cycles, the refinement for this subset of parameters has not converged (our current convergence criterion is that the step length should be smaller than 0.2 ; for a definition of 'step length' please click on the relevant 'explanation' hyperlink). Then, we head on to BIG CYCLE 2, with more parameters being refined, namely the coordinates of the selenium atoms. Occupancies keep adjusting (a little) while the coordinates are being refined ; the lack-of-isomorphism parameters decrease slightly.
At BIG CYCLE 3, the most sensitive parameters (heavy-atom temperature factor and anomalous scattering factors) are added to the list.
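The staged strategy described above (a growing list of refined parameters, each stage stopped either by the step-length criterion or after a fixed number of cycles) can be sketched as a small control loop. This is a schematic illustration only, not SHARP's algorithm: `step_length_for` is a hypothetical callback standing in for one refinement cycle, and the limit of ten cycles per stage is taken from the behaviour reported in this logfile.

```python
# Parameters added at each stage, as described for refinement choice 1.
STAGES = [
    ("BIG CYCLE 1", ["scale", "lack-of-isomorphism", "occupancy"]),
    ("BIG CYCLE 2", ["coordinates"]),
    ("BIG CYCLE 3", ["temperature factors", "f'", "f''"]),
]


def refine_in_stages(step_length_for, max_cycles=10, tolerance=0.2):
    """Run each stage until the step length drops below `tolerance`.

    `step_length_for(active_params, cycle)` is a hypothetical callback that
    performs one cycle and returns the resulting step length.
    Returns a list of (stage name, number of active parameter groups,
    cycles run, converged flag).
    """
    active = []
    report = []
    for name, new_params in STAGES:
        active = active + new_params  # earlier parameters stay active
        converged = False
        cycles_run = 0
        for cycle in range(1, max_cycles + 1):
            cycles_run = cycle
            if step_length_for(active, cycle) < tolerance:
                converged = True
                break
        report.append((name, len(active), cycles_run, converged))
    return report


if __name__ == "__main__":
    # Toy step length that shrinks with the cycle number.
    for line in refine_in_stages(lambda params, cycle: 1.0 / cycle):
        print(line)
```

A stage that reaches the cycle limit without converging, as BIG CYCLE 1 does here, simply hands over to the next stage with more parameters released.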
Please feel free to click on all available hyperlinks, to look at the details of the refinement (first and second-order derivatives, eigenvalues of the Hessian matrix), or at the agreement between our model of lack of isomorphism and the mean lack of closure error in each resolution bin. In particular, you will notice that both the isomorphous and the anomalous lack of closure have very strange statistics, probably due to large uncertainties in the values of the standard deviations of the measured anomalous differences.
Before clicking on the View button, you have to choose the level of the residual map (see on-line documentation for details). In this case, the only map where we can hope to see a clear signal is the anomalous residual map at wavelength 2, where the anomalous differences are maximal on average.
Recommended level : iso/ano
You then select the line 'Compound1/Crystal1/Wavelength2/Batch1 ANO (1 1 2 1 ANO)' in the scrolling menu (our general way to specify an anomalous residual map for the second wavelength), and click on Go to viewpoint selection.
The only options that are usable for the moment are to view the whole map (i.e. the whole asymmetric unit) with program PLUTO (npo), or with the program 'O'. Just click on Plot to trigger the Fourier transform and the plotting program. A new window will then appear, with five sections through the residual map. You can then click on Peak List to see the list of peaks and their coordinates. This enables you to note down coordinates of new peaks prior to entering them in the G-sites list of the graphical SHARP input.
If you choose to view Pluto maps (map sections), first click on
relative in the new window, then
click successively on 'Picture 1', 'Picture 2', .... To enlarge the
pictures, as long as you are in 'relative size' mode, you can just enlarge the
window where the section is displayed - the map should be enlarged to fill the
window.
If you choose O as a plotter (recommended when possible), an O window appears,
and you just have to do what is written there. The user menu is either
displayed directly on the right side of the graphical O window (version 5),
or as a sub-choice of the menu 'menu', on the right side of the command
bar.
In the present case, the only significant features in the map occur in the vicinity of the two selenium sites. In addition, the temperature factor for the second site refines to a much higher value than for the first site (respectively 24 and 60). It may not be visible on the pluto sections, but site 1 is surrounded by a torus of positive density, while there are two peaks in the direction perpendicular to the plane of the torus. This clearly indicates anisotropy for the thermal disorder of site 1. There are also two pairs of negative and positive peaks around site 2, but much weaker. We conclude that both sites are subject to anisotropic disorder. No other modification to the heavy-atom model seems to be necessary, so we can move on to the next refinement run.
To be on the safe side, please click on the Reload button (fourth from left in the Netscape frame), so that the list of files in the various scrolling menus gets updated.
Depending on what name you gave to the first refinement run, you will find in
the scrolling menu in front of the START
button, a file called end_<name-of-run>.1.sin
This parameter file contains all the
refined values at the end of refinement run 1. Unless you want to
experiment with things, we advise you to highlight this file in the list, then
click on the START button.
For example : "Two anisotropic selenium atoms ; refinement of parameters."
You can look at the rest of the page, but it should remain unchanged. Activate the tick box in front of G-sites.
Now click on the button Submit in the top frame.
Because this is a MAD refinement, the residual lack-of-isomorphism parameters do not decrease when new information is added. This probably comes from the fact that they are dominated by errors in the experimental standard deviations on the measurement of the anomalous differences. This illustrates one of the main advantages of the maximum-likelihood refinement : it allows all parameters including those describing the lack of isomorphism to be refined together, so that remaining sources of bias can be attenuated on average. Of course it would be best to have statistically accurate standard deviations for all measurements, but that requires very high internal redundancy.
As for the first map, please select iso/ano in the scrolling menu, and click on the button View.
On the next page, select Compound1/Crystal1/Wavelength2/Batch1 ANO (1 1 2 1 ANO) and click on the button Go to viewpoint selection. Then click on Plot.
In this map, you can notice that the refinement of anisotropic thermal motion for both selenium sites has effectively removed the pairs of positive and negative peaks close to these sites. One large peak remains, very close to site 1 (just above that site), at more than 7 map standard deviations (sigma). You can get the list of peaks, while looking at the residual maps, by clicking on 'Peak List', and, if you are using O, you can see which is which by clicking on the small cross at the center of the peaks ; you will see the coordinates and other information displayed in the upper left corner of your window. The height of the peak you have just clicked on will appear under the heading "B-factor", in sigma units.
Note : In fact this peak is higher than it appears. This is an artifact of the map being traced in the asymmetric unit only. If you re-trace the residual map around the heavy-atom site, the first and second peaks in the list 'merge' to produce a single larger peak.
This peak is very puzzling : it cannot be a third selenium atom, because there are only two methionines in the sequence. It cannot come from anisotropy, because that has already been refined, and because there is no associated negative peak. We are then forced to postulate that the methionine has two alternate positions, the second position at the coordinates of the main residual peak (fractional coordinates from the peak list - you may have to enlarge your window to see them in full) :
x = 0.03038 y = 0.02031 z = 0.23254
Please highlight this file in the list, and click on START.
For example : Two anisotropic selenium atoms + one minor isotropic
You can look at the rest of the page, but it should remain unchanged.
Now that we have a plan for where to go, let's do it, and first click on G-sites in the Table of Contents.
Suggested occupancy : 0.2
Suggested temperature factor : 24.
The refinement tick box for occupancy should be activated, and the refinement
mode for the temperature factor of the third T-site (B-factor refinement,
third line) should be chosen as 'Isotropic'.
No anisotropic refinement is asked for,
because the isotropic temperature factor has to be refined first, and because
there is no evidence of anisotropy at this point.
Now that all extra parameters for the 'third site' have been set up, you can look at other hierarchical levels if you want, but it is not strictly necessary. When you are ready to re-submit the refinement job, please press Submit in the top frame.
One eigenvalue, as before, remains filtered. If you click on (details), you can see (with relief) that the incriminating eigenvalue is negative, but has been filtered because its absolute value is very small ; it could lead to an ill-conditioned Hessian matrix. A positive eigenvalue at this stage would be more worrying. You can also notice that the eigenvector corresponding to that eigenvalue has strong contributions from the x coordinate of both site 1 and site 3 : this is not surprising. These sites are less than 2 Angstrom apart, creating problems for the calculation of derivatives.
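To check for yourself how close two sites are, fractional coordinates have to be converted to a distance using the unit cell. A minimal sketch follows, using the standard metric-tensor formula for a general cell. The cell parameters are not given in this tutorial, so `cell` is an argument you must fill in from the MTZ header; the function also ignores symmetry mates and lattice translations, which is an extra simplification.

```python
import math

# Fractional coordinates quoted in this tutorial.
SITE_1 = (0.051, 0.0, 0.243)
SITE_3 = (0.03038, 0.02031, 0.23254)


def fractional_distance(frac_a, frac_b, cell):
    """Distance between two points given in fractional coordinates.

    `cell` is (a, b, c, alpha, beta, gamma), lengths in Angstrom and angles
    in degrees. No symmetry operators or lattice translations are applied.
    """
    a, b, c, alpha, beta, gamma = cell
    cos_a, cos_b, cos_g = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    dx, dy, dz = (p - q for p, q in zip(frac_a, frac_b))
    d2 = (
        (a * dx) ** 2 + (b * dy) ** 2 + (c * dz) ** 2
        + 2 * a * b * dx * dy * cos_g
        + 2 * a * c * dx * dz * cos_b
        + 2 * b * c * dy * dz * cos_a
    )
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative rounding


# Usage, once the real cell is known:
#   fractional_distance(SITE_1, SITE_3, (a, b, c, alpha, beta, gamma))
```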
Note : It is also well known that the selenium atoms have anisotropic anomalous diffraction properties. It is therefore probable that the remaining residual peaks come in part from that un-modelled effect as well.
The file containing the result of the SHARP phasing is in the 'logfiles' subdirectory corresponding to your run, and is named 'eden.mtz'. It contains the two-dimensional centroid of the structure-factor probability in the complex plane. This extension of the idea of Blow and Crick (Blow & Crick, 1959) enables both more precise amplitudes to be output (because all measurements contribute to them) and more precise phases, because they are no longer tied to some 'pseudo-native' amplitude, or to one of the wavelengths. In practice, this also means that the FOM should NOT be applied as weights in the Fourier transform, because by definition two-dimensional centroids are already FOM-weighted.
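The reason a centroid is "already FOM-weighted" can be seen in the simpler, phase-only case of Blow and Crick. The sketch below computes the centroid of a discrete phase probability distribution at fixed amplitude; its modulus is the amplitude multiplied by the figure of merit. This is a simplified illustration, not SHARP's two-dimensional calculation (which also integrates over the amplitude), and the function names are illustrative.

```python
import cmath
import math


def centroid(amplitude, phases_deg, weights):
    """Probability-weighted mean of amplitude * exp(i * phase)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    acc = sum(
        w * cmath.rect(amplitude, math.radians(p))
        for p, w in zip(phases_deg, weights)
    )
    return acc / total


def figure_of_merit(amplitude, phases_deg, weights):
    """Modulus of the centroid divided by the amplitude (between 0 and 1)."""
    return abs(centroid(amplitude, phases_deg, weights)) / amplitude


if __name__ == "__main__":
    # Two equally likely phases 90 degrees apart.
    c = centroid(100.0, [0.0, 90.0], [1.0, 1.0])
    m = figure_of_merit(100.0, [0.0, 90.0], [1.0, 1.0])
    print("centroid modulus:", abs(c), " FOM:", m)
```

Since the centroid modulus already equals the amplitude times the figure of merit, multiplying it by the FOM a second time would down-weight uncertain reflections twice.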
The SHARP interface enables you to run the density modification program SOLOMON (Abrahams & Leslie, 1996), available from CCP4 (version 3.0 and higher). The graphical interface activates all the scripts that are needed to use this program ; you just have to enter the solvent content. Even this can be refined by a trial-and-error method, as you can read in the on-line documentation.
Recommended solvent content : 48%
Clicking on Flatten will take you to the solvent flattening page, from where you can examine the logfile of the solvent flattening, that includes useful statistics about what is going on from one cycle to the next, or you can examine the electron-density maps during the course of the solvent flattening. The procedure stops after 130 cycles (a couple of hours on a medium-sized problem).
If you go there (through the Go menu in the Netscape frame, or via a Bookmark, or by typing the URL in full, or by exiting Netscape and starting again), you see after the Refine and the Synthetise buttons, another button called Request. In front of this button, you find a first menu with a list of possible actions, and a second menu with a list of runs. If you want to restart the run, say, IF3-C.2, select IF3-C.2 in the rightmost menu, then select Restart in the middle menu, then click on the Request button.
SHARP will always restart from the end of the last completed cycle, but will go through likelihood filtering before refinement. That means you do not restart exactly in the same conditions (in fact, you start in somewhat better conditions because the outlier rejection will be more precise if refinement is more advanced). But do not be surprised if the first cycle after restart does not behave exactly like the last uncompleted one.
Biou, V., Shu, F. & Ramakrishnan, V. (1995).
EMBO J. 14, 4056-4064.
Sasaki, S. (1989).
KEK. Report 88-14. National Laboratory for High Energy
Physics, Tsukuba, Japan.
Blow, D. M. & Crick, F. H. C. (1959).
Acta Cryst. 12, 794-802.
Abrahams, J. P. & Leslie, A. G. W. (1996).
Acta Cryst. D52, 30-42.