This software is proprietary to and embodies the confidential technology of Global Phasing Limited (GPhL). Possession, use, duplication or dissemination of the software is authorised only pursuant to a valid written licence from GPhL.

Authors: (2011) A. Sharff, P. Keller, C. Vonrhein, O. Smart, T. Womack, C. Flensburg, W. Paciorek and G. Bricogne
Contact: buster-develop@globalphasing.com
Version: 1.1.1

Partial support from EU projects: SILVER (FP7-HEALTH-F3-2010-260644)

## 1. Pipedream.

### 1.1. What is Pipedream?

Pipedream is an "expert" system to link and automate [a] data processing with autoPROC, [b] a "limited" molecular replacement stage with Phaser, [c] structure refinement with BUSTER and (where requested) [d] automated ligand fitting with Rhofit with [e] subsequent BUSTER post-refinement of the top solution (again, where requested). The required input for Pipedream is an input data set, either in the form of unprocessed diffraction images or as a pre-processed mtz file, an input model and optionally, an associated mtz file. Consistent relations between these items are expected, as detailed below.

### 1.2. Scope and Limitations.

Pipedream has been specifically designed as a pipeline tool to facilitate the use and integration of Global Phasing’s primary software packages (autoPROC, BUSTER and Rhofit) into a (high-throughput) fragment/ligand screening pathway. As such, its scope is quite rigidly defined and a number of limitations on its use apply.

It is anticipated that the primary use for Pipedream is where multiple data sets have been collected on a single target, differing only in the soaking/co-crystallisation conditions of the crystals. As such, it requires that the structure used as the input model for structure solution be essentially identical to that present in the crystals from which the various datasets have been collected - not only the same protein and same sequence, but CRITICALLY, the same space group and cell dimensions (allowing for small differences in the latter due to non-isomorphism, changes in environment due to soaking/co-crystallisation and freezing)!

The input model should be an appropriate APO structure. This may be a native structure or indeed a structure with a known ligand bound. Any "non-protein" molecules that are present in the binding site of interest (i.e. where you are looking for new bound ligands) MUST be removed from the input model. Other built-in ligands, such as cofactors, prosthetic groups, ions etc. - other HETATM’s - can be retained. Careful consideration should be given to any water molecules in the input model. As the chemical environment in soaked/co-crystallised crystals may well be expected to differ from the "apo" structure used as the input model, the water structure in the "apo" model may not be fully conserved. Therefore, Pipedream will automatically remove all water molecules by default, unless the explicit argument "-keepwater" is used to override this behaviour, see Optional arguments, in cases where specific water structure is known or assumed to be conserved. IMPORTANT: Conserved water molecules, together with other non-covalently bound HETATMS, in the input pdb file MUST have been assigned the chain id corresponding to the protein chain to which they are bound, otherwise they will not be retained.

Pipedream is (currently) NOT designed as a tool for automated structure solution. It cannot deal with data that require experimental phase determination, nor can it deal with cases that require full Molecular Replacement.

#### 1.2.1. Minimum input

1. A reference structure (pdb file) and associated reference mtz file. The latter is optional though strongly recommended. If both are given, Pipedream will confirm that the space group of the reference structure and mtz file match and that the cell dimensions are substantially the same, otherwise it will terminate immediately. Where given, the reference mtz file MUST contain a set of structure factors, ideally those from which the reference pdb structure was refined. If the reference mtz file also contains a Free R set, this will be "transferred" to the experimental data (whether input as raw images or as a pre-processed mtz file). This is considered good practice as it allows proper cross-validation of all related datasets that are refined against the input model (Pozharski et al. Acta D (2013), 69, 150-167). Where a reference mtz file is not given, Pipedream will "back calculate" structure factors from the reference model, will generate a new Free R set and will combine them into an ersatz reference mtz file for use in subsequent steps. We advise against doing this.

2. An input data set. This can either be in the form of unprocessed images, in which case the data will be automatically processed by autoPROC, or a file containing already processed scaled/merged data. This can be an mtz file, a Scalepack reflection file or a d*TREK reflection file. Both Scalepack and d*TREK reflection files will be converted to mtz format before further processing. Where a file of pre-processed data is input, Pipedream will confirm that its space group matches the reference structure and that their cell dimensions are "similar", otherwise it will terminate. If the input data file contains structure factor amplitudes, they will be used. However, if only intensities are available, Pipedream will automatically run truncate to calculate structure factor amplitudes. Note that Pipedream will terminate if it finds more than one set of structure factor amplitudes unless a unique F/SIGF pair has been specified. Pipedream will also ensure that the input and reference mtz files are consistently indexed, reindexing the input mtz file if necessary. Note that Pipedream will only accept experimental data as unprocessed images if autoPROC is available. If not, only pre-processed data, in the form of an mtz file, Scalepack or d*TREK reflection file, will be accepted.

Pipedream is designed to ensure that all output structures (and maps) are in the same asymmetric unit as the model used as input to Pipedream, so that all output structures and maps from multiple runs of Pipedream (using the same reference model) are directly superimposable. For this purpose, all input data sets are examined to check that they are consistently indexed with the reference structure and that both input and reference data conform to the CCP4 definition of the asymmetric unit for the appropriate space group. Thus it is important that the reference mtz file should be directly associated with the reference pdb file. If this is not the case, the limited MR procedure may not be successful and in any case the consistency checks to ensure that the output is superimposable over the input will in effect be bypassed. This caveat also applies in the event that the same mtz file is used as both the input and reference data. Although not proscribed, this is definitely not recommended and Pipedream will generate a warning if it detects that this is the case.

### 1.3. Multiple models.

Conformational change in proteins is a well-known and studied phenomenon. Such changes can be extremely localised, limited to alterations in side-chain conformation, or they can be much more extensive, such as rigid body domain movements. Localised loop movements are frequently observed in proteins, particularly in response to ligand / cofactor binding.

Such loop movements can be of a large enough magnitude that refinement alone is unable to deal with them - hence it is important to pick the correct input model for refinement. For example, if you are looking at a protein where a loop occludes the known ligand binding site in its apo state, but moves out of the site in response to the presence of a ligand, then it does not make sense to use an apo model, with the loop in the "in" conformation, in refinement against data where a ligand is bound. Refinement alone is unlikely to move the loop out of the binding site and subsequent ligand fitting will fail, resulting in a false negative. Conversely, it is similarly unwise to use as input to refinement a model of the ligand-bound conformation of the protein if the experimental structure is in the apo conformation, this time resulting in a false positive.

However, in the context of running a fragment/ligand screening pathway, how do you know ahead of time whether or not the soaked ligand has bound, and therefore which input model to use with Pipedream?

Pipedream deals with this issue by allowing the input of multiple models, performing an initial refinement on all of them, after which it makes a decision as to which best matches the experimental data. It will then carry on with refinement and ligand fitting using this model alone.

#### 1.3.1. Procedure

In order to determine which model best matches the experimental data, Pipedream looks at main-chain real-space correlation coefficients (calculated against the refined electron density maps, with the CCP4 program edstats) after initial refinement. The model that best matches the data is the one with the highest mean CC. However, this can be very insensitive when calculated over the entire structure. Therefore, to increase the sensitivity of the method, Pipedream calculates and uses the mean CC only for residues where there is significant conformational change.

The preferred method of identifying these residues is for them to be explicitly defined by the user. However, if they are left undefined, Pipedream will attempt to automatically identify regions showing conformational change by stepping through the structure and looking at pairwise RMS deviations for each residue. By default, any residue which has an RMSD of greater than 1.5Å in any of the pairwise comparisons will be selected. Note that in the eventuality that no residues are found above the defined RMSD threshold, Pipedream will select as the best match the model with the lowest Rfree after initial refinement.
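The selection logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pipedream's actual code: the function names, the residue keys and the dictionary layouts are hypothetical, and the per-residue CC and RMSD values are assumed to have been computed elsewhere (for example from edstats output).

```python
from statistics import mean

def select_residues(pairwise_rmsd, threshold=1.5):
    """Return residues whose RMSD exceeds the threshold (in Å) in ANY pairwise comparison.

    pairwise_rmsd maps a residue key, e.g. ("A", 34), to the list of RMSD
    values obtained from every pair of input models (hypothetical layout).
    """
    return sorted(res for res, values in pairwise_rmsd.items() if max(values) > threshold)

def pick_best_model(cc_by_model, rfree_by_model, residues):
    """Pick the model with the highest mean CC over the selected residues.

    If no residues were selected, fall back to the lowest Rfree after
    initial refinement, as described in the text.
    """
    if not residues:
        return min(rfree_by_model, key=rfree_by_model.get)
    scores = {model: mean(cc[res] for res in residues) for model, cc in cc_by_model.items()}
    return max(scores, key=scores.get)
```

Restricting the mean to the selected residues is what gives the comparison its sensitivity: residues that are identical in all models contribute nearly the same CC to every model and would otherwise dilute the difference.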

#### 1.3.2. Requirements and limitations

The method employed currently is designed to distinguish between conformational changes produced by main-chain differences, such as loop movements. Additional, limited domain movement (such as hinge-bending) can also be accommodated by running Pipedream with an appropriately constituted rigid body definition file, so that the initial refinement can correct for relative domain shifts before model comparison.

However, the current method is not sensitive to conformational change caused solely by side-chain movements.

The input model requirements listed in section 1.2 (same space group and cell as the experimental data, same protein, same sequence) apply to ALL input models. In addition, all of the input models must be directly superimposable. Furthermore, they must all share a common residue numbering and chain identification scheme.

Importantly, unmodelled residues in one or more of the input models (presumably due to disorder) are to be avoided if at all possible. Any significant number of unmodelled residues in any of the input models (unless missing from ALL of them) could potentially compromise the ability of Pipedream to select the correct model.

Pipedream deals with unmodelled regions slightly differently, depending on how they have been defined. Where Pipedream is left to determine automatically which regions to compare, it will remove from consideration any residue, regardless of pairwise RMS deviation, which does not appear in all of the input models. The potential drawback is that regions where there is genuine conformational change could be excluded from analysis if those regions are not present in one or more of the input models. Where the residues for analysis are specified by the user (the preferred method), any residues missing in one of the models would be assigned a default CC of 0 for that model. Again, a significant number of missing residues from one or more of the input models could potentially compromise the ability of Pipedream to select the correct model.
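The two treatments of unmodelled residues can be contrasted with a short Python sketch. The function names and the dictionary layout (model name mapped to per-residue CC values) are hypothetical and chosen only for illustration; they do not reflect Pipedream's internals.

```python
def common_residues(models):
    """Automatic mode: keep only residues that appear in every input model.

    models maps a model name to a dict of {residue key: CC}; a residue that
    is absent from any one model is dropped from the comparison altogether.
    """
    residue_sets = [set(per_residue) for per_residue in models.values()]
    return sorted(set.intersection(*residue_sets))

def mean_cc_user_defined(cc_for_model, residues):
    """User-defined mode: a residue missing from this model counts as CC = 0."""
    return sum(cc_for_model.get(res, 0.0) for res in residues) / len(residues)
```

In the user-defined mode, a model with several missing residues is penalised heavily, which is why gaps in the input models should be avoided where possible.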

Bear in mind that the intended use and scope of Pipedream imply that the ONLY difference between the input models (and their internal PDB attributes) should be in (relatively) localised conformational changes.

ALL input models must be superimposed on each other before input into Pipedream (using CCP4 program gesamt or coot) - Pipedream will not superimpose them itself.

Whilst there is no limit to the number of models that can be input, if there are a limited number of distinct conformational states, we would recommend using only the one (or two) best models representative of each conformational state. Adding more and more very similar models may simply increase the CPU time required without any improvement in precision in arriving at the correct solution.

Although Pipedream can be used to automatically identify the regions that differ (as described above), the preferred method is to tell Pipedream which residues to use for structure comparison, using either (or indeed both) the -seqin1 or -seqin2 options (see Chapter 3 for a description of the use of these options).

Ideally, analysis of the input models should be based on comparison of at least 5 residues. Although Pipedream does not enforce a minimum number of residues, if run in automatic residue determination mode, it will note in the main output if the number of residues selected above the threshold is below 4. In this case, you may want to re-run Pipedream lowering the default RMSD threshold.

### 1.4. Program dependencies.

As well as autoPROC, BUSTER and Rhofit, Pipedream will run various CCP4 programs. In particular, Pipedream requires version 2.5.6 (or later) of Phaser. This is installed in CCP4 versions from 6.4.0. Pipedream also requires the reduce program from Molprobity.

Pipedream also incorporates buster-report, which has a number of external dependencies, such as mogul and grace. For further details please see the locally installed software installation instructions in <installation root directory>/docs/installation. Pipedream will test for the availability of these dependencies and if certain ones cannot be satisfied will not attempt to run buster-report.

You can test that all of Pipedream’s dependencies have been satisfied by running pipedream -checkdeps.

## 2. Pipedream architecture

Pipedream runs several packages, each generating its own output. In order to keep this output separate and clear, Pipedream will generate a specific directory structure to keep the output from each stage separated. Definition of a root directory <ROOT> in which to create this structure is obligatory.

Pipedream can be run manually. However, it has been written to allow it to be called automatically and multiple runs to be run in the background or submitted on remote machines. Thus, it does not write any information to standard output, unless problems with the input files or mistakes made in invoking Pipedream prevent normal execution. All output is written to disk and may be reviewed at leisure. A summary of the main output is written into the file <ROOT>/summary.out.

Stage 1: Input x-ray diffraction data are processed with autoPROC (unless a pre-processed mtz file is used as the primary input). Output is written into the directory <ROOT>/process, with the standard output from autoPROC in <ROOT>/process.out.

Stage 2: The degree of non-isomorphism observed between crystals, especially after soaking experiments, can easily exceed the limit that can be corrected by rigid body refinement. Thus, Phaser is used in a specific mode to run a very "limited" molecular replacement procedure. This has the advantage that it is fast and can deal with fairly significant molecular movements due to non-isomorphism and/or conformational changes due to ligand binding. The angular range allowed for the function is matched to what is accepted as a reasonable degree of cell dimension variability - see Appendix A. Whilst this angular range can be doubled in cases of more extensive non-isomorphism, the procedure CANNOT cope with the more significant transformations seen where the search model has a different cell / symmetry to the data. By default, the input structure is treated as a single rigid unit, regardless of the number of protein chains present in the model. However, where the asymmetric unit contains multiple chains (whether homomeric or multimeric) and if so desired, individual chains or groups of chains can be defined as separate units (with the -chains option) and will be treated independently. This approach may well be beneficial in such cases. However, one caveat that should be borne in mind is that translational NCS in the input model could lead to a possible failure mode. If the input model is known to contain chains related by translational NCS then either they should not be treated as independent units, or the brute force translation function should be selected (with the -btf keyword).

Stage 3: The structure is then refined with BUSTER, with the explicit aim of producing the best difference map for identification of bound ligands. Three different refinement protocols are available (quick, default and thorough), the choice of which is dependent on the size and degree of movement/flexibility observed in the target structure. Output for the final run is written into the directory <ROOT>/refine, with the standard output from BUSTER in <ROOT>/refine.out.

The three protocols are as follows:

Quick: performs a single BUSTER refinement

 1: -RB -L -WAT 2 Rigid Body refinement (1st big cycle). Turn on "water" addition after big cycle 2 to look for ligand density.

Default: runs 2 rounds of BUSTER refinement

 1: -RB Rigid Body refinement (1st big cycle).

 2: -M TLSbasic -L -WAT 3 Turn on TLS refinement. Turn on "water" addition after big cycle 3 to look for ligand density.

Thorough: runs 3 rounds of BUSTER refinement

 1: -RB Rigid Body refinement (1st big cycle).

 2: -M TLSbasic -M WaterUpdatePkmaps -WAT 3 Turn on TLS refinement. Turn on water addition after big cycle 3.

 3: -TLS -L -WAT 2 Continue TLS refinement. Turn on "water" addition after big cycle 2 to look for ligand density.

Which protocol to use.

Results of internal testing suggest that the default protocol is appropriate in most cases and that would certainly be our recommendation in the first instance. For fairly rigid proteins that show little flexibility or variation, the quick protocol may well be sufficient. For larger proteins, particularly complexes with more than one chain in the asymmetric unit, which show a high degree of variability and conformational flexibility, the thorough protocol may be more appropriate.

Multiple model input:

If run with more than one input model, Pipedream will run stage 2 and the first cycle of refinement (as defined by the specified refinement protocol in stage 3 above) for each of the input models. After automatic analysis of the conformational differences between the refined models (unless the -seqin1 and/or the -seqin2 options are specified), Pipedream will select which of the refined models gives the best fit to the data over the selected residues. Subsequent steps are only carried out with this one model. Following selection, the remaining refinement cycles (unless the quick protocol was specified) are run on the selected model.
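For orientation, the three refinement protocols and the way they are split in a multiple-model run can be written down as a small Python data structure. The flags are copied from the protocol listing above; the dictionary name and the helper function are hypothetical, and the sketch says nothing about how Pipedream actually passes these flags to BUSTER.

```python
# BUSTER flags for each round of each protocol, as listed in the text.
REFINEMENT_PROTOCOLS = {
    "quick": [
        ["-RB", "-L", "-WAT", "2"],
    ],
    "default": [
        ["-RB"],
        ["-M", "TLSbasic", "-L", "-WAT", "3"],
    ],
    "thorough": [
        ["-RB"],
        ["-M", "TLSbasic", "-M", "WaterUpdatePkmaps", "-WAT", "3"],
        ["-TLS", "-L", "-WAT", "2"],
    ],
}

def split_rounds(protocol="default"):
    """Split a protocol into (round run for every model, rounds run on the selected model).

    With multiple input models, only the first round is run for each model;
    the remaining rounds (none for "quick") are run on the selected model.
    """
    rounds = REFINEMENT_PROTOCOLS[protocol]
    return rounds[0], rounds[1:]
```

For the quick protocol the second element is an empty list, which matches the statement that no further refinement cycles follow model selection in that case.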

Stage 4: If specified with one or more refmac-style CIF restraint dictionaries for soaked/co-crystallised ligands, Rhofit will be run for each specified ligand in turn to attempt to locate and fit the ligand into the refined structure. Output is written into the directory <ROOT>/rhofit-<dictionary name>. Standard output from rhofit is in <ROOT>/rhofit-<dictionary name>.out. By default, Rhofit will only attempt to fit the ligand to the single best "cluster". If you expect to see the ligand bound to more than one site (for instance the structure has two or more identical chains in the asymmetric unit) then you will need to tell rhofit how many "clusters" to fit. See Optional arguments for more details.

Where requested, a subsequent run of BUSTER will be performed, to post-refine the top solution from Rhofit. The default in this case is to perform a full, standard BUSTER run. However, if the intention is simply to update the ligand fit and generate new maps, a short BUSTER refinement can be requested (using the -M ShortRunVoid macro). In cases where the fitted ligand is quite large and/or has several degrees of freedom or the protein structure is quite large and flexible, a more thorough post-refinement can be requested. This will run two rounds of BUSTER, although this is not generally required. By default, post-refinement will also refine the occupancy of the ligand(s) fitted by Rhofit, unless specifically told not to do so. Prior to post-refinement, hydrogen atoms will be added to the ligand (with zero occupancy if the resolution is lower than 2.0Å or with full occupancy if the resolution is higher than 2.0Å). Output from this run will be written to <ROOT>/postrefine-<dictionary name> and with standard output written to <ROOT>/postrefine-<dictionary name>.out.
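The hydrogen occupancy rule is easy to misread because "higher resolution" means a numerically smaller value in Å. A minimal Python sketch of the rule follows; the function name is hypothetical, and the treatment of a resolution of exactly 2.0Å (zero occupancy here) is an assumption, since the text does not specify it.

```python
def ligand_hydrogen_occupancy(resolution):
    """Occupancy given to ligand hydrogen atoms added before post-refinement.

    resolution is the high resolution limit in Å. Data better than 2.0 Å
    (numerically smaller) get full-occupancy hydrogens, otherwise zero.
    ASSUMPTION: exactly 2.0 Å is treated as the lower-resolution case.
    """
    return 1.0 if resolution < 2.0 else 0.0
```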

Note: The implementation of Rhofit in Pipedream allows fitting of a single ligand or, where a crystal has been soaked in a cocktail of compounds, fitting each component independently to allow the user to determine which, if any, component has bound, i.e. to answer the question "Does compound A or B or … etc bind?". It CANNOT be used to successively fit multiple compounds into a structure, i.e. to answer the question "Do compounds A and B and … etc all bind?".

buster-report will also be run (unless its dependencies are not satisfied) to give a concise report on the outcome of refinement. If both Rhofit and subsequent post-refinement have been requested, buster-report will be run on the output of post-refinement. If not, it will be run on the final output of the initial refinement protocol. The output from buster-report will be written to <ROOT>/report.

## 3. How to run Pipedream?

• To invoke Pipedream, simply use the command:

% pipedream <options>

• A basic invocation of Pipedream would look something like:

% pipedream -imagedir <directory> -d <output directory> -xyzin input.pdb -hklref input.mtz
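Because Pipedream is intended to be called automatically for many data sets, it can be convenient to assemble the command line in a script. The Python sketch below builds the basic invocation shown above as an argument list; the helper name and the example paths are hypothetical, and only options documented in this manual are used.

```python
import shlex

def pipedream_command(image_dir, out_dir, model, ref_mtz):
    """Build the basic Pipedream invocation as an argument list."""
    return [
        "pipedream",
        "-imagedir", image_dir,  # raw images for a single data set
        "-d", out_dir,           # compulsory output directory
        "-xyzin", model,         # apo input model
        "-hklref", ref_mtz,      # reference mtz (optional but recommended)
    ]

# Hypothetical example paths for one soaked crystal.
cmd = pipedream_command("images/soak01", "out/soak01", "input.pdb", "input.mtz")
print(shlex.join(cmd))
```

The argument list can be passed to subprocess.run(cmd) as it stands. Using a separate output directory per data set keeps the runs apart, as required by the -d option.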

### 3.1. Details of command-line arguments

• no argument or -help: Quick help message listing most important arguments.

• -help process: Quick help message listing most important autoPROC arguments.

• -help refine: Quick help message listing most important BUSTER arguments.

• -help rhofit: Quick help message listing most important Rhofit arguments.

#### 3.1.1. Minimum required arguments

• -imagedir [directory name]: Directory containing the raw images. This directory should contain the images for a single dataset only. Pipedream can cope with datasets that have been collected in multiple scans (for example high and low resolution passes or scans collected with multiple orientations on a kappa/Eulerian goniostat), provided adequate information relating these scans is provided in the image headers (and, for multi-axis goniometers, in a local configuration file for the relevant beamline).

- or -

• -imagescan: Use this option to input a specified set of images. The scan definition has the same form as the -Id option in autoPROC, i.e. a comma-separated list of five fields. To find sets of images in a particular directory and output scans in the correct format, you can run the command find_images -l -d . Multiple scans can be input by multiple invocations of -imagescan. Please note though that multiple scans MUST be images collected at the same wavelength. Pipedream CANNOT deal with images collected at multiple wavelengths.

- or -

• -hklin filename.mtz/sca/ref: Input scaled & merged mtz/scalepack/d*TREK file. Scalepack or d*TREK reflection files will be automatically converted to mtz format. Pipedream CANNOT accept unscaled/unmerged data. If the input file does not contain structure factor amplitudes, truncate will be run automatically. The data will also be automatically reindexed (if required) to ensure that it is consistently indexed with the reference mtz file. A Free R flag will also be added if one is not present and the -nofreeref option is also specified. If more than one set of structure factor amplitudes are present, Pipedream will terminate rather than make an arbitrary decision on which amplitudes to use, unless a unique F/SIGF pair is specified (see below).

• -d [directory name]: Output directory. All pipedream output will be written in a defined tree under this directory. Specifying an output directory is COMPULSORY to ensure that the output from a run of Pipedream is kept separate from any other, and enables the output to be separated from the input data, which may in any case be desirable as part of the data management policy in your research group.

• -xyzin: Input pdb file(s). Enter as a comma separated list if more than one input structure is specified. These structures should be of the same target protein as the input data and they are ALL expected to have the same cell and space group! If more than one model is input, they must all be superimposable! IMPORTANT: These structures should be APO structures. They should NOT contain any ligands in the binding site(s) of interest (where you are looking for bound ligands)! However, they should contain any associated co-factors that are not expected to be affected by the soaking of the putative ligand.

• -hklref filename.mtz: OPTIONAL (but strongly recommended) Reference MTZ file. This file should go together with the reference pdb file (where multiple pdb files are specified, the reference mtz file should go with the first input model). It MUST contain a set of structure factors and also the Free R set that was used in refining the input reference structure. If it does NOT contain a Free R set, Pipedream will terminate unless the -nofreeref option is also specified, in which case it will generate a new Free R set. If a reference mtz file is NOT specified, Pipedream will "back calculate" structure factors from the reference structure together with generation of a new Free R set and use these as the reference set (where multiple input models have been input, structure factors will be calculated from the first input model). In this case, as a reference Free R set is clearly not available for use, specifying the -nofreeref option is compulsory.

Note: If autoPROC is not installed, the -imagedir and -imagescan options will be disabled and only the -hklin option will be available.

The expected cell dimensions and space group will be read directly from the reference mtz file and autoPROC will ensure (where the symmetry allows the possibility of alternate indexing) that the experimental data are indexed consistently with the reference. Given that the reference pdb and mtz files are paired, this ensures that the limited molecular replacement should be successful, and has the added advantage that where pipedream is run on a series of structures, they will all end up in the same asymmetric unit and will therefore be directly superimposable. If the reference mtz file contains a Free R set, this set will be used for the processed data. Thus all data sets processed with Pipedream using the same input mtz file will have a common Free R set. As previously described, use of a common Free R set is good practice, in this context, to allow for proper cross-validation between structures.

#### 3.1.2. Optional arguments

Further optional arguments are grouped into options for autoPROC, BUSTER and Rhofit (see pipedream -help).

##### Multiple model input options:
• -seqin1: File containing comma-separated list of the residues to be used for structure difference analysis (in the form shown in this example: GLY A 34,ALA A 35,THR A 96).

- and/or -

• -seqin2 "residue list": Double-quote enclosed, comma-separated list of the residues to be used for structure difference analysis (in the form shown in this example: GLY A 34,ALA A 35,THR A 96). If this is used together with the -seqin1 option, a combined list of residues listed through both options will be used.

• -rmsd: Threshold value for pairwise RMS deviation between residues to be selected for analysis (default = 1.5Å). This option can only be used if neither the -seqin1 nor the -seqin2 option is specified.

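The residue list format used by -seqin1 and -seqin2 can be illustrated with a small Python parser. This is a sketch under assumptions: the function names are hypothetical, each entry is assumed to be three whitespace-separated fields as in the example GLY A 34, residue numbers are assumed to be plain integers (no insertion codes), and the removal of duplicates when combining the two lists is an assumption rather than documented behaviour.

```python
def parse_residue_list(text):
    """Parse a list such as 'GLY A 34,ALA A 35,THR A 96' into (name, chain, number) tuples."""
    residues = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and empty input
        name, chain, number = item.split()
        residues.append((name, chain, int(number)))
    return residues

def combined_residues(file_text="", quoted_list=""):
    """Combine the residues given via -seqin1 (file contents) and -seqin2 (quoted list).

    ASSUMPTION: duplicates are dropped and first-seen order is preserved.
    """
    combined = []
    for res in parse_residue_list(file_text) + parse_residue_list(quoted_list):
        if res not in combined:
            combined.append(res)
    return combined
```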
##### autoPROC options (only where autoPROC is installed):
• -cell <"a b c al be ga">: Cell dimensions. This will override the cell read from the reference mtz file. Not generally recommended.

• -mproc: Comma-separated list of autoPROC macros.

• -kappa: Specify site for use of kappa/eulerian goniometer. Use without an argument to list available sites.

• -beam "x y": Specify direct beam position (in double quotes). Default is to use direct beam position as specified in the image header.

• -beamtransform

##### Data acceptance criteria:

The primary goal of Pipedream is to automate processing and structure refinement for ligand detection. We consider that there are certain minimum criteria that the data need to meet to make looking for ligands, particularly small ligands, viable.

Where raw images have been input, the data are checked against these criteria and if any of these checks fail, then Pipedream will terminate cleanly. The current criteria are based on resolution, completeness and Rpim. The default values of these can be changed with the following options:

• -rmin: Minimum acceptable high resolution limit (default = 3.5Å).

• -rpim: Maximum acceptable Rpim (default = 25).

• -completeness: Minimum acceptable overall data completeness (default = 60).

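The acceptance test can be pictured as three simple comparisons. The Python sketch below uses the documented defaults; the function and parameter names are hypothetical, Rpim and completeness are assumed to be expressed as percentages, and whether values exactly on a limit pass or fail is an assumption (here they pass).

```python
def failed_criteria(high_res, rpim, completeness,
                    rmin=3.5, rpim_max=25.0, completeness_min=60.0):
    """Return the names of the acceptance criteria that the data fail.

    high_res      high resolution limit of the processed data, in Å
    rpim          Rpim, assumed to be a percentage
    completeness  overall completeness, assumed to be a percentage
    """
    failures = []
    if high_res > rmin:  # numerically larger means worse resolution
        failures.append("resolution")
    if rpim > rpim_max:
        failures.append("rpim")
    if completeness < completeness_min:
        failures.append("completeness")
    return failures
```

An empty list means the data meet all three criteria. According to the text above, any failure causes Pipedream to terminate cleanly when raw images were input.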
##### Optional "Molecular Replacement" arguments:
• -chains "chain list": Double-quote enclosed, space-separated list of individual chains/multimers to move independently. For example -chains "A B C D" will move chains independently, whereas -chains "AB CD" will treat chains A & B as a single rotatable unit and chains C & D as another single rotatable unit. By default, if this option is not specified, Pipedream will treat the entire input model as a single unit.

• -btf: Use brute force translation function. Default is to use fast translation function. The fast translation function is the faster protocol, however, if the input model has translational NCS, you may get better results from the brute force translation function. This option should only be used in combination with the -chains option.

• -mrres []: Resolution limits for MR. Most of the time, the defaults (low res limit left unset and high res limit set to 3.0Å) are adequate and should not need to be changed. However, for very large, multimeric structures you may need to restrict the resolution range.

• -bigrotrange: Double the angular range for the rotation function from ±5.0° (default) to ±10.0°. See Appendix A.

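The grouping implied by the -chains argument can be made explicit with a one-line Python parser. The function name is hypothetical and the sketch assumes single-character chain identifiers, as in the examples above.

```python
def parse_chain_groups(chains):
    """Split a -chains argument into independently moved units.

    Space-separated tokens are separate units; the characters within one
    token are the chains that move together (single-character chain IDs assumed).
    """
    return [list(group) for group in chains.split()]
```

With this reading, "A B C D" gives four single-chain units, whereas "AB CD" gives two units of two chains each.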
##### BUSTER options:
• -quick: Single round of BUSTER refinement for quickest results.

• -thorough: Three rounds of BUSTER refinement.

• -mrefine: Comma separated list of BUSTER macros.

• -rigid: Perform rigid body refinement using rigid body definitions as specified. Default is to define one rigid body per chain.

• -noautoncs: Turn off autoncs (default is ON).

• -target: Turn on target restraints. If specified without a pdb file, then the file specified by -xyzin is used.

• -sequence: Correct TNT format sequence file. Use of this option should only be considered where there are known issues with automatically generated sequence files that would require manual intervention. This option CANNOT be used with multiple model input.

• -l: Comma separated list of refmac-style CIF restraint dictionaries for pre-existing ligands or prosthetic groups.

• -abcommands "refine options": Double quote enclosed list of BUSTER command line options. See BUSTER documentation for further details.

• -fss "FP,SIGFP": Double quote enclosed unique F,SIGF pair. ONLY use if primary input data is an mtz file containing more than one F/SIGF pair.

##### Rhofit options:
• -rhofit: Run Rhofit if specified. Comma separated list of refmac-style CIF restraint dictionaries.

• -keepH: Keep hydrogen atoms on the ligand in the fit.

• -nochirals: Ignore CHIRAL restraints in fitting/output. Chiral centres can then invert as needed.

• -allclusters: Fit the ligand to every possible binding site.

• -xclusters: Produce ligand fits for the best possible binding sites. Default = 1.

• -rhoquick: Run fewer trials than usual.

• -rhothorough: Run more trials than usual.

• -rhocommands "Rhofit options": Double quote enclosed list of Rhofit command line options. See Rhofit documentation for further details.

• -postref: Post-refine the top solution from Rhofit.

• -postquick: Quick post-refinement of the top solution from Rhofit (uses ShortRunVoid macro).

• -postthorough: Thorough post-refinement of the top solution from Rhofit.

• -nooccref: Do not refine ligand occupancy in post-refinement.

##### General options:
| Option | Description |
|---|---|
| -nofreeref | Acknowledgement that the reference mtz file DOES NOT contain a FreeR set and that it is OK to generate one de novo. This option is COMPULSORY if a reference mtz file is not specified, or if the reference mtz file does not include a FreeR set. Generating a new FreeR set is not generally recommended. |
| -keepwater | DO NOT remove waters that are present in the input model (default is to remove them). NOTE: In order for waters (or indeed ANY HETATMs) to be retained they MUST be assigned the same chain ID as the protein chain with which they are associated. |
| -nowateradd | DO NOT add/remove waters in initial autoBUSTER protocols. Use with care! |
| -nobr | Do not run buster-report. |
| -nthreads | Number of processes to use (for autoPROC, Phaser and BUSTER). A negative value will use (all)/n. Default = use individual program defaults. |
| -help process\|refine\|rhofit | Print help for either autoPROC, BUSTER or Rhofit. |
| -macro process\|refine | Print list of available macros for either autoPROC or BUSTER. |
| -v | Write progress of run to standard output. |
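The way these options combine can be illustrated with a small helper that assembles an argument list. This is only a sketch: the function name `build_pipedream_command` and all file names are hypothetical, and it uses only flags that appear in this manual (`-hklin`, `-xyzin`, `-hklref` and `-d` as seen in the example command lines and Section 4, plus `-nofreeref`, `-quick`, `-rhofit` and `-postref`). It does not run Pipedream.

```python
import shlex


def build_pipedream_command(hklin, xyzin, outdir, hklref=None,
                            ligand_cifs=(), postref=False, quick=False):
    """Assemble an argument list for a Pipedream run (illustrative only)."""
    cmd = ["pipedream", "-hklin", hklin, "-xyzin", xyzin, "-d", outdir]
    if hklref is not None:
        cmd += ["-hklref", hklref]
    else:
        # The manual states that -nofreeref is compulsory when no
        # reference mtz file is specified.
        cmd.append("-nofreeref")
    if quick:
        cmd.append("-quick")
    if ligand_cifs:
        # -rhofit takes a comma separated list of CIF restraint dictionaries.
        cmd += ["-rhofit", ",".join(ligand_cifs)]
        if postref:
            # Post-refinement acts on the top Rhofit solution, so it is
            # only added here when Rhofit is requested.
            cmd.append("-postref")
    return cmd


if __name__ == "__main__":
    # Placeholder file names, not taken from a real project.
    example = build_pipedream_command("data.mtz", "model.pdb", "output",
                                      hklref="reference.mtz",
                                      ligand_cifs=["lig.cif"], postref=True)
    print(shlex.join(example))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to a process launcher.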

## 4. Location of Pipedream output.

Pipedream writes all of its output to a defined directory tree under the output directory specified with the -d option.

| File or directory | Description |
|---|---|
| /process/ | Location of autoPROC output. |
| /process/truncate-unique.mtz | Final output from autoPROC. Used as the input for subsequent processes. |
| /process.out | Standard output from autoPROC. |
| /MR/ | Location of limited MR output. |
| /MR/phaser.3.pdb | Final output from limited MR. Used as the input for subsequent processes. |
| /MR/phaser.1.mtz | Mtz file containing map coefficients from Phaser. |
| /refine/ | Location of BUSTER output (from final cycle). |
| /refine/refine.(pdb,mtz) | Final output from BUSTER (final cycle). Used as input for Rhofit (if run). |
| /refine.out | Standard output from BUSTER (final cycle). |
| /rhofit-/ | Location of Rhofit output for ligand . |
| /rhofit-.out | Standard output from Rhofit. |
| /postrefine-/ | Location of BUSTER post-refinement output. |
| /postrefine-.out | Standard output from BUSTER post-refinement. |
| /report-/ | Output from buster-report. Can be viewed with firefox /report-/index.html. |

Where multiple models have been input, all directories and output (primarily for the limited MR and initial refinement stages) relating to the individual input models will be written into directories named <number of input pdb file>-<name of input pdb file>.

In addition, the file <root>/summary.out contains the primary summary of the results (and any warning or error messages) from each stage in the process.
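A quick sanity check on a finished run is to test whether the key files named above exist under the output directory. The sketch below assumes the listed paths are relative to the directory given with -d, and that the run included the limited MR and refinement stages; the helper name `missing_outputs` and the choice of files to check are illustrative assumptions, not part of Pipedream.

```python
from pathlib import Path

# Files named in this section that a single-model run with limited MR and
# refinement would be expected to produce (assumed relative to the -d directory).
EXPECTED_FILES = (
    "summary.out",
    "MR/phaser.3.pdb",
    "refine/refine.pdb",
    "refine/refine.mtz",
)


def missing_outputs(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not present under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # "output" is a placeholder directory name.
    for rel in missing_outputs("output"):
        print(f"missing: {rel}")
```

Which files are actually present depends on the stages that were run, so the expected list should be adjusted accordingly (for example, adding process/truncate-unique.mtz when starting from images).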

A typical summary.out file (for a single pdb file input) is:

 =======================================
Processing and Refinement Summary
=======================================

Pipedream version: 1.0.0  <2014-05-12>

Run by fbloggs on bijvoet at 12:14:10 on Thu Dec  4 2014
Run from /home/fbloggs/pipedream

Command run: pipedream  -hklin 4j0p.mtz -xyzin 1w50.pdb -hklref 1w50.mtz \

All output in /home/fbloggs/pipedream/output

==================================================
************* Input data is MTZ file *************
==================================================

Checking indexing consistency against reference mtz file 1w50.mtz.

No need to reindex input data.

Copying Freer column from the reference file 1w50.mtz to the input mtz file.
Any pre-existing Freer set in the input file will be discarded.
Consistently indexed mtz file with reference Freer is in consistent-input.mtz

==================================================
******************* limited MR *******************
==================================================

Limited MR procedure run with 1 independently defined units.

MR solution found with score (TFZ) 55.6

For further details please see MR/*{rotation or translation}.out
Output pdb file: MR/phaser.3.pdb

==================================================
****** BUSTER refinement (default protocol) ******
==================================================

Initial:                R = 0.2638,     Rfree = 0.2771
After 1st refinement:   R = 0.2371,     Rfree = 0.2528
Final:                  R = 0.2103,     Rfree = 0.2409

For further details please see refine.out
Output files:
refine/refine.pdb
refine/refine.mtz

==================================================
*********** Ligand Fitting with Rhofit ***********
==================================================

++++++++++++++++++++++++++++++++++++++++++
| Running rhofit with ligand *grade-LIG* |
++++++++++++++++++++++++++++++++++++++++++

rhofit           ligand LigProt  Poorly
total   Correl  strain contact fitting
File               Chain    score   coeff    score   score   atoms
===================================================================

Hit_00_00_000.pdb   A    -2308.1   0.9171     8.9     0.0    0/26

BUSTER post-refinement
======================

Initial:        R = 0.2075,     Rfree = 0.2262
Final:          R = 0.1858,     Rfree = 0.2152

Output files:

buster-report output:

=======================================

Run took 01:04:09 h:m:s to complete
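When many data sets are processed, it can be convenient to pull the R and Rfree values out of each summary.out automatically. The following is a minimal sketch that assumes the line layout shown in the example above (`<label>: R = <value>, Rfree = <value>`); the function name `extract_r_factors` is hypothetical and the layout may differ between Pipedream versions.

```python
import re

# Matches lines such as "Final:    R = 0.2103,     Rfree = 0.2409".
R_LINE = re.compile(
    r"^[ \t]*(?P<label>[^:\n]+):[ \t]+R = (?P<r>\d+\.\d+),"
    r"[ \t]+Rfree = (?P<rfree>\d+\.\d+)[ \t]*$",
    re.MULTILINE,
)


def extract_r_factors(summary_text):
    """Return (label, R, Rfree) tuples in the order they appear in the text."""
    return [
        (m["label"].strip(), float(m["r"]), float(m["rfree"]))
        for m in R_LINE.finditer(summary_text)
    ]


if __name__ == "__main__":
    # "output/summary.out" is a placeholder path.
    with open("output/summary.out") as handle:
        for label, r, rfree in extract_r_factors(handle.read()):
            print(f"{label:25s} R = {r:.4f}  Rfree = {rfree:.4f}")
```

Note that both the main BUSTER refinement and the post-refinement report "Initial" and "Final" lines, so the results are returned in file order rather than keyed by label.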

A typical summary.out file (for multiple pdb file input) is:

 =======================================
Processing and Refinement Summary
=======================================

Pipedream version: 1.0.0  <2014-05-12>

Run by fbloggs on bijvoet at 15:35:56 on Thu Jan 29 2015
Run from /home/fbloggs/pipedream

Command run: pipedream  -hklin 4ke1/4ke1.mtz -nofreeref -xyzin \
-rhothorough -postref -seqin1 seq.list -d multiple-seqin3

All output in /home/fbloggs/pipedream/multiple-seqin3

Reference structure factors (multiple-seqin3/1-1w50_nowater.mtz) have been
back-calculated from reference model with sfall!

==================================================
************* Input data is MTZ file *************
==================================================

Checking indexing consistency against reference mtz file
multiple-seqin3/1-1w50_nowater.mtz.

No need to reindex input data.

Using Freer column already present in the input mtz file.

==================================================
****************** Input models ******************
==================================================

You are running pipedream with 3 input pdb files.
Limited MR and initial refinement will be run on each
of the input models, after which the model that best
fits the data will be chosen. Further steps will only
be run on the selected model.

The input models (in order) are:

1: 1w50.pdb (located in current directory)
2: 4dh6.pdb (located in current directory)
3: 4j0p.pdb (located in current directory)

==================================================
******************* limited MR *******************
==================================================

Limited MR procedure run with 1 independently defined units.

1-1w50: MR solution found with score (TFZ) 52.5

For further details please see 1-1w50/MR/*{rotation or translation}.out
Output pdb file: 1-1w50/MR/phaser.3.pdb

2-4dh6: MR solution found with score (TFZ) 56.7

For further details please see 2-4dh6/MR/*{rotation or translation}.out
Output pdb file: 2-4dh6/MR/phaser.3.pdb

3-4j0p: MR solution found with score (TFZ) 53.3

For further details please see 3-4j0p/MR/*{rotation or translation}.out
Output pdb file: 3-4j0p/MR/phaser.3.pdb

==================================================
***************** Model selection ****************
==================================================

For the results of initial refinement and the edstats output
for each of the input models, please see:

multiple-seqin3/1-1w50/refine1.out
multiple-seqin3/1-1w50/refine1/edstats.out

multiple-seqin3/2-4dh6/refine1.out
multiple-seqin3/2-4dh6/refine1/edstats.out

multiple-seqin3/3-4j0p/refine1.out
multiple-seqin3/3-4j0p/refine1/edstats.out

The  residues, as input, that will be used to assess
which one of the input models gives the best fit to
the input data are listed in the file:

multiple-seqin3/comparison-residues.list

NOTE: Any residues from the input list that are not
present in one (or more) of the input models will be
automatically assigned a Z-score of 0 for that
particular model. Please be aware that a significant
number of "missing" residues could potentially
compromise the model selection process!

The average Z-score of the real-space sample
correlation coefficient (ZCCm) over the selected
residues for each of the input models are:

average ZCCm = 5.5000, for model multiple-seqin3/1-1w50/refine1/refine.pdb
average ZCCm = 9.1000, for model multiple-seqin3/2-4dh6/refine1/refine.pdb
average ZCCm = 5.0875, for model multiple-seqin3/3-4j0p/refine1/refine.pdb
****************************************************
On the basis of having the highest mean ZCCm score,
over the selected residue range, the model selected
as the best match to the input experimental data is

multiple-seqin3/2-4dh6/refine1/refine.pdb

refined from 4dh6.pdb

Subsequent steps will proceed using this model only!

****************************************************

==================================================
****** BUSTER refinement (default protocol) ******
==================================================

Initial:                R = 0.2706,     Rfree = 0.2961
After 1st refinement:   R = 0.2740,     Rfree = 0.3055
Final:                  R = 0.2210,     Rfree = 0.2569

For further details please see refine.out
Output files:
refine/refine.pdb
refine/refine.mtz

==================================================
*********** Ligand Fitting with Rhofit ***********
==================================================

+++++++++++++++++++++++++++++++++++++++++++++++++++++
| Running rhofit with ligand *1R6.grade_PDB_ligand* |
+++++++++++++++++++++++++++++++++++++++++++++++++++++

rhofit           ligand LigProt  Poorly
total   Correl  strain contact fitting
File               Chain    score   coeff    score   score   atoms
===================================================================

Hit_00_00_000.pdb   A    -2260.9   0.8363    28.3     0.0    0/41

BUSTER post-refinement
======================

Initial:        R = 0.2524,     Rfree = 0.2746
Final:          R = 0.1949,     Rfree = 0.2330

Output files:

buster-report output:

=======================================

Run took 01:47:02 h:m:s to complete
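The model selection step reported above picks the model with the highest average ZCCm. As an illustration of that rule only (not of Pipedream's internal code), the sketch below re-derives the selection from the "average ZCCm" lines of a summary.out file. The function name `best_model_by_zccm` is hypothetical and the line format is assumed to match the example.

```python
import re

# Matches lines such as
# "average ZCCm = 9.1000, for model multiple-seqin3/2-4dh6/refine1/refine.pdb".
ZCCM_LINE = re.compile(
    r"average ZCCm = (?P<score>-?\d+(?:\.\d+)?), for model (?P<model>\S+)"
)


def best_model_by_zccm(summary_text):
    """Return (best_model_path, {model_path: score}) from summary.out text."""
    scores = {
        m["model"]: float(m["score"]) for m in ZCCM_LINE.finditer(summary_text)
    }
    if not scores:
        raise ValueError("no 'average ZCCm' lines found")
    # The selection rule described in the summary: highest mean ZCCm wins.
    best = max(scores, key=scores.get)
    return best, scores
```

Applied to the three ZCCm lines in the example above, this selects multiple-seqin3/2-4dh6/refine1/refine.pdb, in agreement with the model reported by Pipedream.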

## 5. How to cite use of Pipedream

Sharff A, Keller P, Vonrhein C, Smart O, Womack T, Flensburg C, Paciorek W and Bricogne G (2011). Pipedream, version 1.1.1, Global Phasing Ltd, Cambridge, United Kingdom.

autoPROC:

Vonrhein C, Flensburg C, Keller P, Sharff A, Smart O, Paciorek W, Womack T and Bricogne G. "Data processing and analysis with the autoPROC toolbox". Acta Cryst. (2011). D67, 293-303.

BUSTER:

Bricogne G, Blanc E, Brandl M, Flensburg C, Keller P, Paciorek W, Roversi P, Sharff A, Smart O, Vonrhein C, Womack T. (2011). BUSTER version X.Y.Z. Global Phasing Ltd, Cambridge, United Kingdom.

Rhofit:

Womack T, Smart O, Sharff A, Flensburg C, Keller P, Paciorek W, Vonrhein C and Bricogne G. (2011). Rhofit, version X.Y.Z. Global Phasing Ltd, Cambridge, United Kingdom.

XDS:

Kabsch W. "XDS". Acta Cryst. (2010). D66, 125-132.

CCP4:

Collaborative Computational Project, Number 4. "The CCP4 Suite: Programs for Protein Crystallography". Acta Cryst. (1994). D50, 760-763.

## 6. Appendix A: Non-isomorphism and Limited MR

A certain degree of non-isomorphism is expected and allowed for in Pipedream.

Pipedream assesses non-isomorphism in terms of the relative difference in the cell parameters (cell angle changes being referred to 1 radian) between the reference structure and the experimental data, using the following formula:

The relative difference in cell parameters is defined as $|\frac{\Delta a}{a_{exp}}| + |\frac{\Delta b}{b_{exp}}| + |\frac{\Delta c}{c_{exp}}| + |\frac{\Delta \alpha}{57.296}| + |\frac{\Delta \beta}{57.296}| + |\frac{\Delta \gamma}{57.296}|$

The larger the relative cell parameter difference, the more one might expect to have to reorient the reference structure to best match the experimental data. The limited MR procedure is configured to set the maximum angular range for the rotation function to ±5.0°. This limit has been approximately matched to the amount of reorientation that might be seen with a relative cell dimension difference of up to 0.25. With a difference in relative cell dimensions > 0.25, there is a possibility that the limited MR procedure will not be able to move the input model sufficiently and thus may fail.

If the relative difference in cell parameters exceeds 0.25, Pipedream will print a warning message in the summary.out file indicating that the limited MR procedure (and thus all subsequent steps) MAY be compromised/fail due to the degree of non-isomorphism.
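The formula and the 0.25 threshold can be checked by hand before a run. The sketch below is a direct transcription of the formula given in this appendix, assuming cell lengths in Å and angles in degrees, with each cell supplied as (a, b, c, α, β, γ). The function and constant names are hypothetical, and the example cells are invented for illustration.

```python
import math

DEG_PER_RAD = 180.0 / math.pi  # approximately 57.296, as used in the formula
WARNING_THRESHOLD = 0.25       # value quoted in this appendix


def relative_cell_difference(ref_cell, exp_cell):
    """Sum of relative cell length changes and angle changes in radians.

    Length differences are divided by the experimental cell length and
    angle differences (in degrees) by 57.296, following the formula above.
    """
    if len(ref_cell) != 6 or len(exp_cell) != 6:
        raise ValueError("each cell must be (a, b, c, alpha, beta, gamma)")
    lengths = sum(abs(r - e) / e for r, e in zip(ref_cell[:3], exp_cell[:3]))
    angles = sum(abs(r - e) / DEG_PER_RAD
                 for r, e in zip(ref_cell[3:], exp_cell[3:]))
    return lengths + angles


if __name__ == "__main__":
    # Invented example cells, not taken from a real data set.
    reference = (50.0, 60.0, 70.0, 90.0, 90.0, 90.0)
    experiment = (51.0, 60.0, 70.0, 90.0, 91.0, 90.0)
    diff = relative_cell_difference(reference, experiment)
    verdict = "above" if diff > WARNING_THRESHOLD else "within"
    print(f"relative difference = {diff:.4f} ({verdict} the 0.25 limit)")
```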

If this is the case, Pipedream can be re-run with the -bigrotrange flag. This will double the angular range for the rotation function to ±10.0°, allowing for more extensive reorientation. However, further failure would indicate more extensive problems that are beyond the scope of Pipedream’s limited MR approach.

## 7. Appendix B: Revision History

1.1.0

• Released in snapshot 16th March 2015

• First release of multiple model input functionality

• Resolution range for rigid body refinement limited to 4.0Å

1.0.0

• Initial general release of Pipedream (released 4th April 2014)

0.1.4

• Released 17th November 2012.

• Adaptation to allow use with "large" structures.

• Added Phaser "refine" step to limited MR.

0.1.3

• Released 31st October 2012.

• Added imagescan option for autoPROC.

• Added option to input scalepack or d*TREK reflection files.

• Reference mtz file now optional.

• Integrated buster-report into Pipedream.

• Included stand-alone limited MR script, lmr.

0.1.2

• Released 23rd October 2011.

• Limited MR modified to allow individual chains/groups of chains to be moved independently.

• Added option of "Brute Force" Translation function.

• Make use of OpenMP in Phaser.

• Added short post-refinement step on "top" solution from Rhofit.

0.1.1

• Initial consortium release of Pipedream (released 9th August 2011)