Copyright © 2011 - 2020 GΦL Global Phasing Limited

All rights reserved.

This software is proprietary to and embodies the confidential technology of Global Phasing Limited (GPhL). Possession, use, duplication or dissemination of the software is authorised only pursuant to a valid written licence from GPhL.

Authors: (2011 - 2020) A. Sharff, P. Keller, C. Vonrhein, O. Smart, T. Womack, C. Flensburg, W. Paciorek and G. Bricogne
Version: 1.2.5

Partial support from EU projects: SILVER (FP7-HEALTH-F3-2010-260644)

1. Pipedream.

1.1. What is Pipedream?

Pipedream is an "expert" system to link and automate [a] data processing with autoPROC, [b] a "limited" molecular replacement stage with Phaser, [c] structure refinement with BUSTER and (where requested) [d] automated ligand fitting with Rhofit with [e] subsequent BUSTER post-refinement of the top solution (again, where requested). The required input for Pipedream is an input data set, either in the form of unprocessed diffraction images or as a pre-processed mtz file, an input model and optionally, an associated mtz file. Consistent relationships between these items are expected, as detailed below.

1.2. Scope and Limitations.

Pipedream has been specifically designed as a pipeline tool to facilitate the use and integration of Global Phasing’s primary software packages (autoPROC, BUSTER and Rhofit) into a (high-throughput) fragment/ligand screening pathway. As such, its scope is quite rigidly defined and a number of limitations on its use apply.

It is anticipated that the primary use for Pipedream is where multiple data sets have been collected on a single target, differing only in the soaking/co-crystallisation conditions of the crystals. As such, it requires that the structure used as the input model for structure solution be essentially identical to that present in the crystals from which the various datasets have been collected - not only the same protein and same sequence, but CRITICALLY, the same space group and cell dimensions (allowing for small differences in the latter due to non-isomorphism, changes in environment due to soaking/co-crystallisation and freezing)!

The input model should be an appropriate APO structure. This may be a native structure or indeed a structure with a known ligand bound. Any "non-protein" molecules that are present in the binding site of interest (i.e. where you are looking for new bound ligands) MUST be removed from the input model. Other built-in ligands, such as cofactors, prosthetic groups, ions etc. (other HETATMs) can be retained. Careful consideration should be given to any water molecules in the input model. As the chemical environment in soaked/co-crystallised crystals may well be expected to differ from the "apo" structure used as the input model, the water structure in the "apo" model may not be fully conserved. Therefore, by default, Pipedream will automatically remove all water molecules. In cases where specific water structure is known or assumed to be conserved, this behaviour can be overridden with the explicit argument "-keepwater" (see Optional arguments). IMPORTANT: Conserved water molecules, together with other non-covalently bound HETATMs, in the input pdb file MUST have been assigned the chain id corresponding to the protein chain to which they are bound, otherwise they will not be retained.

Pipedream is (currently) NOT designed as a tool for automated structure solution. It cannot deal with data that require experimental phase determination, nor can it deal with cases that require full Molecular Replacement.

1.2.1. Minimum input

  1. A reference structure (pdb file) and associated reference mtz file. The latter is optional though strongly recommended. If both are given, Pipedream will confirm that the space group of the reference structure and mtz file match and that the cell dimensions are substantially the same, otherwise it will terminate immediately. Where given, the reference mtz file MUST contain a set of structure factors, ideally those from which the reference pdb structure was refined. If the reference mtz file also contains a Free R set, this will be "transferred" to the experimental data (whether input as raw images or as a pre-processed mtz file). This is considered good practice as it allows proper cross-validation of all related datasets that are refined against the input model (Pozharski et al. Acta D (2013), 69, 150-167). Where a reference mtz file is not given, Pipedream will "back calculate" structure factors from the reference model, will generate a new Free R set and will combine them into an ersatz reference mtz file for use in subsequent steps. We advise against doing this.

  2. An input data set. This can be in the form of unprocessed images, in which case the data will be automatically processed by autoPROC, the output directory of an independent autoPROC run, or a file containing already processed scaled/merged data. This can be an mtz file, a Scalepack reflection file or a d*TREK reflection file. Both Scalepack and d*TREK reflection files will be converted to mtz format before further processing. Where an autoPROC output directory or a file of pre-processed data is input, Pipedream will confirm that the space group of the input data matches the reference structure and that their cell dimensions are "similar", otherwise it will terminate. If the input data file contains structure factor amplitudes, they will be used. However, if only intensities are available, Pipedream will automatically run truncate to calculate structure factor amplitudes. Note that Pipedream will terminate if it finds more than one set of structure factor amplitudes unless a unique F/SIGF pair has been specified. Pipedream will also ensure that the input and reference mtz files are consistently indexed, reindexing the input mtz file if necessary. Note that Pipedream will only accept experimental data as unprocessed images if autoPROC is available. If not, only pre-processed data, in the form of an autoPROC output directory, an mtz file, or a Scalepack or d*TREK reflection file, will be accepted.

Pipedream is designed to ensure that all output structures (and maps) are in the same asymmetric unit as the model used as input to Pipedream, so that all output structures and maps from multiple runs of Pipedream (using the same reference model) are directly superimposable. For this purpose, all input data sets are examined to check that they are consistently indexed with the reference structure and that both input and reference data conform to the CCP4 definition of the asymmetric unit for the appropriate space group. Thus it is important that the reference mtz file should be directly associated with the reference pdb file. If this is not the case, the limited MR procedure may not be successful and in any case the consistency checks to ensure that the output is superimposable over the input will in effect be bypassed. This caveat also applies in the event that the same mtz file is used as both the input and reference data. Although not proscribed, this is definitely not recommended and Pipedream will generate a warning if it detects that this is the case.

Pipedream would usually be run with all of the required input data/files specified on the command line. However, a plugin mechanism is provided to allow a user provided script to furnish any one or indeed all of the required input data/files to Pipedream, see Appendix B for full details.

1.3. Multiple models.

Conformational change in proteins is a well-known and studied phenomenon. Such changes can be extremely localised, limited to alterations in side-chain conformation, or they can be much more extensive, such as rigid body domain movements. Localised loop movements are frequently observed in proteins, particularly in response to ligand / cofactor binding.

Such loop movements can be of a large enough magnitude that refinement alone is unable to deal with them - hence it is important to pick the correct input model for refinement. For example, if you are looking at a protein where a loop occludes the known ligand binding site in its apo state, but moves out of the site in response to the presence of a ligand, then it does not make sense to use an apo model, with the loop in the "in" conformation, in refinement against data where a ligand is bound. Refinement alone is unlikely to move the loop out of the binding site and subsequent ligand fitting will fail, resulting in a false negative. Conversely, it is similarly unwise to refine a model in the ligand-bound conformation against data where the experimental structure is in the apo conformation, as this can result in a false positive.

However, in the context of running a fragment/ligand screening pathway, how do you know ahead of time whether or not the soaked ligand has bound, and therefore which input model to use with Pipedream?

Pipedream deals with this issue by allowing the input of multiple models, performing an initial refinement on all of them, after which it makes a decision as to which best matches the experimental data. It will then carry on with refinement and ligand fitting using this model alone.

1.3.1. Procedure

In order to determine which model best matches the experimental data, Pipedream looks at main-chain real-space correlation coefficients (calculated against the refined electron density maps, with the CCP4 program edstats) after initial refinement. The model that best matches the data is the one with the highest mean CC. However, this can be very insensitive when calculated over the entire structure. Therefore, to increase the sensitivity of the method, Pipedream calculates and uses the mean CC only for residues where there is significant conformational change.

The preferred method of identifying these residues is for them to be explicitly defined by the user. However, if they are left undefined, Pipedream will attempt to automatically identify regions showing conformational change by stepping through the structure and looking at pairwise RMS deviations for each residue. By default, any residue which has an RMSD of greater than 1.5Å in any of the pairwise comparisons will be selected. Note that in the eventuality that no residues are found above the defined RMSD threshold, Pipedream will select as the best match the model with the lowest Rfree after initial refinement.
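The automatic selection described above can be sketched as follows. This is only an illustration of the idea, not Pipedream's actual implementation: the data layout (one dictionary per model, mapping a residue key to a list of atom coordinates), the function names and the assumption that atoms are listed in the same order in every model are all hypothetical. Residues that are absent from any of the models are skipped, in line with the behaviour described in section 1.3.2.

```python
import math
from itertools import combinations


def residue_rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("residues must have the same, non-zero number of atoms")
    total = sum(
        (a - b) ** 2
        for pos_a, pos_b in zip(coords_a, coords_b)
        for a, b in zip(pos_a, pos_b)
    )
    return math.sqrt(total / len(coords_a))


def select_residues(models, threshold=1.5):
    """Return residue keys whose RMSD exceeds the threshold in any model pair.

    models: list of dicts {residue_key: [(x, y, z), ...]}, assumed to be
    superimposed already (Pipedream does not superimpose models itself).
    """
    # Only residues present in every model are considered.
    common = set(models[0])
    for model in models[1:]:
        common &= set(model)

    selected = []
    for key in sorted(common):
        for m1, m2 in combinations(models, 2):
            if residue_rmsd(m1[key], m2[key]) > threshold:
                selected.append(key)
                break  # one pairwise comparison above threshold is enough
    return selected
```

An empty result corresponds to the case where no residue exceeds the threshold, in which case the text above says that the choice falls back to the lowest Rfree.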

1.3.2. Requirements and limitations

The method employed currently is designed to distinguish between conformational changes produced by main-chain differences, such as loop movements. Additional, limited domain movement (such as hinge-bending) can also be accommodated by running Pipedream with an appropriately constituted rigid body definition file, so that the initial refinement can correct for relative domain shifts before model comparison.

However, the current method is not sensitive to conformational change caused solely by side-chain movements.

The input model requirements listed in section 1.2 (same space group and cell as the experimental data, same protein, same sequence) apply to ALL input models. In addition, all of the input models must be directly superimposable. Furthermore, they must all share a common residue numbering and chain identification scheme.

Importantly, unmodelled residues in one or more of the input models (presumably due to disorder) are to be avoided if at all possible. Any significant number of unmodelled residues in any of the input models (unless missing from ALL of them) could potentially compromise the ability of Pipedream to select the correct model.

Pipedream deals with unmodelled regions slightly differently, depending on how they have been defined. Where Pipedream is left to determine automatically which regions to compare, it will remove from consideration any residue, regardless of pairwise RMS deviation, which does not appear in all of the input models. The potential drawback is that regions where there is genuine conformational change could be excluded from analysis if those regions are not present in one or more of the input models. Where the residues for analysis are specified by the user (the preferred method), any residues missing in one of the models would be assigned a default CC of 0 for that model. Again, a significant number of missing residues from one or more of the input models could potentially compromise the ability of Pipedream to select the correct model.
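The selection rule for user-specified residues can be illustrated with a short sketch. It assumes that per-residue main-chain correlation coefficients and Rfree values have already been extracted from the initial refinement of each model; the function name and the dictionary layout are hypothetical and chosen only for this example.

```python
def pick_best_model(cc_by_model, rfree_by_model, residues):
    """Pick the model with the highest mean CC over the chosen residues.

    cc_by_model:    {model_name: {residue_key: correlation coefficient}}
    rfree_by_model: {model_name: Rfree after initial refinement}
    residues:       residue keys to compare; if empty, the model with the
                    lowest Rfree is returned instead.
    """
    if not residues:
        return min(rfree_by_model, key=rfree_by_model.get)

    def mean_cc(name):
        ccs = cc_by_model[name]
        # A residue missing from this model contributes a CC of 0.
        return sum(ccs.get(res, 0.0) for res in residues) / len(residues)

    return max(cc_by_model, key=mean_cc)
```

The sketch makes the drawback mentioned above visible: a model that lacks several of the listed residues is penalised by the zero values regardless of how well the rest of it fits the density.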

Bear in mind that the intended use and scope of Pipedream imply that the ONLY difference between the input models (and their internal PDB attributes) should be in (relatively) localised conformational changes.

ALL input models must be superimposed on each other before input into Pipedream (using CCP4 program gesamt or coot) - Pipedream will not superimpose them itself.

Whilst there is no limit to the number of models that can be input, if there are a limited number of distinct conformational states, we would recommend using only the one (or two) best models representative of each conformational state. Adding more and more very similar models may simply increase the CPU time required without any improvement in precision in arriving at the correct solution.

Although Pipedream can be used to automatically identify the regions that differ (as described above), the preferred method is to tell Pipedream which residues to use for structure comparison, using either (or indeed both) the -seqin1 or -seqin2 options (see Chapter 3 for a description of the use of these options).

Ideally, analysis of the input models should be based on comparison of at least 5 residues. Although Pipedream does not enforce a minimum number of residues, if run in automatic residue determination mode, it will note in the main output if the number of residues selected above the threshold is below 4. In this case, you may want to re-run Pipedream lowering the default RMSD threshold.

1.4. Program dependencies and acknowledgements.

As well as autoPROC, BUSTER and Rhofit, Pipedream will run various CCP4 programs. In particular, Pipedream requires version 2.5.6 (or later) of Phaser. This is installed in CCP4 versions from 6.4.0. Pipedream also requires the reduce program from Molprobity.

Pipedream also incorporates buster-report, which has a number of external dependencies, such as mogul and grace. For further details please see the locally installed software installation instructions in <installation root directory>/docs/installation. Pipedream will test for the availability of these dependencies and if certain ones cannot be satisfied will not attempt to run buster-report.

We are grateful to Tassos Perrakis, Robbie Joosten and the Netherlands Cancer Institute (NKI) for permission to distribute and make use of programs pepflip and SideAide, components of the PDB_REDO suite, in Pipedream.

You can test that all of Pipedream’s dependencies have been satisfied by running pipedream -checkdeps.

2. Pipedream architecture

Pipedream runs several packages, each generating its own output. In order to keep this output separate and clear, Pipedream will generate a specific directory structure to keep the output from each stage separated. Definition of a root directory <ROOT> in which to create this structure is obligatory.

Pipedream can be run manually. However, it has been written so that it can be called automatically, with multiple runs executed in the background or submitted on remote machines. Thus, it does not write any information to standard output, unless problems with the input files or mistakes made in invoking Pipedream prevent normal execution. All output is written to disk and may be reviewed at leisure. A summary of the main output is written into the file <ROOT>/summary.out.

Stage 1: Input x-ray diffraction data are processed with autoPROC (unless a pre-processed mtz file is used as the primary input). Output is written into the directory <ROOT>/process, with the standard output from autoPROC in <ROOT>/process.out.

Stage 2: The degree of non-isomorphism observed between crystals, especially after soaking experiments, can easily exceed the limit that can be corrected by rigid body refinement. Thus, Phaser is used in a specific mode to run a very "limited" molecular replacement procedure. This has the advantage that it is fast and can deal with fairly significant molecular movements due to non-isomorphism and/or conformational changes due to ligand binding. The angular range allowed for the function is matched to what is accepted as a reasonable degree of cell dimension variability - see Appendix A. Whilst this angular range can be doubled in cases of more extensive non-isomorphism, the procedure CANNOT cope with the more significant transformations seen where the search model has a different cell / symmetry to the data. By default, the input structure is treated as a single rigid unit, regardless of the number of protein chains present in the model. However, where the asymmetric unit contains multiple chains (whether homomeric or multimeric) and if so desired, individual chains or groups of chains can be defined as separate units (with the -chains option) and will be treated independently. This approach may well be beneficial in such cases. However, one caveat that should be borne in mind is that translational NCS in the input model could lead to a possible failure mode. If the input model is known to contain chains related by translational NCS then either they should not be treated as independent units, or the brute force translation function should be selected (with the -btf keyword).

Stage 3: The structure is then refined with BUSTER, with the explicit aim of producing the best difference map for identification of bound ligands. Three different refinement protocols are available (quick, default and thorough), the choice of which is dependent on the size and degree of movement/flexibility observed in the target structure and in the quality of the input model(s). Output for the final run is written into the directory <ROOT>/refine, with the standard output from BUSTER in <ROOT>/refine.out.

The three refinement protocols are as follows:

Quick: performs a single BUSTER refinement


-RB -L -WAT 2 -autoncs

Rigid Body refinement (1st big cycle). Turn on "water" addition after big cycle 2 to look for ligand density. Turn on autoncs.

Default: run 2 rounds of BUSTER refinement


-RB -autoncs

Rigid Body refinement (1st big cycle). Turn on autoncs.


-M TLSbasic -L -WAT 3 -autoncs

Turn on TLS refinement. Turn on "water" addition after big cycle 3 to look for ligand density. Turn on autoncs.

Thorough: run 3 rounds of BUSTER refinement


-RB -autoncs_noprune

Rigid Body refinement (1st big cycle). Turn on autoncs without pruning.


-M TLSbasic -M WaterUpdatePkmaps -WAT 3 -autoncs

Turn on TLS refinement. Turn on water addition after big cycle 3. Turn on autoncs.


-TLS -L -WAT 2 -autoncs

Continue TLS refinement. Turn on "water" addition after big cycle 2 to look for ligand density. Turn on autoncs.
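For scripting purposes, the three protocols listed above can be captured in a small lookup table. The option strings are copied verbatim from the list above; the dictionary and function names are hypothetical and are not part of Pipedream itself.

```python
# BUSTER options used in each refinement round, per protocol (from the text).
BUSTER_ROUNDS = {
    "quick": [
        "-RB -L -WAT 2 -autoncs",
    ],
    "default": [
        "-RB -autoncs",
        "-M TLSbasic -L -WAT 3 -autoncs",
    ],
    "thorough": [
        "-RB -autoncs_noprune",
        "-M TLSbasic -M WaterUpdatePkmaps -WAT 3 -autoncs",
        "-TLS -L -WAT 2 -autoncs",
    ],
}


def buster_rounds(protocol="default"):
    """Return one list of option tokens per refinement round."""
    try:
        rounds = BUSTER_ROUNDS[protocol]
    except KeyError:
        raise ValueError(f"unknown refinement protocol: {protocol!r}") from None
    return [options.split() for options in rounds]
```

A table like this makes it easy to see at a glance that only the final round of each protocol carries the -L option used when looking for ligand density.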

Which protocol to use.

Results of internal testing suggest that the default protocol should be appropriate for many cases and that would certainly be our recommendation in the first instance (hence it is the default). For fairly rigid proteins that show little flexibility or variation (especially in the face of soaking / ligand binding), and where the input model has been fully characterised and refined, the quick protocol may be sufficient. For larger proteins, particularly complexes with more than one chain in the asymmetric unit, or structures which show a higher degree of variability and conformational flexibility, particularly due to soaking / ligand binding, the thorough protocol may be more appropriate.

Model remediation.

Amino acid sidechains can often be seen to shift, often adopting totally different conformations, between datasets collected from different crystals. This can be a response to multiple differences between individual crystals, particularly differences in soaking with different compounds. The shifts seen in sidechains can be beyond the ability of standard refinement to correct.

SideAide, part of the PDB_REDO suite, can be run to check the modelled sidechain conformations against the electron density and refit them (if indicated) by searching all allowed rotamers to find the best fit. In addition, SideAide can rebuild sidechains that have been stubbed (please note that this is NOT the default as used in Pipedream).

Pepflip, also part of the PDB_REDO suite, can be run to check for and correct any peptide backbone flips. This is NOT run by default by Pipedream.

Model remediation (using SideAide and pepflip) can be requested in Pipedream with the -remediate keyword.

Please note that this option CANNOT be used in conjunction with the quick refinement protocol.

Where called in conjunction with the default protocol, it will be run in between the 1st and 2nd rounds of refinement.

Where called in conjunction with the thorough protocol, it will be run in between the 2nd and 3rd rounds of refinement.

After remediation, the modelcompare program (also part of the PDB_REDO suite) is run to analyse and compare the output model from SideAide with the model output from the preceding BUSTER refinement. As well as generating a summary of the impact of running SideAide (and pepflip) that is presented in the final summary.out, it also generates scheme and python scripts that can be read into coot to aid visualisation of the impact of SideAide (and pepflip).
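The position of model remediation within each protocol, as described above, can be summarised in a short sketch. The step labels and function name are hypothetical; only the ordering and the restriction on the quick protocol are taken from the text.

```python
ROUNDS = {"quick": 1, "default": 2, "thorough": 3}
# Remediation runs after this refinement round (it is not available for quick).
REMEDIATE_AFTER_ROUND = {"default": 1, "thorough": 2}


def schedule(protocol, remediate=False):
    """List the refinement-stage steps in the order they would be run."""
    if protocol not in ROUNDS:
        raise ValueError(f"unknown refinement protocol: {protocol!r}")
    if remediate and protocol not in REMEDIATE_AFTER_ROUND:
        raise ValueError("-remediate cannot be used with the quick protocol")

    steps = []
    for n in range(1, ROUNDS[protocol] + 1):
        steps.append(f"refine round {n}")
        if remediate and REMEDIATE_AFTER_ROUND[protocol] == n:
            steps.append("remediate (SideAide, pepflip)")
            steps.append("modelcompare")
    return steps
```

For example, schedule("thorough", remediate=True) places remediation and modelcompare between the second and third refinement rounds.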

Multiple model input:

If run with more than one input model, Pipedream will run stage 2 and the first cycle of refinement (as defined by the specified refinement protocol in stage 3 above) for each of the input models. After automatic analysis of the conformational differences between the refined models (unless the -seqin1 and/or the -seqin2 options are specified), Pipedream will select which of the refined models gives the best fit to the data over the selected residues. Subsequent steps are only carried out with this one model. Following selection, the remaining refinement cycles (unless the quick protocol was specified) are run on the selected model.

Stage 4: If specified with one or more refmac-style CIF restraint dictionaries for soaked/co-crystallised ligands, Rhofit will be run for each specified ligand in turn to attempt to locate and fit the ligand into the refined structure. Output is written into the directory <ROOT>/rhofit-<dictionary name>. Standard output from rhofit is in <ROOT>/rhofit-<dictionary name>.out. By default, Rhofit will only attempt to fit the ligand to the single best "cluster". If you expect to see the ligand bound to more than one site (for instance the structure has two or more identical chains in the asymmetric unit) then you will need to tell rhofit how many "clusters" to identify and fit. See Optional arguments for more details.

The "top" solution from Rhofit will be automatically post-refined by a further run of BUSTER, unless Pipedream is specifically told not to with the -nopostref option. The default is to perform a full, standard BUSTER run. However, if the intention is simply to update the ligand fit and generate new maps, a short BUSTER refinement can be requested (using the -M ShortRunVoid macro). In cases where the fitted ligand is quite large and/or has several degrees of freedom, or the protein structure is quite large and flexible, a more thorough post-refinement can be requested. This will run two rounds of BUSTER, although this is not generally required. By default, post-refinement will also refine the occupancy of the ligand(s) fitted by Rhofit, unless specifically told not to do so. Prior to post-refinement, hydrogen atoms will be added to the ligand (with zero occupancy if the resolution is lower than 2.0Å or with full occupancy if the resolution is higher than 2.0Å). Output from this run will be written to <ROOT>/postrefine-<dictionary name> and with standard output written to <ROOT>/postrefine-<dictionary name>.out.

Note: The implementation of Rhofit in Pipedream allows fitting of a single ligand or, where a crystal has been soaked in a cocktail of compounds, fitting each component independently to allow the user to determine which, if any, component has bound, i.e. to answer the question "Does compound A or B or … etc bind?". It CANNOT be used to successively fit multiple compounds into a structure, i.e. to answer the question "Do compounds A and B and … etc all bind?".

buster-report will also be run (unless its dependencies are not satisfied) to give a concise report on the outcome of refinement. If both Rhofit and subsequent post-refinement have been requested, buster-report will be run on the output of post-refinement. If not, it will be run on the final output of the initial refinement protocol. The output from buster-report will be written to <ROOT>/report.
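When post-processing many runs it can help to know where to look for each stage's output. The following sketch simply assembles the paths named in this chapter for a given <ROOT>; it is an illustration only. The function name is hypothetical, the dictionary names are taken as given strings, and not every path will exist in every run (for instance, there is no autoPROC output when a pre-processed mtz file is input, and no report if the buster-report dependencies are not satisfied).

```python
from pathlib import PurePosixPath


def expected_outputs(root, dictionaries=(), postrefine=True):
    """Return the output locations described in the text for one run."""
    root = PurePosixPath(root)
    paths = [
        root / "summary.out",   # summary of the main output
        root / "process",       # stage 1: autoPROC
        root / "process.out",
        root / "refine",        # stage 3: BUSTER
        root / "refine.out",
    ]
    for name in dictionaries:   # stage 4: one Rhofit run per dictionary
        paths.append(root / f"rhofit-{name}")
        paths.append(root / f"rhofit-{name}.out")
        if postrefine:          # skipped when -nopostref is given
            paths.append(root / f"postrefine-{name}")
            paths.append(root / f"postrefine-{name}.out")
    paths.append(root / "report")  # buster-report
    return [str(p) for p in paths]
```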

3. How to run Pipedream?

  • To invoke Pipedream, simply use the command:

% pipedream <options>

  • A basic invocation of Pipedream would look something like:

% pipedream -imagedir <directory> -d <output directory> -xyzin input.pdb -hklref input.mtz

3.1. Details of command-line arguments

no argument or -help

Quick help message listing most important arguments.

-help process

Quick help message listing most important autoPROC arguments.

-help refine

Quick help message listing most important BUSTER arguments.

-help rhofit

Quick help message listing most important Rhofit arguments.

3.1.1. Minimum required arguments

-imagedir [directory name]

Directory containing the raw images. This directory should contain the images for a single dataset only. Pipedream can cope with datasets that have been collected in multiple scans (for example high and low resolution passes or scans collected with multiple orientations on a kappa/Eulerian goniostat), provided adequate information relating these scans is provided in the image headers (and, for multi-axis goniometers, in a local configuration file for the relevant beamline).

- or -

-imagescan <scan definition>

Use this option to input a specified set of images. The scan definition has the same form as the -Id option in autoPROC, i.e. <idN>,<dirN>,<templateN>,<fromN>,<toN>. To find sets of images in a particular directory and output scans in the correct format, you can run the command find_images -l -d <dir>. Multiple scans can be input by multiple invocations of -imagescan. Please note though that multiple scans MUST be images collected at the same wavelength. Pipedream CANNOT deal with images collected at multiple wavelengths.
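A script that assembles -imagescan arguments may want to validate each scan definition before launching a run. The sketch below splits a definition into its five comma-separated fields. It assumes that the from/to fields are integer image numbers and that none of the fields contains a comma; the class and function names, and the example values in the usage note, are hypothetical.

```python
from typing import NamedTuple


class Scan(NamedTuple):
    scan_id: str
    directory: str
    template: str
    first: int
    last: int


def parse_scan(definition):
    """Split '<id>,<dir>,<template>,<from>,<to>' into a Scan record."""
    parts = definition.split(",")
    if len(parts) != 5:
        raise ValueError(f"expected 5 comma-separated fields, got {len(parts)}")
    scan_id, directory, template, first, last = (p.strip() for p in parts)
    first, last = int(first), int(last)
    if first > last:
        raise ValueError("first image number is greater than last")
    return Scan(scan_id, directory, template, first, last)
```

For instance, parse_scan("sweep1,/data/img,xtal_1_####.cbf,1,180") yields a record whose first and last fields are 1 and 180.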

- or -

-h5master <dir/master.h5>

Use this option if the input data are a set of Eiger HDF5 files. Give the FULL path and name of the <template>_master.h5 file.

- or -

-autoprocdir [directory name]

Output directory from a previous run of autoPROC. Pipedream will read the appropriate output mtz file from the autoPROC output directory as well as reporting on the processing statistics.

- or -

-hklin filename.mtz/sca/ref

Input scaled & merged mtz/scalepack/d*TREK file. Scalepack or d*TREK reflection files will be automatically converted to mtz format. Pipedream CANNOT accept unscaled/unmerged data. If the input file does not contain structure factor amplitudes, truncate will be run automatically. The data will also be automatically reindexed (if required) to ensure that it is consistently indexed with the reference mtz file. A Free R flag will also be added if one is not present and the -nofreeref option is also specified. If more than one set of structure factor amplitudes is present, Pipedream will terminate rather than make an arbitrary decision on which amplitudes to use, unless a unique F/SIGF pair is specified (see below).

-d [directory name]

Output directory. All pipedream output will be written in a defined tree under this directory. Specifying an output directory is COMPULSORY to ensure that the output from a run of Pipedream is kept separate from any other, and enables the output to be separated from the input data, which may in any case be desirable as part of the data management policy in your research group.

-xyzin <pdbinputs>

Input pdb file(s). Enter as a comma separated list if more than one input structure is specified. These structures should be of the same target protein as the input data and they are ALL expected to have the same cell and space group! If more than one model is input, they must all be superimposable! IMPORTANT: These structures should be APO structures. They should NOT contain any ligands in the binding site(s) of interest (where you are looking for bound ligands)! However, they should contain any associated co-factors that are not expected to be affected by the soaking of the putative ligand.

-hklref filename.mtz

OPTIONAL (but strongly recommended) Reference MTZ file. This file should go together with the reference pdb file (where multiple pdb files are specified, the reference mtz file should go with the first input model). It MUST contain a set of structure factors and also the Free R set that was used in refining the input reference structure. If it does NOT contain a Free R set, Pipedream will terminate unless the -nofreeref option is also specified, in which case it will generate a new Free R set. If a reference mtz file is NOT specified, Pipedream will "back calculate" structure factors from the reference structure together with generation of a new Free R set and use these as the reference set (where multiple input models have been input, structure factors will be calculated from the first input model). In this case, as a reference Free R set is clearly not available for use, specifying the -nofreeref option is compulsory.

Note: If autoPROC is not installed, the -imagedir and -imagescan options will be disabled and only the -hklin option will be available.

The expected cell dimensions and space group will be read directly from the reference mtz file and autoPROC will ensure (where the symmetry allows the possibility of alternate indexing) that the experimental data are indexed consistently with the reference. Given that the reference pdb and mtz files are paired, this ensures that the limited molecular replacement should be successful, and has the added advantage that where pipedream is run on a series of structures, they will all end up in the same asymmetric unit and will therefore be directly superimposable. If the reference mtz file contains a Free R set, this set will be used for the processed data. Thus all data sets processed with Pipedream using the same input mtz file will have a common Free R set. As previously described, use of a common Free R set is good practice, in this context, to allow for proper cross-validation between structures.
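When Pipedream is driven from a script, the minimum required arguments described in this section can be assembled programmatically. The following is a minimal sketch, not an official interface: the function name is hypothetical, it handles a single data-input option only (so repeated -imagescan arguments are not covered), and it assumes that -nofreeref is a switch that takes no value. It only builds the argument list and does not run anything.

```python
import shlex

# The mutually exclusive ways of supplying experimental data (from the text).
DATA_OPTIONS = ("-imagedir", "-imagescan", "-h5master", "-autoprocdir", "-hklin")


def build_pipedream_command(data_option, data_value, outdir, models,
                            hklref=None, nofreeref=False):
    """Return the argument list for a basic Pipedream invocation."""
    if data_option not in DATA_OPTIONS:
        raise ValueError(f"unsupported data option: {data_option!r}")
    if not models:
        raise ValueError("at least one input pdb file is required")
    if hklref is None and not nofreeref:
        # Without a reference mtz file there is no reference Free R set.
        raise ValueError("-nofreeref is compulsory when -hklref is not given")

    args = ["pipedream", data_option, data_value, "-d", outdir,
            "-xyzin", ",".join(models)]  # multiple models: comma-separated
    if hklref is not None:
        args += ["-hklref", hklref]
    if nofreeref:
        args.append("-nofreeref")
    return args


if __name__ == "__main__":
    cmd = build_pipedream_command("-imagedir", "images", "run1",
                                  ["apo.pdb", "holo.pdb"], hklref="apo.mtz")
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be passed directly to a process launcher without any further quoting.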

3.1.2. Optional arguments

Further optional arguments are grouped into options for autoPROC, BUSTER and Rhofit (see pipedream -help).

Multiple model input options:

-seqin1 <seqin.dat>

File containing a comma-separated list of the residues to be used for structure difference analysis (in the form <residue name> <chain id> <residue number>, e.g. GLY A 34,ALA A 35,THR A 96)

- and/or -

-seqin2 "residue list"

Double-quote enclosed, comma-separated list of the residues to be used for structure difference analysis (in the form <residue name> <chain id> <residue number>, e.g. GLY A 34,ALA A 35,THR A 96). If this is used together with the -seqin1 option, a combined list of residues listed through both options will be used.

-rmsd <number>

Threshold value for pairwise RMS deviation between residues to be selected for analysis (default = 1.5Å). This option can only be used if neither the -seqin1 nor the -seqin2 option is specified.
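The residue list format shared by -seqin1 and -seqin2 is simple enough to check before a run. The sketch below parses the "<residue name> <chain id> <residue number>" items and merges two lists in the way the combined behaviour is described above. It is only an illustration: the function names are hypothetical, residue numbers are assumed to be plain integers (no insertion codes), and the removal of duplicate entries is an assumption rather than documented behaviour.

```python
def parse_residue_list(text):
    """Parse 'GLY A 34,ALA A 35' into [('GLY', 'A', 34), ('ALA', 'A', 35)]."""
    residues = []
    for item in text.split(","):
        fields = item.split()
        if not fields:
            continue  # tolerate empty input or a trailing comma
        if len(fields) != 3:
            raise ValueError(f"expected 'NAME CHAIN NUMBER', got {item!r}")
        name, chain, number = fields
        residues.append((name.upper(), chain, int(number)))
    return residues


def combine_residue_lists(seqin1_text="", seqin2_text=""):
    """Merge the residues given via both options, keeping first occurrences."""
    combined = []
    for residue in parse_residue_list(seqin1_text) + parse_residue_list(seqin2_text):
        if residue not in combined:
            combined.append(residue)
    return combined
```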

autoPROC options (only where autoPROC is installed):

-cell <"a b c al be ga">

Cell dimensions. This will override the cell read from the reference mtz file. Not generally recommended.

-mproc <macro name>

Comma-separated list of autoPROC macros.

-kappa <site name>

Specify site for use of kappa/eulerian goniometer. Use without an argument to list available sites.

-beam "x y"

Specify direct beam position (in double quotes). Default is to use direct beam position as specified in the image header.

-beamtransform <option>

Double-quote enclosed x and y transformation of direct beam (from header). Possibilities are: x,y x,-y -x,y -x,-y y,x y,-x -y,x -y,-x


Test all 8 transformation possibilities of direct beam position.


Try to determine and refine direct beam position automatically (use with caution!!).

-apcommands "process options"

Double-quote enclosed list of autoPROC command line options. See autoPROC documentation for further details.


Use the anisotropically scaled output file (staraniso_alldata-unique.mtz) output by Staraniso in place of the usual, isotropically scaled autoPROC output file (truncate-unique.mtz) in all subsequent steps.

Data acceptance criteria:

The primary goal of Pipedream is to automate processing and structure refinement for ligand detection. We consider that there are certain minimum criteria that the data need to meet to make looking for ligands, particularly small ligands, viable.

Where raw images have been input, the data are checked against these criteria and if any of these checks fail, then Pipedream will terminate cleanly. The current criteria are based on resolution, completeness and Rpim. The default values of these can be changed with the following options:

-rmin <number>

Minimum acceptable high resolution limit (default = 3.5Å).

-rpim <number>

Maximum acceptable rpim (default = 25%).

-completeness <number>

Minimum acceptable overall data completeness (default = 60%).
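
The three acceptance criteria can be expressed as a simple check. This is a sketch under stated assumptions, not Pipedream's own code: the function name failed_criteria is hypothetical, the comparisons are assumed to be inclusive of the limit, and "minimum acceptable high resolution limit" is read as meaning that the data must extend to rmin or better (a numerically smaller value in Å).

```python
def failed_criteria(high_res_limit, rpim, completeness,
                    rmin=3.5, rpim_max=25.0, completeness_min=60.0):
    """Return descriptions of the criteria that fail (empty list = accept).

    high_res_limit is in Å; rpim and completeness are percentages.
    Defaults mirror the documented -rmin, -rpim and -completeness defaults.
    """
    failures = []
    if high_res_limit > rmin:  # assumption: data must reach rmin or better
        failures.append(f"resolution {high_res_limit} Å is worse than {rmin} Å")
    if rpim > rpim_max:
        failures.append(f"Rpim {rpim}% exceeds {rpim_max}%")
    if completeness < completeness_min:
        failures.append(f"completeness {completeness}% is below {completeness_min}%")
    return failures


print(failed_criteria(2.1, 6.3, 98.5))   # hypothetical good data set
print(failed_criteria(4.0, 31.0, 55.0))  # hypothetical poor data set
```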

Optional "Molecular Replacement" arguments:

-chains "chain list"

Double-quote enclosed, space-separated list of individual chains/multimers to move independently. For example -chains "A B C D" will move chains independently, whereas -chains "AB CD" will treat chains A & B as a single rotatable unit and chains C & D as another single rotatable unit. By default, if this option is not specified, Pipedream will treat the entire input model as a single unit. Note: all hetero-atoms MUST have the same chain id as their associated protein molecule or they will be lost!


Use the brute force translation function. Default is to use the fast translation function, which is the quicker protocol. However, if the input model has translational NCS, you may get better results from the brute force translation function. This option should only be used in combination with the -chains option.

-mrres [<reslow>] <reshigh>

Resolution limits for MR. Most of the time, the defaults (low res limit left unset and high res limit set to 3.0Å) are adequate and should not need to be changed. However, for very large, multimeric structures you may need to restrict the resolution range.


-bigrotrange

Double the angular range for the rotation function from ±5.0° (default) to ±10.0°. See Appendix A.
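
The grouping rule described for the -chains option (spaces separate independently moving units, adjacent letters form one unit) can be illustrated as follows. This is a minimal sketch: the function name parse_chain_groups is hypothetical and single-character chain ids are assumed.

```python
def parse_chain_groups(chains_argument):
    """Split a -chains style string into groups of single-character chain ids."""
    return [list(group) for group in chains_argument.split()]


print(parse_chain_groups("A B C D"))  # [['A'], ['B'], ['C'], ['D']]
print(parse_chain_groups("AB CD"))    # [['A', 'B'], ['C', 'D']]
```

The number of groups returned corresponds to the number of independently defined units that the limited MR stage would be asked to move.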

BUSTER options:


Single round of BUSTER refinement for quickest results.


Three rounds of BUSTER refinement.

-mrefine <macro name>

Comma separated list of BUSTER macros.

-rigid <rigid.dat>

Perform rigid body refinement using rigid body definitions as specified. Default is to define one rigid body per chain.


Turn off autoncs (default is ON).

-target <filename.pdb>

Turn on target restraints. If specified without a pdb file, then the file specified by -xyzin is used.

-sequence <TNT sequence file>

Correct TNT format sequence file. Use of this option should only be considered where there are known issues with automatically generated sequence files that would require manual intervention. This option CANNOT be used with multiple model input.

-l <dictionaries>

Comma separated list of refmac-style CIF restraint dictionaries for pre-existing ligands or prosthetic groups.

-abcommands "refine options"

Double quote enclosed list of BUSTER command line options. See BUSTER documentation for further details.

-fss "FP,SIGFP"

Double quote enclosed unique F,SIGF pair. ONLY use if primary input data is an mtz file containing more than one F/SIGF pair.

Remediation (PDB_REDO) options:


Run SideAide to refit side chains. This option cannot be used in conjunction with the quick refinement option.


Also allow SideAide to rebuild stubbed sidechains.


Also run pepflip to check for and correct peptide bond flips.

Rhofit options:

-rhofit <dictionaries>

Run Rhofit if specified. Comma separated list of refmac-style CIF restraint dictionaries.


Keep hydrogen atoms on the ligand in the fit.


Ignore CHIRAL restraints in fitting/output. Chiral centres can then invert as needed.


Fit the ligand to every possible binding site.

-xclusters <n>

Produce ligand fits for the <n> best possible binding sites. Default = 1.


Run fewer trials than usual.


Run more trials than usual.

-rhocommands "Rhofit options"

Double quote enclosed list of Rhofit command line options. See Rhofit documentation for further details.


Post-refine the top solution from Rhofit (default option if -rhofit is defined).


Quick post-refinement of the top solution from Rhofit (uses ShortRunVoid macro).


Thorough post-refinement of the top solution from Rhofit.


Do not run any post-refinement but terminate after Rhofit.


Do not refine ligand occupancy in post-refinement.

Data Input Options:

-plugin "<identifier>"

Run defined plugin program with argument "<identifier>" to retrieve and furnish details of one or more of the required input data sources to Pipedream. Please see Appendix B for a full description of the set-up and operation of the Pipedream data plugin mechanism. Note that any of the mandatory data inputs not provided through this call must be specified individually on the command line. The argument <identifier> should be specified inside double quotes.

General options:


Acknowledgement that the reference mtz file DOES NOT contain a Free R set and that it is OK to generate one de novo. This option is COMPULSORY if a reference mtz file is not specified, or if the reference mtz file does not include a Free R set. Generating a new Free R set in this way is not generally recommended.


DO NOT remove waters that are present in the input model (default is to remove them). NOTE: In order for waters (or indeed ANY HETATM’s) to be retained they MUST be assigned the same chain id as the protein chain to which they are associated.


DO NOT add/remove waters in initial BUSTER protocols. Use with care!


Do not run buster-report.

-nthreads <integer>

Number of processes to use (for autoPROC, Phaser and BUSTER). A negative value will use (all)/n.

Default = use individual program defaults.

-help process|refine|rhofit

Print help for autoPROC, BUSTER or Rhofit.

-macro process|refine

Print list of available macros for either autoPROC or BUSTER.


Write progress of run to standard output.

4. Location of Pipedream output.

All of the output from Pipedream will be written in a defined directory tree in the output directory specified with the -d option.


Location of autoPROC output.


Final output from autoPROC. Used as the input for subsequent processes.


Standard output from autoPROC.


Location of limited MR output.


Final output from limited MR. Used as the input for subsequent processes.


Mtz file containing map coefficients from Phaser.


Location of BUSTER output (from final cycle).


Final output from BUSTER (final cycle). Used as input for Rhofit (if run).


Standard output from BUSTER (final cycle).

<root>/rhofit-<dictionary name>/

Location of Rhofit output for ligand <dictionary name>.

<root>/rhofit-<dictionary name>.out

Standard output from Rhofit.

<root>/postrefine-<dictionary name>/

Location of BUSTER post-refinement output.

<root>/postrefine-<dictionary name>.out

Standard output from BUSTER post-refinement.

<root>/report-<dictionary name>/

Output from buster-report. Can be viewed with firefox <root>/report-<dictionary name>/index.html.

Where multiple models have been input, all directories and output (primarily for the limited MR and initial refinement stages) relating to the individual input models will be written into directories named <number of input pdb file>-<name of input pdb file>.

In addition, the file <root>/summary.out contains the primary summary of the results (and any warning or error messages) from each stage in the process.

A typical summary.out file (for a single pdb file input) is:

    Processing and Refinement Summary

 Pipedream version: 1.0.0  <2014-05-12>

 Run by fbloggs on bijvoet at 12:14:10 on Thu Dec  4 2014
 Run from /home/fbloggs/pipedream

 Command run: pipedream  -hklin 4j0p.mtz -xyzin 1w50.pdb -hklref 1w50.mtz \
              -rhofit grade-LIG.cif -postref -d output

 All output in /home/fbloggs/pipedream/output

 ************* Input data is MTZ file *************

 Checking indexing consistency against reference mtz file 1w50.mtz.

 No need to reindex input data.

 Copying Freer column from the reference file 1w50.mtz to the input mtz file.
 Any pre-existing Freer set in the input file will be discarded.
 Consistently indexed mtz file with reference Freer is in consistent-input.mtz

 ******************* limited MR *******************

 Limited MR procedure run with 1 independently defined units.

 MR solution found with score (TFZ) 55.6

 For further details please see MR/*{rotation or translation}.out
 Output pdb file: MR/phaser.3.pdb

 ****** BUSTER refinement (default protocol) ******

 Initial:                R = 0.2638,     Rfree = 0.2771
 After 1st refinement:   R = 0.2371,     Rfree = 0.2528
 Final:                  R = 0.2103,     Rfree = 0.2409

 For further details please see refine.out
 Output files:

 *********** Ligand Fitting with Rhofit ***********

 | Running rhofit with ligand *grade-LIG* |

 For output and further details please see rhofit-grade-LIG/

                             rhofit           ligand LigProt  Poorly
                              total   Correl  strain contact fitting
  File               Chain    score   coeff    score   score   atoms

   Hit_00_00_000.pdb   A    -2308.1   0.9171     8.9     0.0    0/26

 BUSTER post-refinement

 Initial:        R = 0.2075,     Rfree = 0.2262
 Final:          R = 0.1858,     Rfree = 0.2152

 For further details please see postrefine-grade-LIG.out
 Output files:

 buster-report output:


 Run took 01:04:09 h:m:s to complete

A typical summary.out file (for multiple pdb file input) is:

    Processing and Refinement Summary

 Pipedream version: 1.0.0  <2014-05-12>

 Run by fbloggs on bijvoet at 15:35:56 on Thu Jan 29 2015
 Run from /home/fbloggs/pipedream

 Command run: pipedream  -hklin 4ke1/4ke1.mtz -nofreeref -xyzin \
              1w50.pdb,4dh6.pdb,4j0p.pdb -rhofit 1R6.grade_PDB_ligand.cif \
              -rhothorough -postref -seqin1 seq.list -d multiple-seqin3

 All output in /home/fbloggs/pipedream/multiple-seqin3

 Reference structure factors (multiple-seqin3/1-1w50_nowater.mtz) have been
 back-calculated from reference model with sfall!

 ************* Input data is MTZ file *************

 Checking indexing consistency against reference mtz file

 No need to reindex input data.

 Using Freer column already present in the input mtz file.

 ****************** Input models ******************

 You are running pipedream with 3 input pdb files.
 Limited MR and initial refinement will be run on each
 of the input models, after which the model that best
 fits the data will be chosen. Further steps will only
 be run on the selected model.

 The input models (in order) are:

 1: 1w50.pdb (located in current directory)
 2: 4dh6.pdb (located in current directory)
 3: 4j0p.pdb (located in current directory)

 ******************* limited MR *******************

 Limited MR procedure run with 1 independently defined units.

 1-1w50: MR solution found with score (TFZ) 52.5

        For further details please see 1-1w50/MR/*{rotation or translation}.out
        Output pdb file: 1-1w50/MR/phaser.3.pdb

 2-4dh6: MR solution found with score (TFZ) 56.7

        For further details please see 2-4dh6/MR/*{rotation or translation}.out
        Output pdb file: 2-4dh6/MR/phaser.3.pdb

 3-4j0p: MR solution found with score (TFZ) 53.3

        For further details please see 3-4j0p/MR/*{rotation or translation}.out
        Output pdb file: 3-4j0p/MR/phaser.3.pdb

 ***************** Model selection ****************

 For the results of initial refinement and the edstats output
 for each of the input models, please see:




 The  residues, as input, that will be used to assess
 which one of the input models gives the best fit to
 the input data are listed in the file:


 NOTE: Any residues from the input list that are not
 present in one (or more) of the input models will be
 automatically assigned a Z-score of 0 for that
 particular model. Please be aware that a significant
 number of "missing" residues could potentially
 compromise the model selection process!

 The average Z-score of the real-space sample
 correlation coefficient (ZCCm) over the selected
 residues for each of the input models are:

 average ZCCm = 5.5000, for model multiple-seqin3/1-1w50/refine1/refine.pdb
 average ZCCm = 9.1000, for model multiple-seqin3/2-4dh6/refine1/refine.pdb
 average ZCCm = 5.0875, for model multiple-seqin3/3-4j0p/refine1/refine.pdb
 On the basis of having the highest mean ZCCm score,
 over the selected residue range, the model selected
 as the best match to the input experimental data is


 refined from 4dh6.pdb

 Subsequent steps will proceed using this model only!


 ****** BUSTER refinement (default protocol) ******

 Initial:                R = 0.2706,     Rfree = 0.2961
 After 1st refinement:   R = 0.2740,     Rfree = 0.3055
 Final:                  R = 0.2210,     Rfree = 0.2569

 For further details please see refine.out
 Output files:

 *********** Ligand Fitting with Rhofit ***********

 | Running rhofit with ligand *1R6.grade_PDB_ligand* |

 For output and further details please see rhofit-1R6.grade_PDB_ligand/

                             rhofit           ligand LigProt  Poorly
                              total   Correl  strain contact fitting
  File               Chain    score   coeff    score   score   atoms

   Hit_00_00_000.pdb   A    -2260.9   0.8363    28.3     0.0    0/41

 BUSTER post-refinement

 Initial:        R = 0.2524,     Rfree = 0.2746
 Final:          R = 0.1949,     Rfree = 0.2330

 For further details please see postrefine-1R6.grade_PDB_ligand.out
 Output files:

 buster-report output:


 Run took 01:47:02 h:m:s to complete
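
When many runs have to be compared, the R and Rfree values can be pulled out of summary.out with a short script. The sketch below assumes that the lines have exactly the layout shown in the two examples above (which may differ between Pipedream versions); the names R_LINE and extract_r_values are hypothetical.

```python
import re

# Matches lines such as " Final:   R = 0.2103,     Rfree = 0.2409".
R_LINE = re.compile(
    r"^\s*(?P<label>[A-Za-z0-9 ]+?):\s+R = (?P<r>\d+\.\d+),\s+Rfree = (?P<rfree>\d+\.\d+)"
)


def extract_r_values(summary_text):
    """Return (label, R, Rfree) tuples in the order they appear."""
    results = []
    for line in summary_text.splitlines():
        match = R_LINE.match(line)
        if match:
            results.append((match["label"], float(match["r"]), float(match["rfree"])))
    return results


# Lines copied from the single-model example above.
sample = """\
 Initial:                R = 0.2638,     Rfree = 0.2771
 After 1st refinement:   R = 0.2371,     Rfree = 0.2528
 Final:                  R = 0.2103,     Rfree = 0.2409
"""
for label, r_value, r_free in extract_r_values(sample):
    print(f"{label}: R = {r_value}, Rfree = {r_free}")
```

Because the refinement and post-refinement blocks both use the labels "Initial" and "Final", the tuples have to be interpreted by their order of appearance in the file.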

5. How to cite use of Pipedream

Sharff A, Keller P, Vonrhein C, Smart O, Womack T, Flensburg C, Paciorek W and Bricogne G (2011). Pipedream, version 1.2.5, Global Phasing Ltd, Cambridge, United Kingdom.


Vonrhein C, Flensburg C, Keller P, Sharff A, Smart O, Paciorek W, Womack T and Bricogne G. "Data processing and analysis with the autoPROC toolbox". Acta Cryst. (2011). D67, 293-303.


Bricogne G, Blanc E, Brandl M, Flensburg C, Keller P, Paciorek W, Roversi P, Sharff A, Smart O, Vonrhein C, Womack T. (2011). BUSTER version X.Y.Z. Global Phasing Ltd, Cambridge, United Kingdom.


Womack T, Smart O, Sharff A, Flensburg C, Keller P, Paciorek W, Vonrhein C and Bricogne G. (2011). Rhofit, version X.Y.Z. Global Phasing Ltd, Cambridge, United Kingdom.


Joosten RP, Joosten K, Cohen SX, Vriend G, and Perrakis A. (2011). Automatic rebuilding and optimization of crystallographic structures in the Protein Data Bank. Bioinformatics. 27. 3392-3398.


Kabsch W. "XDS". Acta Cryst. (2010). D66, 125-132.


Collaborative Computational Project, Number 4. "The CCP4 Suite: Programs for Protein Crystallography". Acta Cryst. (1994). D50, 760-763.

6. Appendix A: Non-isomorphism and Limited MR

A certain degree of non-isomorphism is expected and allowed for in Pipedream.

Pipedream assesses non-isomorphism in terms of the relative difference in the cell parameters (cell angle changes being referred to 1 radian) between the reference structure and the experimental data, using the following formula:

The relative difference in cell parameters is defined as $|\frac{\Delta a}{a_{exp}}| + |\frac{\Delta b}{b_{exp}}| + |\frac{\Delta c}{c_{exp}}| + |\frac{\Delta \alpha}{57.296}| + |\frac{\Delta \beta}{57.296}| + |\frac{\Delta \gamma}{57.296}|$

The larger the relative cell parameter difference, the more one might expect to have to reorient the reference structure to best match the experimental data. The limited MR procedure is configured to set the maximum angular range for the rotation function to ±5.0°. This limit has been approximately matched to the amount of reorientation that might be seen with a relative cell parameter difference of up to 0.25. With a relative cell parameter difference > 0.25, there is a possibility that the limited MR procedure will not be able to move the input model sufficiently and thus may fail.

If the relative difference in cell parameters exceeds 0.25, Pipedream will print a warning message in the summary.out file indicating that the limited MR procedure (and thus all subsequent steps) MAY be compromised/fail due to the degree of non-isomorphism.

If this is the case, Pipedream can be re-run with the -bigrotrange flag. This will double the angular range for the rotation function to ±10.0°, allowing for more extensive reorientation. However, further failure would indicate more extensive problems that are beyond the scope of Pipedream’s limited MR approach.
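
The relative cell parameter difference defined above can be evaluated by hand or with a few lines of code. The following is a sketch only: the function name relative_cell_difference and both example cells are hypothetical, and the difference is taken as reference minus experimental (the sign does not matter because absolute values are summed).

```python
RADIAN_IN_DEGREES = 57.296  # one radian in degrees, as used in the formula
WARNING_THRESHOLD = 0.25    # above this value a warning is written to summary.out


def relative_cell_difference(reference_cell, experimental_cell):
    """Sum of relative cell length differences and angle differences in radians.

    Each cell is (a, b, c, alpha, beta, gamma): lengths in Å, angles in degrees.
    Length differences are divided by the experimental cell lengths.
    """
    ref_lengths, ref_angles = reference_cell[:3], reference_cell[3:]
    exp_lengths, exp_angles = experimental_cell[:3], experimental_cell[3:]
    length_term = sum(abs(ref - exp) / exp
                      for ref, exp in zip(ref_lengths, exp_lengths))
    angle_term = sum(abs(ref - exp) / RADIAN_IN_DEGREES
                     for ref, exp in zip(ref_angles, exp_angles))
    return length_term + angle_term


reference = (50.0, 60.0, 70.0, 90.0, 100.0, 90.0)     # hypothetical reference cell
experimental = (50.5, 60.3, 70.7, 90.0, 101.0, 90.0)  # hypothetical experimental cell
difference = relative_cell_difference(reference, experimental)
print(f"relative cell parameter difference = {difference:.4f}")
if difference > WARNING_THRESHOLD:
    print("limited MR may fail; consider re-running with -bigrotrange")
else:
    print("within the range covered by the default rotation limits")
```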

7. Appendix B: Pipedream Data Input Plugin

The plugin mechanism has been implemented to allow the user to query a database(s) / other source(s) to automatically provide Pipedream with both the identity and location of any or all of the various data sources (x-ray images or pre-processed mtz file, reference mtz file, input model(s), ligand restraint dictionary/dictionaries) required.

7.1. Use of plugin

In order to access the plugin functionality, you will need to provide a script/binary, which should run as:

pluginscript "<identifier>"

where <identifier> is a string (or strings), possibly one or more unique database identifiers, that the script will interpret and act upon.

The internals of this script/binary (what it does based on the input) are entirely up to the user. However, the required output from the script/binary is the identity and location of the required Pipedream input data, provided in JSON format, and this must be the ONLY information that the script/binary writes to standard output.

The file permissions on this script/binary must be set to ensure that it is executable.

In order to configure Pipedream to see and run this executable, the environment variable BDG_TOOL_PIPEDREAM_PLUGIN must be defined to point to the script/binary.

For example:

setenv BDG_TOOL_PIPEDREAM_PLUGIN /software/local/bin/pluginscript


export BDG_TOOL_PIPEDREAM_PLUGIN=/software/local/bin/pluginscript

We strongly recommend adding these to the $BDG_home/setup_local.csh and $BDG_home/ files respectively.

Pipedream can be run to invoke this mechanism with the -plugin command line option, e.g.

pipedream -plugin "<identifier>"

Please note that the argument to the -plugin option should be specified inside double quotes.

The actual command that Pipedream will execute is


The location and identity of any one or indeed all of the required data inputs to Pipedream can be provided in this manner.

Any of the required inputs that are not returned through this mechanism can (and must) be specified as usual on the Pipedream command line, for example:

pipedream -plugin "12345" -xyzin input.pdb -rhofit ligand.cif

Please note that command line options take precedence over information returned by the plugin mechanism, irrespective of order on the command line and thus can be used to override any information returned by the plugin. For instance if Pipedream is run as above and the plugin returns the input model, the use of the -xyzin flag will tell Pipedream to use input.pdb in place of any input model returned by the plugin.
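
The precedence rule can be pictured as a merge in which command line values overwrite plugin values. This sketch illustrates the rule only and says nothing about how Pipedream implements it; the function name resolve_inputs and the file paths are hypothetical.

```python
def resolve_inputs(plugin_values, command_line_values):
    """Combine inputs; anything given on the command line wins."""
    resolved = dict(plugin_values)
    resolved.update(command_line_values)
    return resolved


# Hypothetical values returned by a plugin, keyed by the equivalent option.
plugin_values = {
    "-xyzin": "/data/input/input.pdb",
    "-hklref": "/data/input/reference.mtz",
}
# Values typed on the command line, as in the example above.
command_line_values = {"-xyzin": "input.pdb", "-rhofit": "ligand.cif"}

print(resolve_inputs(plugin_values, command_line_values))
```

Here the model comes from the command line (input.pdb), while the reference mtz file still comes from the plugin.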

7.2. Required output from the plugin script/binary

The required structure/format of the JSON output from the plugin script/binary is as follows:

 {
  "PipedreamInput": {
   "PipedreamExperimentalData": {
     "<INPUTDATATYPE>": "<INPUTDATASPEC>"
   },
   "PipedreamModelData": {
     "PipedreamInputPDB": "/data/input/input.pdb",
     "PipedreamReferenceMTZ": "/data/input/reference.mtz",
     "PipedreamInputRestraints": "/data/input/cofactor.cif",
     "PipedreamRhofitRestraints": "/data/input/ligand.cif"
   }
  }
 }

The output contains a number of defined, nested JSON objects.

The primary object should be "PipedreamInput". Nested below this, there are two secondary objects, "PipedreamExperimentalData", which contains information pertaining to the experimental data (raw x-ray images or pre-processed mtz file) required, and "PipedreamModelData", which contains information pertaining to the input model(s) and restraint dictionaries required.

If populated, the object "PipedreamExperimentalData" should contain a single name / value pair, of the form "INPUTDATATYPE" : "INPUTDATASPEC", which must follow one of the following patterns:

a) "PipedreamImagedir" : "<INPUTDATASPEC>"

where INPUTDATASPEC shows the full path to the directory containing the raw x-ray images (the equivalent of the -imagedir option in Pipedream). For example

"PipedreamImagedir" : "/data/input/lyso-123"

b) "PipedreamImageScan" : "<INPUTDATASPEC>"

where INPUTDATASPEC shows a full autoPROC scan definition to define the location and specific image scan/ranges (the equivalent of the -imagescan option in Pipedream). For example

"PipedreamImageScan" : "lyso-123,/data/input/lyso-123,lyso-123_1_###.img,1,180"

c) "PipedreamH5Master" : "<INPUTDATASPEC>"

where INPUTDATASPEC shows the full path to a master input file for Eiger H5 data (the equivalent of the -h5master option in Pipedream). For example

"PipedreamH5Master" : "/data/input/lyso-123.master"

d) "PipedreamHklin" : "<INPUTDATASPEC>"

where INPUTDATASPEC shows the full path to and name of a pre-processed mtz file (the equivalent of the -hklin option in Pipedream). For example

"PipedreamHklin" : "/data/input/lyso-123.mtz"

ONLY ONE of the above name / value pairs may be defined, otherwise Pipedream will terminate.

If populated, the "PipedreamModelData" object may contain any combination of the "PipedreamInputPDB", "PipedreamReferenceMTZ", "PipedreamInputRestraints" and "PipedreamRhofitRestraints" name / value pairs. These are the equivalent of the -xyzin, -hklref, -l and -rhofit options in Pipedream.
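
A plugin that produces output of this shape might look like the sketch below. Everything in it is illustrative: KNOWN_DATASETS is a hypothetical lookup table standing in for a real database query, build_plugin_output is a hypothetical function name, the paths are the example paths used above, and wrapping "PipedreamInput" in an outer JSON object is an assumption made so that the output is valid JSON.

```python
import json

# Hypothetical lookup table standing in for a real database query.
KNOWN_DATASETS = {
    "12345": {
        "PipedreamExperimentalData": {
            "PipedreamHklin": "/data/input/lyso-123.mtz",
        },
        "PipedreamModelData": {
            "PipedreamInputPDB": "/data/input/input.pdb",
            "PipedreamReferenceMTZ": "/data/input/reference.mtz",
            "PipedreamRhofitRestraints": "/data/input/ligand.cif",
        },
    },
}

# The four experimental data names described in this section.
EXPERIMENTAL_NAMES = {
    "PipedreamImagedir",
    "PipedreamImageScan",
    "PipedreamH5Master",
    "PipedreamHklin",
}


def build_plugin_output(identifier):
    """Return the JSON text for one identifier, checking the documented rules."""
    record = KNOWN_DATASETS[identifier]
    experimental = record.get("PipedreamExperimentalData", {})
    if len(experimental) > 1:
        raise ValueError("only one experimental data name / value pair is allowed")
    unknown = set(experimental) - EXPERIMENTAL_NAMES
    if unknown:
        raise ValueError(f"unknown experimental data name(s): {sorted(unknown)}")
    return json.dumps({"PipedreamInput": record}, indent=2)


# A real plugin would take the identifier from its first command line argument
# and print the JSON as its only output on standard output.
print(build_plugin_output("12345"))
```

Any diagnostic messages from a real plugin should go to standard error or a log file, so that standard output contains the JSON and nothing else.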

8. Appendix C: Revision History



  • Released 14th May 2018

  • Introduction of plugin mechanism for data input


  • Released 27th November 2017

  • First incorporation of use of PDB_REDO programs


  • Released 8th May 2017

  • More comprehensive Staraniso output use

  • ability to input Eiger .h5 files

  • Multiple minor fixes / improvements


  • Released 24th February 2016

  • First release to allow use of Staraniso output data


  • Released in snapshot 16th March 2015

  • First release of multiple model input functionality

  • Resolution range for rigid body refinement limited to 4.0Å


  • Initial general release of Pipedream (released 4th April 2014)


  • Released 17th November 2012.

  • adaptation to allow use with "large" structures

  • added phaser "refine" step to limited MR.

  • added checkdeps functionality.


  • Released 31st October 2012.

  • added imagescan option for autoPROC.

  • added option to input scalepack or d*TREK reflection files.

  • reference mtz file now optional.

  • integrated buster-report into Pipedream

  • included stand-alone limited MR script, lmr.


  • Released 23rd October 2011.

  • Limited MR modified to allow individual chains/groups of chains to be moved independently.

  • Added option of "Brute Force" Translation function.

  • make use of openmp in phaser.

  • added short post-refinement step on "top" solution from Rhofit.


  • Initial consortium release of Pipedream (released 9th August 2011)