Determining if a Protein Model Contains a Backbone Clash

I have an ensemble of homology models of a protein, and I now wish to remove those models which have backbone clashes. I could obviously check by eye but this is subjective and probably will not be accepted for publication.

What is the best (reproducible) method to determine if a particular protein model contains a backbone clash?

The way to check for steric clashing between any two atoms, backbone or otherwise, is to compute their Euclidean distance. If a and b represent two atoms (with a_x being the X coordinate of atom a and so forth), you can calculate their Euclidean distance as follows.

d(a, b) = sqrt( (a_x - b_x)^2 + (a_y - b_y)^2 + (a_z - b_z)^2 )

So essentially the idea would be to calculate the distance between every pair of backbone atoms. For any pair of atoms, there is steric clashing if the distance between them falls below a certain threshold. If I remember correctly, this threshold is the sum of the van der Waals radii of the two atoms. Covalently bonded atoms (and atoms separated by only two bonds) always sit closer than that, so those pairs have to be excluded from the check.
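A minimal Python sketch of this pairwise check, assuming the atoms have already been parsed into plain tuples. The radii (approximate Bondi values), the 0.4 Å tolerance, the rule of skipping atom pairs in the same or sequence-adjacent residues, and the name backbone_clashes are all illustrative assumptions rather than part of the answer above.

```python
import math
from itertools import combinations

# Approximate van der Waals radii in angstroms (Bondi values); an assumption.
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
BACKBONE_NAMES = {"N", "CA", "C", "O"}


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def backbone_clashes(atoms, tolerance=0.4, min_residue_gap=2):
    """Return backbone atom pairs closer than the sum of their radii minus tolerance.

    atoms: iterable of (chain, residue_number, atom_name, element, (x, y, z)).
    Pairs in the same chain whose residue numbers differ by less than
    min_residue_gap are skipped, because bonded neighbours are always close.
    """
    backbone = [a for a in atoms if a[2] in BACKBONE_NAMES]
    clashes = []
    for a, b in combinations(backbone, 2):
        if a[0] == b[0] and abs(a[1] - b[1]) < min_residue_gap:
            continue
        cutoff = VDW_RADIUS[a[3]] + VDW_RADIUS[b[3]] - tolerance
        d = distance(a[4], b[4])
        if d < cutoff:
            clashes.append((a[:3], b[:3], round(d, 2)))
    return clashes
```

A model would then be rejected when backbone_clashes returns a non-empty list. The loop is quadratic in the number of atoms, which is usually acceptable for a single protein chain.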

This tutorial includes binding site analysis and comparison of related structures by superposition and morphing. Internet connectivity is required to fetch the structures 3w7f and 2zco.

Background and Setup

The pathogenic organism Staphylococcus aureus makes a pigment called staphyloxanthin. The pigment imparts a golden color (hence aureus), but more importantly, contributes to virulence by protecting the bacteria from being killed by the host immune system. The S. aureus enzyme CrtM may be a good drug target because it catalyzes a key step in staphyloxanthin synthesis:

Start Chimera by clicking or double-clicking the Chimera icon (depending on its location). Typically, this icon will be present on the desktop. The Chimera executable can also be run from its installation location.

A splash screen will appear, to be replaced in a few seconds by the main Chimera graphics window or Rapid Access interface (it does not matter which, the following instructions will work with either). If you like, resize the Chimera window by dragging its lower right corner.

  • Side View (under Viewing Controls)
  • FindHBond (Structure Analysis)
  • Find Clashes/Contacts (Structure Analysis)
  • Rotamers (Structure Editing)
  • Reply Log (Utilities, the last section)

Fetch a structure from the Protein Data Bank and use the ribbons preset:

This preset displays the two protein chains (A and B) as red and blue ribbons; bound molecules and nearby sidechains are shown as sticks. The two chains are two copies of the enzyme. Delete chain B:

The enzyme combines two 15-carbon molecules of farnesyl pyrophosphate to form a 30-carbon lipid. This structure contains farnesyl thiopyrophosphate, which differs from the substrate by having a sulfur in the place of one oxygen. Sulfur atoms are shown in yellow, phosphorus orange, oxygen red, and nitrogen blue. Label the ligand residues:

Delete the water, label residues with displayed sidechains, and place the labels near α-carbons instead of residue centroids:

Distances, H-Bonds, Contacts

It looks like several sidechains could be donating hydrogen bonds to phosphate oxygens. (Although the structure does not include hydrogens, we know they are there!)

  1. Ctrl-click to pick the sidechain oxygen of Ser 21
  2. Shift-Ctrl-doubleclick on the nearest phosphate oxygen
  3. Click Show Distance in the resulting context menu

  1. Select the FPS residues (for example, with Select... Residue... FPS in the menu).
  2. Start FindHBond by clicking its icon:
  3. In that dialog, turn on the options:
    • Only find H-bonds with at least one end selected
    • Write information to reply log
  4. Set Line width to 3.
  5. Click OK.

  • clashes - unfavorable interactions where atoms are too close together; close contacts
  • contacts - all kinds of direct interactions: polar and nonpolar, favorable and unfavorable (including clashes)

  1. Start Find Clashes/Contacts by clicking its icon:
  2. With the FPS residues still selected, click Designate. Now, 48 atoms should be designated for checking against all other atoms.
  3. Set the Clash/Contact Parameters to the default contact criteria (by clicking the button marked contact).
  4. Set the Treatment of Clash/Contact Atoms to:
    • Select
    • If endpoint atom hidden, show endpoint residue
    • Write information to reply log
    • (turn off any other treatment options)
  5. Click Apply.

One might simply want a list of the interacting residues rather than the details of each atomic contact. A list of the selected residues can be saved. First, deselect the ligand residues and ions, leaving the protein residues selected:

sel ligand

Open the Selection Inspector by clicking the magnifying glass icon near the bottom right corner of the main window. It reports that 26 residues are selected. Click its Write List... button (or choose Actions... Write List from the main menu). In the resulting dialog, indicate that selected residues should be written. Click Log to write the list to the Reply Log instead of to a file.

Angles, Rotamers, Clashes

Amino acid torsion angle values can be viewed in the Selection Inspector. Focus on Tyr 248:

Secondary Structure

A protein’s secondary structure is whatever regular structures arise from interactions between neighboring or near-by amino acids as the polypeptide starts to fold into its functional three-dimensional form. Secondary structures arise as H bonds form between local groups of amino acids in a region of the polypeptide chain. Rarely does a single secondary structure extend throughout the polypeptide chain. It is usually just in a section of the chain. The most common forms of secondary structure are the α-helix and β-pleated sheet structures and they play an important structural role in most globular and fibrous proteins.

Secondary structure: The α-helix and β-pleated sheet form because of hydrogen bonding between carbonyl and amino groups in the peptide backbone. Certain amino acids have a propensity to form an α-helix, while others have a propensity to form a β-pleated sheet.

In the α-helix chain, the hydrogen bond forms between the oxygen atom in the polypeptide backbone carbonyl group in one amino acid and the hydrogen atom in the polypeptide backbone amino group of another amino acid that is four amino acids farther along the chain. This holds the stretch of amino acids in a right-handed coil. Every helical turn in an alpha helix has 3.6 amino acid residues. The R groups (the side chains) of the polypeptide protrude out from the α-helix chain and are not involved in the H bonds that maintain the α-helix structure.

In β-pleated sheets, stretches of amino acids (β-strands) are held in an almost fully extended conformation that “pleats” or zig-zags due to the non-linear nature of single C-C and C-N covalent bonds. β-strands never occur alone. They have to be held in place by other β-strands. The strands are held in their pleated sheet structure because hydrogen bonds form between the oxygen atom in a polypeptide backbone carbonyl group of one strand and the hydrogen atom in a polypeptide backbone amino group of a neighboring strand. The strands that hold each other together align parallel or antiparallel to each other. The R groups of the amino acids in a β-pleated sheet point out perpendicular to the hydrogen bonds holding the strands together, and are not involved in maintaining the β-pleated sheet structure.


Resolution is a measure of the quality of the data that has been collected on the crystal containing the protein or nucleic acid. If all of the proteins in the crystal are aligned in an identical way, forming a very perfect crystal, then all of the proteins will scatter X-rays the same way, and the diffraction pattern will show the fine details of the crystal. On the other hand, if the proteins in the crystal are all slightly different, due to local flexibility or motion, the diffraction pattern will not contain as much fine information. So resolution is a measure of the level of detail present in the diffraction pattern and the level of detail that will be seen when the electron density map is calculated. High-resolution structures, with resolution values of 1 Å or so, are highly ordered and it is easy to see every atom in the electron density map. Lower-resolution structures, with resolution values of 3 Å or higher, show only the basic contours of the protein chain, and the atomic structure must be inferred. Most crystallographically determined protein structures fall in between these two extremes. As a general rule of thumb, we have more confidence in the location of atoms in structures with small resolution values, called "high-resolution structures".

Electron density maps for structures with a range of resolutions are shown. The first three show tyrosine 103 from myoglobin, from entries 1a6m (1.0 Å resolution), 106m (2.0 Å resolution), and 108m (2.7 Å resolution). The final example shows tyrosine 130 from hemoglobin (chain B), from entry 1s0h (3.0 Å resolution). In the pictures, the blue and yellow contours surround regions of high electron density, and the atomic model is shown with sticks. The electron density was imaged using the Aster viewer.

Additional Bioinformatic Analyses Involving Protein Sequences*

Supratim Choudhuri, in Bioinformatics for Beginners, 2014

8.2 Peptide Bond, Peptide Plane, Bond Rotation, Dihedral Angles, and Ramachandran Plot

Amino acids are linked together by peptide bonds. Peptide bonds are amide linkages between the NH2 and COOH groups of neighboring amino acids. The peptide bond (C–N) has a partial double-bond character. Thus, it is rigid and planar and not free to rotate. The plane on which it lies is called the peptide plane or amide plane. Peptide bonds are trans bonds; that is, the carbonyl oxygen and amide hydrogen are in trans position. However, the N–Cα and Cα–C bonds are not rigid and they can freely rotate, being only limited by the size and character of the R-groups. The angle of rotation (also called torsion angle or dihedral angle) around the N–Cα bond is called phi (φ) and that around the Cα–C bond is called psi (ψ) (Figure 8.1A). These two angles largely determine the 3D shape of the polypeptide backbone of the protein.

Figure 8.1. Peptide bond, peptide plane, and the Ramachandran plot.

(A) Peptide bond, peptide plane, phi and psi angles, and bond rotation involving two amino acids. The N–Cα and Cα–C bonds are not rigid and can freely rotate, being only limited by the size and character of the R-groups. (B) Diagram of a typical Ramachandran plot (φ/ψ plot). The regions marked “Core” correspond to conformations that do not have any steric hindrance. The yellow areas labeled “Allowed” correspond to conformations that could be possible if the atoms could come a little closer together. The white areas represent conformations that are sterically unfavorable (see text). (C) In computing a Ramachandran plot, atoms are treated as hard spheres whose dimensions correspond to their van der Waals radii. The van der Waals radius and covalent radius are depicted for comparison.

Although φ and ψ are less restricted in terms of rotation, the bulkiness of R-groups of the amino acids tends to impose some restrictions on the rotation through steric hindrance. This makes certain combinations of φ and ψ preferred. The φ/ψ plot of the amino acid residues in a peptide is called the Ramachandran plot. It involves plotting the φ values on the x-axis and the ψ values on the y-axis to predict the possible conformation of the peptide. The angle spectrum in each axis is from −180° to +180°. In computing a Ramachandran plot, atoms are treated as hard spheres whose dimensions correspond to their van der Waals radii. Any angle that results in the collision of the spheres is regarded as sterically unfavorable; hence, such conformations are also sterically not allowed. Figure 8.1B shows a simplified diagram of a Ramachandran plot. The regions marked “Core” correspond to conformations that do not have any steric hindrance. The yellow areas labeled “Allowed” correspond to conformations that could be possible if slightly shorter van der Waals radii are used in the calculation. In other words, if the atoms could come a little closer together, then these conformations would be possible. The white areas represent conformations that are sterically unfavorable. The van der Waals radius and covalent radius are depicted in Figure 8.1C. The residues with a less bulky side chain or no side chain, such as glycine (no side chain), can have many possible combinations of φ and ψ (e.g. in a polyglycine backbone), resulting in a larger allowable area on the plot in all four quadrants, whereas residues with bulky side chains, such as proline or phenylalanine, have fewer possible combinations of φ and ψ, hence a smaller allowable area on the plot.

The φ and ψ angles for each residue in a helical structure are very similar, and that is what confers regularity to the helical structure. Positive angles correspond to clockwise rotation and negative angles correspond to anticlockwise rotation. The ideal values of φ/ψ were determined to be as follows: right-handed α-helix, −57°/−47°; left-handed α-helix, +57°/+47°; right-handed 3₁₀ helix, −74°/−4°; right-handed π-helix, −57°/−70°; parallel β-sheet (uncommon), −119°/+113°; antiparallel β-sheet (common), −139°/+135°. The actual values differ somewhat from these idealized values. Recent experimental data have demonstrated that both φ and ψ can undergo large rotations, which are usually coupled. See Hovmöller, et al. 6 for more details on experimental determination of main-chain conformations in 1042 protein subunits.
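As a rough illustration of how φ and ψ can be obtained from atomic coordinates, the sketch below computes a signed dihedral angle from four points using the standard atan2 formula. The function names dihedral and phi_psi are hypothetical, and the use of None for a missing neighbouring residue is an assumption made for this example.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in the range -180 to +180."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1).

    Pass None for prev_c or next_n at a chain terminus or chain break.
    """
    phi = dihedral(prev_c, n, ca, c) if prev_c is not None else None
    psi = dihedral(n, ca, c, next_n) if next_n is not None else None
    return phi, psi
```

With this sign convention, a clockwise rotation of the far bond relative to the near bond, viewed along the central bond, gives a positive angle.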

Online tools are available from several sources for the analysis of Ramachandran plots of proteins. One such tool is available at the Uppsala Ramachandran Server. This service is based on the Moleman2 program. 7

Phi/Psi using the Molecular Modelling Toolkit (MMTK)

The following short piece of Python code uses Konrad Hinsen's Python Molecular Modelling Toolkit (MMTK) to load a PDB file (1HMP), calculate ϕ and ψ, and print them to screen (in radians). The code is so succinct because MMTK has support to calculate the psi/phi protein backbone dihedral angles built in:

Or, you can do this residue by residue - printing out a little bit more information as we go:

The resulting output will look something like this (depending on which PDB file you use):

Note that for the first residue of each protein chain, phi is undefined, while for the last residue, psi is undefined. Also note that MMTK returns the angles in radians, while most plots are drawn from -180° to +180°.
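Since most plots use degrees, a small helper along these lines can convert the angles before plotting. This is only a sketch: it assumes undefined angles are represented as None, and the names to_degrees and format_phi_psi are hypothetical and not part of MMTK.

```python
import math


def to_degrees(angle):
    """Convert radians to degrees; undefined angles (None) are passed through."""
    return None if angle is None else math.degrees(angle)


def format_phi_psi(phi, psi):
    """Format a (phi, psi) pair given in radians as a fixed-width string in degrees."""
    def fmt(angle):
        deg = to_degrees(angle)
        return "    n/a" if deg is None else f"{deg:7.1f}"
    return f"phi={fmt(phi)} psi={fmt(psi)}"
```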

This simplicity comes at a cost - MMTK is very particular about its PDB files, and will not tolerate much deviation from the standard. In fact, this example 1HMP is a case in point - there is actually a break in Chain B, as we will discover later using BioPython.

For another example, many PDB files contain "funny" hydrogens (e.g. for undeclared protonated histidines) which will make MMTK choke. One way round this is to explicitly ignore any hydrogens in the PDB file:

Also, in my experience, MMTK copes far better with files downloaded from the PDB than it does with those which have been edited by another program. However, even files direct from the PDB contain errors and oddities. As an alternative to MMTK, we could use BioPython.

a Department of Biochemistry, The Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066CX, The Netherlands
* Correspondence e-mail: [email protected], [email protected]

Inherent protein flexibility, poor or low-resolution diffraction data or poorly defined electron-density maps often inhibit the building of complete structural models during X-ray structure determination. However, recent advances in crystallographic refinement and model building often allow completion of previously missing parts. This paper presents algorithms that identify regions missing in a certain model but present in homologous structures in the Protein Data Bank (PDB), and 'graft' these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these developments in the PDB-REDO pipeline has enabled the building of 24� missing loops in the PDB. The models and the automated procedures are publicly available through the PDB-REDO databank and webserver. More complete protein structure models enable a higher quality public archive but also a better understanding of protein function, better comparison between homologous structures and more complete data mining in structural bioinformatics projects.

1. Introduction

Protein structure models give direct and detailed insights into biochemistry (Lamb et al., 2015) and are therefore highly relevant to many areas of biology and biotechnology (Terwilliger & Bricogne, 2014). For decades, crystallography has been the leading technique in determining protein structure models (Berman et al., 2014) and to date, over 120� crystallographic structure models are available from the Protein Data Bank (PDB; Burley et al., 2017). It is important to realize that all structures are interpretations of the underlying experimental data (Lamb et al., 2015; Wlodawer et al., 2013) and the quality of a structure model should therefore be scrutinized by validation (Read et al., 2011; Richardson et al., 2013).

Owing to numerous improvements in refinement and validation methods, the quality of protein structure models is continuously increasing (Read et al., 2011); however, the completeness of models is decreasing (Fig. 1). About 70% of all crystallographic protein structures have regions that are missing (Djinovic-Carugo & Carugo, 2015) and this percentage is increasing. Typically, but not necessarily, these missing regions are loops between helices and strands. Loops occupy a large conformational space and can therefore be missing as a result of intrinsic disorder, meaning they cannot be modeled reliably in a single conformation. However, there are many cases where the experimental data provide useful information on the loop conformation and hence many loops can be built into protein structure models. The term 'loop' will be used in this paper in its broader definition to denote a missing region of protein structure, regardless of secondary structure conformation.

Figure 1
Cumulative percentage (left) and absolute number (right) of residues missing in termini and in loops for all structures over the years.

Missing protein regions or 'loops' are typically modeled towards the end of crystal structure determination. By that stage, all obvious features have typically been modeled, the electron-density maps have improved and it is possible to model the loops. Many programs are available for modeling loops either interactively (Emsley et al., 2010) or automatically (Terwilliger et al., 2008; Joosten et al., 2008; Cowtan, 2012; Kleywegt & Jones, 1998; DePristo et al., 2005). Completing the protein structure by modeling all loops that can be modeled has two advantages: locally, the density becomes unavailable for modeling erroneous structural features such as other parts of the protein (e.g. side chains), crystallization agents and water molecules; and globally, a correctly fitted loop will reduce the phase error and give an overall improvement of the electron-density maps. Available loop-building approaches rely on one form or another of conformational sampling that attempts to find the best-fitting conformation of the loops based on the local electron density and our general knowledge of protein structure.

Building loops is often one of the most difficult, time-demanding and sometimes frustrating stages of crystallographic model building. If loops are too disordered to yield traceable electron density, they cannot be built and there is no problem. In many cases, however, loops are sufficiently rigid to yield interpretable density. Typically, this electron density is not as clear as desired, which makes it challenging for crystallographers and model-building programs to model a loop in a realistic conformation. Whether this is eventually successful depends on many factors such as perseverance and skill of the crystallographer(s) and the algorithmic quality and ease-of-use of loop-building programs. Noteworthy are algorithms that attempt to interpret the electron density with multiple loop conformations (Burnley et al., 2012; Levin et al., 2007). Although there is little doubt that very often multiple conformations can better represent the experimental data, here we wanted to deal with the problem where a loop is not modeled at all despite strong experimental evidence that it can be modeled.

Previously, we have developed algorithms to transfer information about homologous protein structures for obtaining geometric restraints for low-resolution refinement (van Beusekom et al., 2018). Here we exploit the relationship between homologous proteins for loop building. We reasoned that as highly similar (in terms of sequence) proteins have very similar structure, the conformation of a loop in one homologous structure can assist in identifying the loop location in another homologous structure that is being built. The presence of a loop in a homologous structure is also an indication that loop building is viable. If a region has not been modeled in any of the homologs, it is unlikely to be buildable in the new structure (provided there are many homologous models solved by different crystallographers), but if it is built at least once there is a good chance that the same region can be modeled in the structure in question. Of course there are exceptions to the similarity of homologous loops, as loops can adopt distinct conformations which are often of high significance for function: there are countless examples that describe loop motions associated with ligand interactions or a change in the context of different crystallographic contacts.

It has already been shown for a set of 16� PDB structures (Le Gall et al., 2007) that although 92.2% of all crystallized residues were always ordered in all homologs and 4.4% were always disordered, 3.3% are 'ambiguous': these residues were modeled in at least one structure but disordered in other(s). Another survey (Zhang et al., 2007) observed that such regions, named 'dual personality' fragments, occur in 45% of sequence-identical structure groups. Of the ambiguous loop regions, 59% are predicted to be ordered by all three protein disorder predictors used in a third study (DeForte & Uversky, 2016). Thus, the increasing redundancy of homologs in the PDB, the increasing percentage of unmodeled residues (Fig. 1) and the disorder predictions for these regions argue that using 'dual personality' regions between homologous structures is a viable strategy for increasing the completeness of protein structure models.

We have therefore decided to develop a procedure for building loops in cases where a loop is available in other homologous structure(s). The implementation of these new ideas has taken place in the context of PDB-REDO, a procedure we are developing to (re-)refine and partially rebuild protein structure models, both retrospectively [by updating existing PDB entries (Joosten et al., 2012)] and proactively [our software is available as a webserver (Joosten et al., 2014) and for local installation]. As the PDB-REDO electron-density maps are often better than the original maps and can be simply recalculated from the PDB entries, incorporating this work in the PDB-REDO framework allows us to have the best possible maps for building and validation of the missing loops. For every structure in question, the missing loop is first identified, then built by grafting it from a homologous structure, refined to fit the electron-density map in real space, and finally validated against geometric criteria and the electron density. We have built and validated several thousands of loops missing from structures deposited in the PDB. Here we discuss the methods and show some examples where our procedure makes a notable difference in the structure model.

2. Methods

2.1. Loop building

We have developed algorithms to transfer loops from homologous protein structures to the target structure (Fig. 2). The manner of handling the homologs in Loopwhole is nearly identical to our previous program HODER (van Beusekom et al., 2018), which generates homology-based hydrogen-bond restraints. The only difference is that homologs are not filtered by resolution in Loopwhole. The default maximum length for attempted loop transfer is 30 amino acids.

Figure 2
Stepwise illustration of the Loopwhole algorithm, applied to three missing glycines (PDB entry 1dmn; Kim et al., 2000). All 2mFo − DFc and mFo − DFc electron-density maps are shown at 0.7σ and 3.0σ, respectively. (a) The PDB structure has no apparent gap: two residues are wrongly bound to one another (highlighted in brown). The gap is detected from the sequence. (b) The two residues immediately adjacent to the missing loop are removed, as these are often in the wrong conformation. (c) All homologous chains with the loop present are structurally aligned to the target structure model (only four homologs shown). (d) If the surrounding residues align well, the loop is grafted into the target structure model. By default, the top ten alignments are kept (only one shown). (e) After real-space optimization, the loop with the best density fit is kept provided the density fit and geometrical quality pass the filter criteria.

Before any loop building is attempted, the density fit per chain is evaluated with EDSTATS (Tickle, 2012). In the rare case that the real-space correlation coefficient (RSCC) for an entire chain is below 0.80, we do not attempt to build loops in that chain, but instead warn the user that the density fit is low, making the overall chain conformation unreliable.

There are two initial requirements for loop transfer: the presence of unmodeled loops in the input structure and modeled equivalent loops in the homologs. Unmodeled loops are detected using pdb2fasta in PDB-REDO and high-identity homologs are aligned by sequence. The details of these algorithms have been described earlier (van Beusekom et al., 2018). If both requirements are met, loop transfer is attempted for each of the homologs that has a complete backbone model for that homolog. Both input model and homolog are required to have at least five consecutive modeled residues on each side of the loop. Of these ten residues, the two residues directly adjacent to the loop are remodeled together with the loop because they were often found to be in a suboptimal conformation. The remaining eight residues are used for alignment. To prepare residues for the alignment, side-chain atoms are deleted when mutations are present, and administrative flips for Asp, Glu, Phe and Tyr residues (DEFY flips) are performed to ensure equivalent atoms are in equivalent positions. A homolog is skipped if the sequence identity in the loop or the aligned residues is less than 50%; the exception is a single-residue loop, which is allowed to be mutated. Finally, structural alignment is performed using quaternions (Kuipers, 2002).
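The sequence-based part of this homolog filter can be sketched in Python as follows. This is an illustration rather than the Loopwhole implementation: it assumes the loop and flanking sequences are already aligned without gaps and have equal length, and the function names are hypothetical. The 30-residue maximum loop length is the default quoted in the loop-building description.

```python
def sequence_identity(seq_a, seq_b):
    """Fraction of identical positions between two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / len(seq_a)


def homolog_is_usable(target_loop, homolog_loop, target_flank, homolog_flank,
                      min_identity=0.5, max_loop_length=30):
    """Decide whether a homolog passes the sequence criteria for loop transfer.

    target_flank / homolog_flank: the eight flanking residues used for the
    structural alignment, concatenated into one string (assumed layout).
    """
    if len(target_loop) > max_loop_length:
        return False
    if sequence_identity(target_flank, homolog_flank) < min_identity:
        return False
    if len(target_loop) == 1:
        # A single-residue loop is allowed to be mutated.
        return True
    return sequence_identity(target_loop, homolog_loop) >= min_identity
```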

If the backbone RMSD of alignment is less than 2.0 Å (in default settings), an initial loop transfer is performed. The two residues directly adjacent to the loop are deleted and the aligned loop, including the two aligned and directly adjacent residues, is inserted into the protein model. In the transferred loop, side chains are cropped where appropriate in the case of mutations, the occupancy of all atoms is set to 1.00 and the B factor is multiplied for each atom by the ratio of average B factors between input structure and homolog. Methionine is mutated to selenomethionine and vice versa based on the other residues present in the input model.
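The occupancy and B-factor adjustment described here amounts to a simple rescaling, sketched below. The dictionary-based atom representation and the function name are assumptions made for illustration only.

```python
def prepare_transferred_atoms(loop_atoms, mean_b_target, mean_b_homolog):
    """Return copies of the loop atoms with occupancy 1.00 and rescaled B factors.

    loop_atoms: list of dicts, each with at least a "b_factor" key.
    The B factor of each atom is multiplied by the ratio of the average
    B factor of the input (target) structure to that of the homolog.
    """
    if mean_b_homolog <= 0:
        raise ValueError("average B factor of the homolog must be positive")
    ratio = mean_b_target / mean_b_homolog
    return [
        dict(atom, occupancy=1.00, b_factor=atom["b_factor"] * ratio)
        for atom in loop_atoms
    ]
```

For example, a loop atom with B = 20 Å² taken from a homolog with an average B factor of 15 Å² would receive B = 40 Å² in a target whose average B factor is 30 Å².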

The next check for the initial transfer of a loop is the evaluation of clashes with the modeled atoms already present, or with symmetry copies thereof. We make a distinction between main-chain and side-chain atom clashes. Since the position of a Cβ is significantly limited by the main-chain conformation, we include this side-chain atom in the main chain for clash analysis. Heavy clashes are defined as atom–atom distances <2.1 Å and small clashes as <2.6 Å.
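These definitions translate into a small classifier, sketched below. The cut-offs of 2.1 Å (heavy) and 2.6 Å (small) are taken here as assumptions about the intended thresholds, and the names classify_clash and is_main_chain are hypothetical.

```python
HEAVY_CUTOFF = 2.1  # angstroms; assumed threshold for a heavy clash
SMALL_CUTOFF = 2.6  # angstroms; assumed threshold for a small clash

# CB is grouped with the main chain for clash analysis, as described above.
MAIN_CHAIN_NAMES = {"N", "CA", "C", "O", "CB"}


def classify_clash(dist):
    """Return "heavy", "small" or None for an atom-atom distance in angstroms."""
    if dist < HEAVY_CUTOFF:
        return "heavy"
    if dist < SMALL_CUTOFF:
        return "small"
    return None


def is_main_chain(atom_name):
    """True for backbone atoms and CB."""
    return atom_name in MAIN_CHAIN_NAMES
```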

In cases of clashes, important atom(s) must be retained. In Loopwhole we use hierarchical rules of atom importance to decide how to proceed. The atoms that are always kept are main-chain atoms and most ligands [the exceptions are glycerol, ethanol, and 1,2-ethanediol and its polymeric (PEG) forms]. Whenever a main-chain atom from a loop candidate clashes with the previously modeled backbone or most ligands, the loop candidate is discarded. In contrast, if the main chain of a loop candidate clashes with a compound from the list of exceptions (such as glycerol), that compound is discarded. The second most important group is the side chains: they can be discarded temporarily to be added back later by the program SideAide (Joosten et al., 2011). Previously modeled side chains are considered more important than loop side chains. Side chains, in turn, are considered more important than water molecules and any atoms with an occupancy of 0.01 or lower. These principles led to the following decisions.

Previously modeled side chains are removed only if they clash heavily with the loop backbone, unless they form a cysteine bridge: in such cases the loop candidate is discarded. Loop side chains [from γ-atom(s) onwards] are deleted if they clash heavily with any previously modeled protein (main chain or side chain) or with any other compound except for water. Ligands from the exception list are removed whenever they clash heavily with the main chain of the loop. Waters and atoms with an occupancy of 0.01 or less are removed even in cases of small clashes with any loop atom.
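One possible reading of these rules as a decision function is sketched below. The category labels, the return values and the function name are hypothetical. The choice to discard the loop for a main-chain clash of any severity with existing backbone or regular ligands is an assumption, since the text does not state the severity for that case.

```python
def resolve_clash(loop_part, other, severity, disulfide=False):
    """Decide what to do about one clash between a loop atom and another atom.

    loop_part: "main" or "side" (which part of the candidate loop clashes)
    other:     "main", "side", "ligand", "exempt_ligand", "water"
               or "low_occupancy" (what the loop atom clashes with)
    severity:  "heavy" or "small"
    disulfide: True if the existing side chain is part of a cysteine bridge
    Returns "discard_loop", "remove_other", "remove_loop_side_chain" or "keep".
    """
    if other in ("water", "low_occupancy"):
        # Removed even for small clashes with any loop atom.
        return "remove_other"
    if loop_part == "main":
        if other in ("main", "ligand"):
            return "discard_loop"  # assumption: applies at any severity
        if severity == "heavy":
            if other == "exempt_ligand":
                return "remove_other"
            if other == "side":
                return "discard_loop" if disulfide else "remove_other"
        return "keep"
    # Loop side chain: deleted on a heavy clash with protein or any compound.
    if severity == "heavy":
        return "remove_loop_side_chain"
    return "keep"
```

Side chains returned as "remove_other" would be candidates for rebuilding later, as the text describes for SideAide.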

If there are no insuperable clashes, the loop candidate is saved and existing candidates are sorted according to RMSD. If two loop candidates have a very similar conformation (RMSD < 0.1 Å), the candidate with the worst RMSD of alignment is discarded. Once all BLAST hits are evaluated, the top candidates (by default, the top ten) are subjected to real-space refinement by coot-mini-rsr (Emsley et al., 2010) using torsion-angle restraints. One extra residue from the existing protein on either side of the loop is added to the real-space refinement region, which allows the existing protein model to better adapt to the new loop. In the coot-mini-rsr input PDB file, clashing atoms are removed, atom numbering is updated (including CONECT records) and 'gap' LINK records are deleted. Sometimes, there are still small gaps at the boundary of the transferred loop and the existing model. To increase the success rate of coot-mini-rsr closing these gaps, the backbone N atoms on the loop edges are moved into this gap. This can temporarily create unlikely atom bond lengths and angles, but these will be resolved in real-space refinement (or in the subsequent reciprocal-space refinement).
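The sorting and de-duplication of candidates might look like the sketch below. It assumes every candidate carries the same number of backbone coordinates, already placed in the frame of the target model, so that a plain coordinate RMSD without further superposition is meaningful. The dictionary keys and function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((p - q) ** 2
                for a, b in zip(coords_a, coords_b)
                for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))


def select_candidates(candidates, similarity_cutoff=0.1, max_kept=10):
    """Keep the best-aligned candidate of each group of near-identical loops.

    candidates: list of dicts with "alignment_rmsd" and "coords" keys.
    Candidates are visited from best to worst alignment RMSD, so of two
    near-identical loops the one with the worse alignment is dropped.
    """
    kept = []
    for cand in sorted(candidates, key=lambda c: c["alignment_rmsd"]):
        if all(rmsd(cand["coords"], k["coords"]) >= similarity_cutoff for k in kept):
            kept.append(cand)
    return kept[:max_kept]
```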

After running coot-mini-rsr , it is checked again to confirm whether there are no insuperable clashes between the loop and the protein, because we have observed that the loop may be placed into the density of other moieties in the real-space refinement, such as a symmetry copy of itself.

At this stage, all remaining loop candidates with bad geometry are discarded. First, candidates where there is no peptide bond between two consecutive residues or where coot-mini-rsr has not converged to a minimum are removed. The resulting RMS Z scores from coot-mini-rsr are used to filter bad geometry candidates: loops are rejected if bond or angle RMS Z values exceed 1.2, chirality RMS Z exceeds 1.5, or if plane or torsion RMS Z values exceed 2.0. In this filter, the RMS Z values are allowed to be relatively high because subsequent reciprocal-space refinement will further improve the loop. Loop candidates are only allowed to have cis -peptides if the corresponding residue in the original loop of the homolog is also a cis -peptide. Loops that have multiple sequential distorted omega angles (maximum deviation 30° from 0 or 180°) are discarded, but single distortions are allowed as these are usually resolved in subsequent refinement. Finally, loop candidates are evaluated on their Ramachandran Z score. If the Z score is poor (lower than 𕒹), it is compared with the Z scores of the other loop candidates and also the Z scores of the loop in the homologous-structure models from which it was adapted. Then, the candidate is discarded if it is a 2 σ negative outlier [according to Grubbs' test (Grubbs, 1950 )], either compared with the other loop candidates or with the original conformation of the loop. The Ramachandran Z -score calculation is performed using the algorithms of the new PDB-REDO program tortoize . This algorithm is based on the implementation in WHAT_CHECK (Hooft et al. , 1997 ) and is described in the Supporting Information.
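The RMS Z and omega-angle filters above can be written down compactly. The sketch below uses the cut-offs quoted in the text; the class and function names are hypothetical, and reading "multiple sequential distorted omega angles" as "two or more consecutive distorted angles" is an assumption.

```python
from dataclasses import dataclass

@dataclass
class RmsZ:
    """RMS Z scores for one loop candidate (hypothetical container)."""
    bond: float
    angle: float
    chirality: float
    plane: float
    torsion: float

# Cut-offs quoted in the text: a candidate is rejected if a value exceeds its limit.
LIMITS = {"bond": 1.2, "angle": 1.2, "chirality": 1.5, "plane": 2.0, "torsion": 2.0}

def passes_rmsz_filter(z):
    """True if none of the RMS Z values exceeds its cut-off."""
    return all(getattr(z, name) <= limit for name, limit in LIMITS.items())

def omega_deviation(omega_deg):
    """Deviation in degrees from the nearest planar value (0 or 180)."""
    w = abs(omega_deg) % 360.0
    if w > 180.0:
        w = 360.0 - w
    return min(w, 180.0 - w)

def has_sequential_distorted_omegas(omegas, max_dev=30.0):
    """True if two or more consecutive omega angles deviate by more than max_dev."""
    distorted = [omega_deviation(w) > max_dev for w in omegas]
    return any(a and b for a, b in zip(distorted, distorted[1:]))
```

A single distorted omega angle passes this check, in line with the statement that isolated distortions are usually resolved in subsequent refinement.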

The density for each remaining candidate is then computed using the cubic interpolation function from clipper (Cowtan, 2003 ). It is computed only for the main-chain atoms of the candidate loop to ensure that the metric is not influenced by the presence, absence or length of the side chains of the loop. Additionally, the density is computed for all main-chain atoms that are ordered in all homologs, i.e. the set of atoms that are always ordered. If there are fewer than 30 atoms in this set, all non-loop main-chain atoms in the input structure model are taken. The ratio between average loop-candidate density and the average density of the control set is then computed. This ratio must be over 0.25 for a loop candidate to be acceptable. The cut-off was established after manual inspection of several hundred candidate loops.
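As an illustration of this density criterion, the following sketch assumes that the per-atom density values have already been interpolated from the map (that step, done with clipper in the text, is not reproduced here). The function names are hypothetical; the 30-atom minimum and the 0.25 cut-off come from the text.

```python
from statistics import mean

MIN_CONTROL_ATOMS = 30    # below this, fall back to all non-loop main-chain atoms
MIN_DENSITY_RATIO = 0.25  # acceptance cut-off quoted in the text

def choose_control_set(always_ordered, all_non_loop_main_chain):
    """Pick the control densities: the always-ordered set unless it is too small."""
    if len(always_ordered) < MIN_CONTROL_ATOMS:
        return all_non_loop_main_chain
    return always_ordered

def density_ratio(loop_density, control_density):
    """Average loop main-chain density divided by average control density."""
    return mean(loop_density) / mean(control_density)

def is_acceptable(loop_density, always_ordered, all_non_loop_main_chain):
    """True if the loop candidate's density ratio is over the cut-off."""
    control = choose_control_set(always_ordered, all_non_loop_main_chain)
    return density_ratio(loop_density, control) > MIN_DENSITY_RATIO
```

Because only main-chain atoms enter both averages, the ratio does not depend on the presence or length of side chains, which is the design goal stated above.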

Finally, there is an option to subject a number of candidates (by default only the loop with the best density fit) to the PDB-REDO programs SideAide and pepflip (Joosten et al. , 2011 ) to complete the side chains and check for potential peptide flips of the loop area. However, this is not default behavior since these programs are already run after Loopwhole in the PDB-REDO implementation. However, Loopwhole writes a list of amino acids whose side chains are incomplete: at low resolution, SideAide is not run by default on all amino acids in PDB-REDO , but only on amino acids in a list, to which the novel residues in the loop are added.

After the optional running of SideAide and pepflip , the loop with the best main-chain density fit is kept.

There are a few special cases where detection or building of a missing loop is more complicated. First, there is the possibility that a loop is in fact modeled, but with all atoms modeled at zero occupancy or at an occupancy of 0.01. We consider such loops as unmodeled and proceed as described above; however, we treat the current zero-occupancy loop itself as an extra candidate. Since this loop is already at the correct location in the model, no alignment is necessary; aside from this, the candidate is treated the same as the others.

Another special case is dealing with alternates in and near loops. If there are any main-chain alternates among the residues that are to be aligned with homologs, the missing loop is skipped because the alignment target is ambiguous. An exception is made if the only backbone alternate atoms are Cα atoms (which is common for residues with alternate side-chain conformations); then simply the first atom is picked. In such cases, the positions of alternate Cα atoms are very close to each other. Alternate side chains are truncated before alignment. In homologs, backbone alternate conformations are treated as separate candidates: structure alignment is performed for each alternate in the homolog and/or each alternate loop is transplanted. If there are multiple stretches that contain alternates with full-occupancy atoms in between, combinations of these alternates must be aligned for completeness. The exponential increase in the number of combinations makes this computationally expensive, hence these (rare) cases are excluded.

Finally, there are cases where residues next to a loop are only partially present. In such cases, the partially modeled residue is also removed before the loop fitting. That is, the loop is extended by one more residue and the partial residue is replaced.

2.2. Adding missing atoms, atom pairs or atom trios: fixDMC

We observed that some protein models are missing one or several atoms from a peptide backbone. In order to also correct these smaller missing parts, the program fixDMC (fix `dat' main chain) corrects these omissions, adds missing C-terminal O atoms, and resets occupancies to 1.0 in regions where there are no alternates and surrounding atoms are modeled at full occupancy.

We make use of the fact that Cα(i), C(i), O(i), N(i+1) and Cα(i+1) lie in a plane. Whenever at least three atoms of a single plane are present, this planarity combined with the known geometry of an amino acid gives enough information to compute the coordinates of the missing atoms. The Cα atom lies in two planes: that of the preceding and that of the following amino-acid residue. Therefore, it can be added based on either of these residues. By applying the geometrical rules of planarity extensively, we can compute any set of one, two or three atoms provided that the preceding and following residues are modeled.

Additionally, fixDMC uses functionality from pdb2fasta in PDB-REDO (van Beusekom et al. , 2018 ) to add the second C-terminal O (`OXT') if the SEQRES records or user-inputted FASTA file indicate that the complete C-terminal residue has been modeled except for this atom. The addition of this atom can also be based on the peptide plane.

Finally, the occupancy of protein atoms is reset to full occupancy if the residue contains no alternates, and the preceding and subsequent atoms are both modeled at full occupancy. An exception is made for the carbonyl O atom: since this atom is only bound to a single C atom, only that C atom is required to be modeled at full occupancy.

2.3. Implementation in PDB-REDO

The program fixDMC is run at the early stages of PDB-REDO after the initial electron-density maps are calculated, before any individual atomic coordinate or B factor refinement. The OXT atoms are only added if you can rely on the fact that the final modeled residue is the actual C-terminus of the crystallized construct. Therefore, this step is only performed if the header of the input PDB file has SEQRES records or if user-supplied sequence(s) can be mapped to the modeled atomic coordinates.

Loopwhole is run after the initial refinement in PDB-REDO. The default behavior is to always attempt to build loops, but this can be switched off if needed. It should be noted that on the PDB-REDO webserver, loops can only be built if the sequence of the missing residues is known. That is, users must supply the sequence as a FASTA file or as SEQRES records in the PDB file. If Loopwhole builds any residues, REFMAC5 (Murshudov et al., 2011) is run to obtain more accurate B factor estimates and new electron-density map coefficients. This refinement uses automated geometric restraint weighting and five refinement cycles. If a loop has an RSCC below 0.60, it is discarded and any water molecules or other compounds initially deleted to fit the loop are restored. Then the other rebuilding stages of PDB-REDO (Joosten et al., 2011) are run.

Sequence files in PDB-REDO mark residues with a complete backbone in uppercase letters and incomplete or unmodeled residues with lowercase letters (van Beusekom et al., 2018), an idea adopted from the SEQATOMS server (Brandt et al., 2008). Therefore, both Loopwhole and fixDMC write updated FASTA files to reflect changes in residue completeness or presence. Additionally, Loopwhole updates the TLS groups in PDB-REDO. If a TLS group is surrounding the loop, the loop is added to that group; if the loop is on the border of two TLS groups, it is added to the first one.

At the final stage of PDB-REDO, the program Modelcompare writes a data file that is used by Coot (Emsley et al., 2010) and 3Dbionotes (Segura et al., 2017; Tabas-Madrid et al., 2016) to highlight the new loops.

2.4. Testing

Loopwhole was run over all entries available in PDB-REDO to identify which loops could be built. Hundreds of randomly selected loops were manually analyzed to empirically establish the validation cut-offs mentioned above. Finally, from all entries in which Loopwhole built loops, 2000 entries were randomly selected for further analysis in PDB-REDO. These entries were subjected to the PDB-REDO pipeline twice: once with and once without loop building. Owing to various limitations (not related to loop building), ten PDB-REDO jobs were not completed, hence the final test set consisted of 1990 entries.

3. Results

3.1. Loop building

The computer program Loopwhole was developed to build protein loops based on homology (Fig. 2 ). We first applied Loopwhole to the structures available in the PDB (Table 1 ). When Loopwhole was then applied to the PDB-REDO databank, we observed an increase of 11% in the number of built loops. This is likely to be because the structure models and the electron-density maps in PDB-REDO (which are obtained after modern re-refinement and rebuilding) are of higher quality than their `static' PDB counterparts. The total number of missing loops in the PDB-REDO databank was 148�. An initial loop was constructed by Loopwhole in 66� cases (44%). For the other 56%, there were either no homologous loops available, or the loop conformation was too different between the `donor' and `acceptor' structures as a result of genuine structural differences, or because of `sequence register' errors. Another 41� loops (28% of total) were discarded according to various validation criteria (Fig. 3 ), keeping 24� successfully built loops in the final model. Many loops were rejected as their fit to the electron density was too poor (Fig. 3 b ), and less often based on geometrical criteria or because both density fit and geometry were poor. The remaining loops have excellent geometry, typically better than the loop in the original structure (Fig. 3 c ) and a good fit to the density.

Table 1
Number of built loops and affected structures in the PDB and PDB-REDO databanks. We used 112� PDB-REDO entries available in February 2018

Figure 3
Cumulative distributions for properties of loops that can be built in the PDB. (Left) Most of the buildable loops are short. (Middle) Density ratio of loop candidates; in contrast to the other two subfigures, this includes loops that were not built. This metric represents the observed density for the loop main chain divided by the average main-chain density. The minimum required density ratio of 0.25 is indicated by the vertical red line. Of all initial loop candidates, 60% have insufficient density and are therefore discarded. (Right) The Ramachandran Z score for candidate loops and their counterparts in the structure model from which they were taken. The backbone conformation of the built loops is excellent and better than the conformation of loops in structures from which they were taken, which is largely a result of the application of Ramachandran restraints in the real-space refinement of loops in coot-mini-rsr (Emsley et al., 2010).

The current version of Loopwhole was able to build a total of 24� missing loops in 11� entries. In 359 cases in which a loop was built, a zero-occupancy loop was present in the original model. To place the loops, 18� water molecules were removed; additionally, small molecules such as glycerol or ethanediol were removed in 22 cases. The distribution of the length of the built loops is shown in Fig. 3(a).

Next, we examined whether incorporating Loopwhole in the PDB-REDO pipeline had an impact on the performance of PDB-REDO as a whole. We thus ran PDB-REDO on 1990 randomly selected structures in which loops could be built, once with and once without loop building. The impact of loop building on standard validation metrics (Read et al., 2011) such as R free, the Ramachandran Z score and packing Z score was minimal (Fig. S1). The mean RSCC values (indicating the fit to the density) correlate well with the mean B factor for the loops in most cases (Fig. S2). The mean RSCC and RSR values for the loops themselves were 0.75 and 0.14, respectively (Fig. 4). These values are naturally lower than for well defined parts of the structure model, but are, for example, consistent with density criteria for acceptable ligands (Weichenberger et al., 2013; Cereto-Massagué et al., 2013; Warren et al., 2012). However, some built loops had lower than anticipated RSCC values. Following manual examination of example cases, we decided to discard loops with an RSCC below 0.60 (Fig. 4). Of the 3419 loops built in the test set, 305 loops were discarded in this step, yielding a total of 3114 loops built in the test entries. We then manually inspected examples of loops that passed the RSCC cut-off but had density ratios between the cut-off value of 0.25 and 0.3. We concluded that these `lowest-quality' loops still fit the density well and should be kept in the model; six randomly selected examples are shown in Fig. 5. Thus, loop building in PDB-REDO has now been enabled by default. For practical reasons (lack of CPU time) the remaining PDB entries are being gradually `redone' and placed in the PDB-REDO databank.

Figure 4
Distribution of the RSCC for all loops built in 1990 PDB-REDO entries. The RSCC was calculated on the final PDB-REDO structure models. All loops with an RSCC below 0.6 are colored red and are discarded.

Figure 5
Six examples of loops with a density ratio between the cutoff value (0.25) and 0.30. The 2 mF oDF c maps are shown at 0.8 σ mF oDF c maps are shown at 3.0 σ . ( a ) 4fc9 C474�, density ratio 0.27 ( b ) 5h8p A151-152, density ratio 0.29 ( c ) 2d31 A266-267, density ratio 0.25 ( d ) 2y1t F18, density ratio 0.27 ( e ) 2y7q B480�, density ratio 0.26. For part of the loop, no density is observed at 0.8 σ however, it does show up at 0.6 σ ( f ) 4w9x A121�, density ratio 0.26. For clarity, the side chain of TrpA124 is not shown.

3.2. Completing the main chain

Rather surprisingly, we found that many structures in the PDB are missing individual atoms in the main chain. Therefore, we created the program fixDMC that can add one to three missing main-chain atoms based on the geometry of the existing atoms (see Section 2.2 for details). Running it on the same PDB-REDO dataset as above, single atoms were added in 1500 cases (of which 1281 were carbonyl O atoms), atom pairs were added in 55 cases and in 40 residues three atoms were added. Additionally, there were 38� cases with individual backbone atoms that had their occupancies set to values less than 1.00 without being part of an alternate conformation or of a partially occupied peptide ligand, of which 2926 cases had an occupancy of zero. Finally, we found that the second C-terminal O atom (OXT in PDB nomenclature) is missing in many PDB entries; fixDMC added OXTs to 41� protein chains, where the terminal residue in the structure coincides with the terminal residue in the declared construct sequence. Notably, the percentage of protein chains with missing C-terminal O atoms has been steadily increasing over the years (Fig. 6). In 2017, OXT is missing from 44% of chains with a modeled C-terminal amino acid.

Figure 6
The percentage of C-terminal O atoms (OXT): present in the PDB; missing the C-terminal amino acid and thus not buildable; and newly added to the C-terminal amino acid.

3.3. Examples of built loops

To illustrate the relevance of building loops in the PDB, here we show several examples in which Loopwhole clearly improves structure interpretation.

First, a structure of β-glucosidase (PDB entry 3abz; Yoshida et al., 2010) has seven missing regions, five of which can be added. Only the first of four NCS copies is complete. One of the missing regions, a stretch of 14 residues, is shown in Fig. 7(a). The electron density for this loop is very clear. By adding the loops to the structure, the structure model is now much more complete and thus more easily interpreted.

Figure 7
Examples of built loops. Newly built parts are shown in pink. ( a ) 3abz chain B residues 497�. The RSCC for this loop is 0.76 in OMIT map it is 0.55. ( b ) 5iro chain I residues 243�. RSCC values (normal/OMIT) are 0.84 and 0.98. ( c ) 2amj chain D residues 108�. RSCC values (normal/OMIT) are 0.80 and 0.61. Details described in the main text. The 2 mF oDF c maps are shown at 0.8, 1.2 and 1.0 σ , respectively. The mF oDF c maps are shown at 3.0 σ in all cases.

A second example shows how the general improvement of fit to the crystallographic data in PDB-REDO can facilitate when loop building is included. As shown in Fig. 7 ( b ), a strand is missing from the β -sheet of the α 3 subunit of HLA in PDB entry 5iro (Li et al. , 2016 ), a protein complex of Adenovirus type 4 E3-19K with HLA class I histocompatibility antigen. By optimization of the structure model by PDB-REDO including loop building, the R / R free drops from 0.220/0.236 to 0.182/0.190. The overall improvement of the model leads to much clearer maps into which the missing strand can be built unequivocally.

Finally, two loops are missing from PDB entry 2amj, the apo-structure model of modulator of drug activity B (MdaB), a putative DT-diaphorase (Adams & Jia, 2006). In this paper, the authors also discuss the FAD-bound state structure (2b3d; Adams & Jia, 2006), where both loops are ordered: it is stated that these loops become ordered upon binding of FAD as a result of the rearrangement in the hydrogen-bonding network. However, there is clear density for one of the loops (L2). No less than 13 water molecules have been built into the density of the missing loop in chain D (Fig. 7c). It was proposed (Adams & Jia, 2006) that FAD binding induces loop stabilization through changes in the hydrogen-bonding network; modeling this loop shows that the structure models of the apo form and the FAD-bound form are highly similar. Therefore, the structural evidence does not necessarily support this claim.

4. Discussion

The increasing number of residues that are not built into new protein structures can be attributed to many factors: the ever larger structures determined by X-ray crystallography (van Beusekom et al., 2016) are more likely to contain flexible regions within stable scaffolds; better annotation of the sequence of crystallized constructs (Henrick et al., 2008; Dutta et al., 2009) highlights missing regions better; and opportunities to build loops supported by the electron density are ignored because of haste or lack of experience (Mowbray et al., 1999) as new generations of crystallographers are determining structures with a higher throughput. A worrying observation we made whilst teaching is that some students had deleted parts of a model to improve validation statistics (Read et al., 2011) such as the percentage of RSR Z outliers.

It is generally accepted that loops with a less defined electron density occur in multiple conformations. In many cases, a single model with a sufficiently large margin of error represented in the B factors is suitable to represent the experimental data. There may be cases where more than one discrete loop could be modeled. At present in such cases, we build a single conformation for these loops. Though an extension towards building multiple conformations of such loops is possible, existing solutions could be used to explicitly model this conformational variability (e.g. Burnley et al., 2012; Levin et al., 2007).

We argue that models should be built as complete as possible given the data, because high completeness will increase their usefulness to the user community. For instance, simulations of protein-complex formation might improve when the protein structure model is as complete (and correct) as possible: it is better to have a starting experimental conformation available even if supported only by weak electron density, rather than predicting by purely computational methods. Also, the presence of loops in the refinement improves the local-structure quality of the loop surroundings because the added atoms impose better conformational restraints in the structural neighborhood.

In some instances, our methods were unable to build a missing loop in an area with relatively clear electron density. This had to do with either the lack of well aligned homologous loops or the poor geometry of the fitted fragments. In some cases, this could point to register errors (a few residues aligned erroneously onto the sequence). Better comparison between homologous structures, for instance, using tools like phenix.structure_comparison (Moriarty et al., 2018), could be used to identify such regions more reliably; this is not feasible in an automated fashion with the current methods.

The algorithms we have developed to decide whether loops should be kept or not may also be applied to existing parts of protein structure models. We have emphasized in this paper the number of loops that are not built in the PDB but may be buildable; however, there may also be cases where crystallographers have modeled loops over-enthusiastically. To estimate the extent of this, we analyzed the distribution of the density ratio between atoms in loops and random atoms in the template structures (the structures where the loops were copied from). This density ratio is better for the template loops than the density threshold cut-off for the newly built loops in most cases (Fig. 8); this is to be expected because the loops are normally missing precisely because their electron density was not very clear. However, there are also cases where the density fit of the template loop is quite poor; inspection of several cases shows that the majority of these loops are likely to have a correct conformation that is supported by the electron density. However, there are also cases where the loop should not have been modeled.

Figure 8
Frequency of specific density ratios of loop backbone versus the rest of the backbone for the loops that were used as template from homologous structures. Most template loops have sufficient density and therefore they would also have been built by Loopwhole : those loops are to the right side of the red line indicating the density ratio cutoff of 0.25.

A possible addition to the methods presented here is building partial loops or expanding the termini. At first, we allowed partial loops to be built by Loopwhole if the density fit for that part of the loop was above the density threshold. Although the majority of the 8217 partial loops we built were modeled correctly, building partial loops was not sufficiently reliable to automate. Too often, an amino acid would be modeled into the neighboring density of water molecules or unidentified ligands such as PEG. The same issue would arise for terminal extensions with two additional difficulties. First, the number of residues that may be added is not always clear. Many residues can be missing from the terminus and, since it is unclear a priori how far the terminus can be extended, the residues should be added one by one. This is less efficient than loop fitting and therefore likely to cost much more CPU time per added residue. Second, instead of two anchor points on the ends of a missing loop, a terminus only has a single anchor point: the current terminus. The absence of a second anchor point means a drastic loss of information about the general direction in which the residues should be built. Therefore, it is more difficult to detect cases where the expanded fragment is not built into the correct electron density. The combination of these two limitations has kept us from implementing termini extension at present.

New structure models are added to the PDB every week, enriching the set of homologous structure data. The availability of suitable candidates for loop transfer will therefore only increase further, facilitating loop modeling for new structures. The pro-active updating of existing structure models by PDB-REDO ensures that older structure models also benefit from the increased availability of homologs. The original struggle to find a good loop conformation for the first published structure model in a protein family will remain, but it has become a temporary problem: solving the first structure of a protein provides a handle for much future structural research, and now in return this future research also provides means to make this first structure more complete. We have clearly demonstrated that the increased availability of homologous data can be used to improve the completeness of protein structure models of the past, the present and the future.

5. Availability

Both the PDB-REDO databank and webserver are hosted on On the webserver, crystallographers can submit their work-in-progress models to run PDB-REDO including the new loop-building procedure. The 1990 models from the test set are available through the databank. Existing databank entries are gradually updated to include the loop-building procedure. On the PDB-REDO databank entry pages, registered users can submit an update request to prioritize the update of that entry. Binary executables of Loopwhole , fixDMC and tortoize are available from the website and source code is available on request.

6. Related literature

The following reference is cited in the supporting information for this article: Wang & Dunbrack (2005 ).


The figure below shows the three main-chain torsion angles of a polypeptide. Phi (Φ; C, N, Cα, C) and psi (Ψ; N, Cα, C, N) are on either side of the Cα atom, and omega (ω; Cα, C, N, Cα) describes the angle of the peptide bond. While Φ and Ψ have considerable rotational freedom, ω is restricted to planar values. This is a result of the partial double-bond character of the peptide bond, which is caused by resonance effects, i.e. delocalized electrons (N-C=O ↔ N+=C-O-). A trans configuration (≈180°) is preferred for steric reasons. The cis configuration (≈0°) is rare, except for prolines.

Peptide torsion angles. A chain of two amino acids with the three torsion angles phi (Φ), psi (Ψ) and omega (ω). Resonance of the peptide bond affecting ω is indicated in light blue.
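Each of these torsion angles is the dihedral angle defined by four consecutive atoms. A minimal sketch of that calculation is given below, using only the standard library; the function name and the tuple-based coordinates are choices made for this illustration. The sign follows the usual convention in which a trans arrangement gives 180° and a cis arrangement gives 0°.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)   # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))
```

For Φ of residue i the four points would be C(i-1), N(i), Cα(i), C(i); for Ψ they would be N(i), Cα(i), C(i), N(i+1); and for ω they would be Cα(i), C(i), N(i+1), Cα(i+1).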

A "C-N" bond is called amine bond, while "O=C-N" is an amide (with one hydrogen or organic group on the carbon and two on the nitrogen). The peptide bond is neither a pure C-N bond, nor is it a C=N bond. Rather two main canonical structures exist (N-C=O and N+=C-O-) simultaneously.

1.2 The Ramachandran Plot

While the ω angles are restricted, the polypeptide main chain exhibits considerable freedom to rotate around the N-Cα (Φ) and Cα-C (Ψ) bonds. This is visualized in the Ramachandran plot. G. N. Ramachandran (Ramachandran, Ramakrishnan, and Sasisekharan 1963) used computer models of small polypeptides to systematically sample the Φ/Ψ space with the objective of finding stable conformations. For each conformation, the structure was examined for close contacts between atoms. Atoms were treated as hard spheres with dimensions corresponding to their van der Waals radii (two different sets of VdW parameters were used, including some more flexibility in the backbone in one case). Therefore three parts of the plot were calculated: the fully allowed part (favoured), the outer limit (allowed) and the disallowed part, where atoms would clash in both cases. Below is a Ramachandran plot (from Wikimedia) based on the original plot by Ramachandran et al. The favoured, or fully allowed, region is marked with solid black lines; the allowed, or outer limit, region is represented with a dotted black line.
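The hard-sphere test can be illustrated with a small pairwise check: two atoms clash when their distance is smaller than the sum of their van der Waals radii. The sketch below is an assumption-laden illustration, not the procedure of Ramachandran et al.: the radii are commonly quoted Bondi values rather than the contact distances used in the original work, the `tolerance` parameter and the tuple layout are hypothetical, and covalently bonded or otherwise closely linked pairs must be passed in via `excluded`, because such atoms are always closer than the sum of their radii.

```python
import math
from itertools import combinations

# Illustrative van der Waals radii in Å (Bondi values); an assumption, not from the text.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}

def find_clashes(atoms, excluded=frozenset(), tolerance=0.4):
    """Return (name_a, name_b, distance) for every clashing pair.

    atoms: list of (name, element, (x, y, z)) tuples (hypothetical layout).
    excluded: set of frozenset({name_a, name_b}) pairs to skip, e.g. bonded atoms.
    tolerance: allowed overlap in Å before a contact counts as a clash (assumed value).
    """
    clashes = []
    for (na, ea, ca), (nb, eb, cb) in combinations(atoms, 2):
        if frozenset((na, nb)) in excluded:
            continue
        limit = VDW_RADII[ea] + VDW_RADII[eb] - tolerance
        distance = math.dist(ca, cb)
        if distance < limit:
            clashes.append((na, nb, round(distance, 3)))
    return clashes
```

Restricting `atoms` to N, Cα, C and O gives a reproducible backbone-only clash test. The all-pairs loop is quadratic in the number of atoms, which is fine for a single model but would need a spatial grid for large ensembles.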

Ramachandran et al. could assign key secondary structures to specific regions in the plot. In the favoured (or fully allowed part, as they named it) region the beta sheets, the polyproline helix and the (right handed) alpha helix occur. The outer limit, which was calculated with smaller VdW radii brought out an additional region which corresponds to the left-handed alpha-helix.

L-amino acids cannot form extended regions of left-handed helix, but occasionally individual residues adopt this conformation. These residues are usually glycine but can also be asparagine or aspartate, where the side chain forms a hydrogen bond with the main chain and therefore stabilises this otherwise unfavourable conformation. The 3₁₀ helix occurs close to the upper right of the alpha-helical region and is on the edge of the allowed region, indicating lower stability. Disallowed regions generally involve steric hindrance between the side-chain atoms and main-chain atoms. Glycine has no side chain and therefore can adopt Φ and Ψ angles in all four quadrants of the Ramachandran plot. Hence it frequently occurs in turn regions of proteins where any other residue would be sterically hindered.

With ever increasing numbers of experimentally determined protein structures, newer iterations of the Ramachandran plot are based on distributions extracted from experimental data. The general case largely corresponds to the original work displayed above. However, glycine and proline exhibit very characteristic properties owing to their side chains. Glycine has only a single hydrogen as its side chain, which leads to less steric hindrance and thus increased rotational freedom around the main-chain torsion angles. The side chain of proline connects with its nitrogen, forming a ring. The result is an exceptional conformational rigidity.

Ramachandran plots for four residue classes: general (no proline or glycine), glycine only, proline only and pre-proline only.

The Ramachandran plots displayed above represent all Φ/Ψ torsion angles extracted from 12,521 non-redundant experimental structures (pairwise sequence identity cutoff 30%, X-ray resolution cutoff 2.5 Å) as culled from PISCES.

1.3 The alpha-helix.

1.3.1 Development of an alpha-helix structure model.

Pauling and Corey twisted models of polypeptides to find ways of getting the backbone into regular conformations which would agree with alpha-keratin fibre diffraction data. The most simple and elegant arrangement is a right-handed spiral conformation known as the 'alpha-helix'.

Alpha-helix. Left: Side view of an alpha helix. The N-terminal part of the peptide is to the left. Top left: The molecules are depicted in licorice. Hydrogen bonds are shown with green dotted lines. Bottom left: same view in VdW style. Right: View into an alpha-helix. The N-terminal part is in front. For clarity, only the backbone and Cβ carbon of the protein are shown.

Example of a protein with an alpha helix content of >80% Left: Cartoon view of human adenosine A1 receptor A1AR-bRIL, pdb entry: 5UEN. Right: Ramachandran plot for all non-proline/glycine residues.

1.3.2 Properties of the alpha-helix.

The structure repeats itself every 5.4 Å along the helix axis, i.e. we say that the alpha-helix has a pitch of 5.4 Å. alpha-helices have 3.6 amino acid residues per turn, i.e. a helix which is 36 amino acids long would form 10 turns. The separation of residues along the helix axis is 5.4/3.6 or 1.5 Å, i.e. the alpha-helix has a rise per residue of 1.5 Å.
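The arithmetic in this paragraph can be checked with a short sketch. The constants come straight from the text; the function name `helix_geometry` and the dictionary keys are made up for illustration.

```python
PITCH = 5.4              # Å advanced along the axis per full turn (from the text)
RESIDUES_PER_TURN = 3.6  # residues per full turn (from the text)

def helix_geometry(n_residues):
    """Idealised alpha-helix geometry for a stretch of n_residues."""
    rise = PITCH / RESIDUES_PER_TURN
    return {
        "rise_per_residue": rise,                          # Å
        "turns": n_residues / RESIDUES_PER_TURN,
        "axial_length": n_residues * rise,                 # Å, approximate
        "rotation_per_residue": 360.0 / RESIDUES_PER_TURN  # degrees
    }

for key, value in helix_geometry(36).items():
    print(f"{key}: {value:.1f}")
```

The rotation per residue (360° divided by 3.6) is not stated in the text but follows directly from the residues-per-turn figure.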

  • Every main chain C=O and N-H group is hydrogen-bonded to a peptide bond 4 residues away (i.e. O(i) to N(i+4)). This gives a very regular, stable arrangement.
  • The peptide planes are roughly parallel with the helix axis and the dipoles within the helix are aligned, i.e. all C=O groups point in the same direction and all N-H groups point the other way. Side chains point outward from the helix axis and are generally oriented towards its amino-terminal end.
  • Φ and Ψ are both negative.

1.3.3 Distortions of alpha-helices.

  • The packing of buried helices against other secondary structure elements in the core of the protein.
  • Proline residues induce distortions of around 20° in the direction of the helix axis. Proline cannot form a regular α-helix because of steric hindrance arising from its cyclic side chain, which also blocks the main chain N atom and chemically prevents it from forming a hydrogen bond. Proline causes two H-bonds in the helix to be broken, since the NH group of the following residue is also prevented from forming a good hydrogen bond. Helices containing proline are usually long, perhaps because shorter helices would be destabilized too much by the presence of a proline residue. Proline occurs more commonly in extended regions of polypeptide.
  • Solvent exposed helices are often bent away from the solvent region. This is because the exposed C=O groups tend to point towards solvent to maximize their H-bonding capacity, i.e. tend to form H-bonds to solvent as well as N-H groups. This gives rise to a bend in the helix axis.

1.4 3₁₀-Helices.

3₁₀-Helices form a distinct class of helix but they are generally short and frequently occur at the termini of regular alpha-helices. The name 3₁₀ arises because there are three residues per turn and ten atoms enclosed in a ring formed by each hydrogen bond (note the hydrogen atom is included in this count). There are main chain hydrogen bonds between residues separated by three residues along the chain (i.e. O(i) to N(i+3)). In this nomenclature the Pauling-Corey alpha-helix is a 3.6₁₃-helix. The dipoles of the 3₁₀-helix are not so well aligned as in the alpha-helix, therefore it is a less stable structure and side chain packing is less favourable.
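The ring sizes in this nomenclature can be reproduced by counting atoms. The sketch below assumes the ring runs from the carbonyl O of residue i, through the backbone, to the amide H of residue i+offset; the function name `hbond_ring_size` is illustrative.

```python
def hbond_ring_size(offset):
    """Atoms in the ring closed by a C=O(i) ... H-N(i+offset) hydrogen bond.

    Counted as: O and C of residue i (2 atoms), N, CA and C of every
    residue strictly between i and i+offset (3 atoms each), and N and H
    of residue i+offset (2 atoms).
    """
    if offset < 1:
        raise ValueError("offset must be a positive integer")
    return 2 + 3 * (offset - 1) + 2

# Residues per turn are taken from the text; only the subscript is computed.
for name, residues_per_turn, offset in [("3-10 helix", 3, 3),
                                        ("alpha-helix", 3.6, 4)]:
    print(f"{name}: {residues_per_turn} residues per turn, "
          f"{hbond_ring_size(offset)} atoms per ring")
```

An offset of 3 gives the ten atoms of the 3₁₀-helix and an offset of 4 gives the thirteen atoms of the alpha-helix.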

A small stretch of a 3₁₀-helix. The helix is shown in licorice style. Only the backbone and Cβ atoms are shown for clarity. Left: side view. The 10 atoms of the three amino acids are numbered and the hydrogen bond is indicated as a green dotted line. Right: top view (N-terminal above the clipping plane).

1.5 The beta-sheet.

1.5.1 The beta-sheet structure.

Pauling and Corey derived a model for the conformation of fibrous proteins known as beta-keratins. In this conformation the polypeptide does not form a coil. Instead, it zig-zags in a more extended conformation than the alpha-helix. Amino acid residues in the beta-conformation have negative Φ angles and positive Ψ angles. Typical values are Φ = -140 degrees and Ψ = 130 degrees. In contrast, alpha-helical residues have both negative Φ and Ψ angles. A section of polypeptide with residues in the beta-conformation is referred to as a beta-strand and these strands can associate by main chain hydrogen bonding interactions to form a sheet.
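The sign rule described here can be written as a very crude classifier. This is only a sketch of the rule as stated in the text: real Ramachandran regions have irregular boundaries, and the function name and labels are hypothetical.

```python
def backbone_region(phi, psi):
    """Rough region label from the signs of phi and psi (degrees).

    Assumption: only the signs are used, as in the text; this is not a
    substitute for a proper Ramachandran-region lookup.
    """
    if phi < 0 and psi < 0:
        return "alpha-like"
    if phi < 0 and psi >= 0:
        return "beta-like"
    return "other"  # positive phi, e.g. left-handed conformations

print(backbone_region(-140, 130))  # typical beta-strand values from the text
```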

In a beta-sheet two or more polypeptide chains run alongside each other and are linked in a regular manner by hydrogen bonds between the main chain C=O and N-H groups. Therefore all hydrogen bonds in a beta-sheet are between different segments of polypeptide. This contrasts with the alpha-helix where all hydrogen bonds involve the same element of secondary structure. The R-groups (side chains) of neighbouring residues in a beta-strand point in opposite directions.

Beta-sheets are often depicted as arrows. Conventionally the arrow points towards the C-terminal part of the peptide.
View of a single beta-strand. The dark green box marks the plane of the beta-sheet. At the top, the plane is in line with the monitor. The bottom strand is rotated 90° and the plane is sticking out of the monitor.

Imagine two strands parallel to the ones shown above, either in the plane with the strand (top strand), or one in front of the screen-plane and one behind (bottom strand). This is how the pleated appearance of the beta-sheet arises. Note that peptide groups of adjacent residues point in opposite directions whereas with alpha-helices the peptide bonds all point one way.

The axial distance between adjacent residues is 3.5 Å. There are two residues per repeat unit which gives the beta-strand a 7 Å pitch. This compares with the alpha-helix where the axial distance between adjacent residues is only 1.5 Å. Clearly, polypeptides in the beta-conformation are far more extended than those in the alpha-helical conformation.
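To make the comparison concrete, the sketch below uses the two rise-per-residue values quoted in the text to estimate how far the same number of residues extends in each conformation. The names `RISE` and `segment_length` are illustrative, and the length is the simple product of residue count and rise, so it is approximate.

```python
# Axial distance between adjacent residues in Å, as quoted in the text.
RISE = {"alpha-helix": 1.5, "beta-strand": 3.5}

def segment_length(n_residues, conformation):
    """Approximate axial length (Å) of n_residues in an ideal conformation."""
    return n_residues * RISE[conformation]

n = 10
for conformation in RISE:
    print(f"{n} residues as {conformation}: "
          f"{segment_length(n, conformation):.1f} Å")

ratio = RISE["beta-strand"] / RISE["alpha-helix"]
print(f"A strand is about {ratio:.1f} times longer per residue")
```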

1.5.2 Parallel, antiparallel and mixed beta-sheets.

In parallel beta-sheets the strands all run in one direction, whereas in antiparallel sheets adjacent strands run in opposite directions. In mixed sheets some strands are parallel and others are antiparallel.

Different types of beta-sheets. Hydrogen bonds are represented with dotted lines. Both a schematic representation and snippets from protein structures are shown. For clarity only the backbone heavy atoms and Cβ atoms, where applicable, are shown. (The antiparallel sheet is a snippet from PDB code 5KO0, the parallel sheet from 1DAB and the mixed sheet from 4TVW.)

In the classical Pauling-Corey models the parallel beta-sheet has somewhat more distorted and consequently weaker hydrogen bonds between the strands.

Beta-sheets are very common in globular proteins and most contain fewer than six strands. The width of a six-stranded beta-sheet is approximately 25 Å. No preference for parallel or antiparallel beta-sheets is observed, but parallel sheets with fewer than four strands are rare, perhaps reflecting their lower stability. Sheets tend to be either all parallel or all antiparallel, but mixed sheets do occur.

The Pauling-Corey model of the beta-sheet is planar. However, most beta-sheets found in globular protein X-ray structures are twisted. This twist is left-handed as shown below. The overall twisting of the sheet results from a relative rotation of each residue in the strands by 30 degrees per amino acid in a right-handed sense.

View into a beta-sheet. Note the rotation between the strands. Schematic view (top) and part of a beta-sheet (PDB entry 4TVW). For clarity only the backbone heavy atoms and Cβ atoms are shown.

Parallel sheets are less twisted than antiparallel and are always buried. In contrast, antiparallel sheets can withstand greater distortions (twisting and beta-bulges) and greater exposure to solvent. This implies that antiparallel sheets are more stable than parallel ones which is consistent both with the hydrogen bond geometry and the fact that small parallel sheets rarely occur (see above).

1.6 Coils and turns

1.6.1 beta-turns (reverse turns)

A beta-turn is a region of four consecutive residues, often with a hydrogen bond between the carbonyl oxygen of residue i and the NH group of residue i+3 along the chain (O(i) to NH(i+3)). The subtype is defined by the Φ and Ψ angles of the middle two residues (i+1 and i+2). Often the hydrogen bond is deemed obligatory, and only motifs with a Cα distance between residues i and i+3 below 7 Å are considered. Each turn is assigned to one of nine classes. Helical regions are excluded from this definition, and turns between beta-strands form a special class of turn known as the beta-hairpin (see later). In the following, four frequent beta-turns are described.
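The distance criterion lends itself to a simple scan over Cα coordinates. The sketch below assumes the criterion is applied between residues i and i+3 (the first and last residue of a four-residue turn) and does not exclude helical regions or check the hydrogen bond; the function name `turn_candidates` and the toy coordinates are illustrative only.

```python
import math

def turn_candidates(ca_coords, cutoff=7.0):
    """Return (i, distance) for every residue index i whose C-alpha lies
    closer than `cutoff` Å to the C-alpha of residue i+3.

    ca_coords: list of (x, y, z) tuples, one per residue, in chain order.
    """
    hits = []
    for i in range(len(ca_coords) - 3):
        d = math.dist(ca_coords[i], ca_coords[i + 3])
        if d < cutoff:
            hits.append((i, d))
    return hits

# Toy data (not from a real structure): a chain that doubles back on itself
# and a fully extended chain.
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (7.0, 0.0, 0.0), (10.5, 0.0, 0.0)]
print(turn_candidates(bent))
print(turn_candidates(extended))
```

A full turn assignment would additionally filter out helices and classify each hit by the Φ and Ψ angles of residues i+1 and i+2.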

Types I and II shown in the figure below are the most common reverse turns, the essential difference between them being the orientation of the peptide bond between residues (i+1) and (i+2). Types I' and II' are their respective left-handed forms.

Beta (reverse) turn structures. Type I and II turns (top) with their left-handed counterparts (bottom) are shown. The type of a beta-turn is determined by the Φ and Ψ angles of residues i+1 and i+2. Hydrogen bonds between the oxygen of residue i and the NH group of residue i+3 are indicated with a dotted green line. Note that the turn on the top left is not forming a hydrogen bond (indicated by //) in this example. To the right, the ideal dihedral angles of residue i+1 (purple) and residue i+2 (red) are shown on the Ramachandran plot. The cross marks a ±30° range.

Note that the (i+2) residue of the type I' and II turn lies in a region of the Ramachandran plot which is rarely occupied by non-glycine amino acids. From the diagram of I' turn it can be seen that were the (i+2) residue to have a side chain, there would be steric hindrance with the carbonyl oxygen of the preceding residue.

For further details see also the descriptions of beta-turns in PDBeMotif or PDBsum.

Hershey and Chase Experiment Steps

Hershey and Chase provided strong evidence through their experiments that DNA is the genetic material. For the experiment they used the T2 bacteriophage, a virus that infects E. coli bacteria. The experiment includes the following steps:

Radioactive Labelling of Bacteriophage

Hershey and Chase grew T2 bacteriophages in two batches. In batch 1 the phages were grown in a medium containing radioactive sulphur (³⁵S), and in batch 2 in a medium containing radioactive phosphorus (³²P). After incubation, the radioactive sulphur (³⁵S) tags the phage protein, while the radioactive phosphorus (³²P) tags the phage DNA.

Why does radioactive sulphur label only the phage protein and not the DNA? Sulphur is a structural element of protein (it occurs in the amino acids cysteine and methionine) but is absent from DNA, so ³⁵S tags the phage protein only.

Conversely, phosphorus is part of the sugar-phosphate backbone of DNA, so ³²P tags the phage DNA and not the phage protein.


After radioactive labelling of the phage DNA and protein, Hershey and Chase infected E. coli bacteria with the radioactively labelled T2 phage. In batch 1, phage tagged with ³⁵S, and in batch 2, phage labelled with ³²P, were allowed to infect the E. coli cells.

After the T2 bacteriophage attaches to E. coli, the phage DNA enters the cytoplasm of the bacterium and takes over the host cell machinery. The phage degrades the bacterial genome and uses the host ribosomes to make the structural proteins of the capsid, tail fibres, base plate etc.


During blending (agitation), the bacterial cells are agitated to remove the viral coats from their surface. The result is a suspension containing bacterial cells and detached viral coat components such as capsids, tail fibres and base plates.


Centrifugation separates the viral particles from the bacterial cells in this suspension. The heavier particles, i.e. the bacterial cells, are deposited as the “pellet”, while the lighter viral particles remain in the “supernatant”.


After centrifugation, the results reveal the heritable factor. The phage DNA labelled with ³²P carries its radioactivity into the host cell, so the radioactive ³²P is found in the bacterial “pellet”. The phage protein tagged with ³⁵S does not transfer its radioactivity into the host cell, so the radioactive ³⁵S appears in the “supernatant”.

