Suggested Citation: "BASIC INSIGHTS ABOUT PROTEIN STRUCTURE." National Research Council. 1995. Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology. Washington, DC: The National Academies Press. doi: 10.17226/2121.


individual amino acids in the protein, one can determine which amino acids lie near each other. Based on these constraints, one can use the mathematical technique of distance geometry (Crippen and Havel, 1988) or restrained molecular dynamics with simulated annealing to build a partially constrained structure. (The isotopes 13C and 15N can also provide additional information.) Currently, this approach requires a noncrystalline but highly concentrated protein solution and works only for relatively small proteins (the resonances broaden as the molecule size increases and its tumbling slows).

Basic Insights about Protein Structure

If a protein sequence contains sufficient information to code for a folded structure, it should be possible to construct a potential energy function that reflects the energetics of an assembling polypeptide chain. In principle, one would "only" need to find the minimum of this potential function to know the protein's folded state. In practice, this goal has proved elusive.

Some early workers defined molecular force fields compatible with the experimentally measured conformational preferences of small molecules (Lifson and Warshel, 1969). Unfortunately, attempts to fold a denatured chain using this approach were unsuccessful (Levitt, 1976; Hagler and Honig, 1978) because multiple local minima along the potential energy surface trapped the folding chain in unproductive conformations (see Figure 9.3). Even with improved search strategies, including molecular dynamics and Monte Carlo methods, it has not been possible to find the native structure from a random starting point (Howard and Kollman, 1988; Wilson and Doniach, 1989). This has been called the "multiple minima problem," and it remains a critical problem for the conformational analysis of complex molecules. Although proteins cannot yet be folded de novo in this way, the approach has proved valuable for studying how proteins behave under small perturbations around the known structure.
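The trapping behavior can be illustrated with a toy calculation. The sketch below is not a protein force field: the one-dimensional double-well function `energy`, the step size, and the starting points are all assumptions chosen only to show how a purely downhill search ends in whichever minimum lies nearest its starting point.

```python
def energy(x):
    """Toy one-dimensional 'energy surface' with two wells (illustrative assumption)."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Analytic derivative of the toy energy function."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(x, step=0.01, n_steps=5000):
    """Plain steepest descent: always move downhill, never climb a barrier."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


# Hypothetical starting conformations on either side of the barrier.
starts = [-2.0, -0.5, 0.5, 2.0]
finals = [descend(x0) for x0 in starts]

for x0, xf in zip(starts, finals):
    print(f"start {x0:+.2f} -> minimum at {xf:+.3f}, energy {energy(xf):+.3f}")
```

In this toy surface the well at negative x is the deeper one, so only the searches that start on its side of the barrier reach the global minimum; the others stop in the shallower well. A real polypeptide has thousands of coupled degrees of freedom and a vastly larger number of such wells.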

Because direct computation is difficult, one approach would be to look for patterns and regularities in protein structures that might simplify the task of prediction. In fact, considerable insight can be gained by simply looking at experimentally determined protein structures. First of all, one


Figure 9.3
The multiple minima problem: a two-dimensional schematic of the energy surface of a folding protein. Different starting points lead to different metastable states. Only S finds the global minimum.

observes that proteins tend to employ certain stereotypical local conformations called secondary structures. The most important are called α-helices and β-sheets and were proposed from first principles by Pauling and colleagues (Pauling et al., 1951). In an α-helix, the chain follows a right-handed spiral with hydrogen bonds between the carbonyl group (C=O) of one amino acid and the amino group (NH) of the amino acid four residues further along the chain. The result is a stable structure with a sequentially local network of hydrogen bonds (see Figure 9.4A). β-sheets offer a different solution to the hydrogen bonding problem. These sheets involve segments of the chain that are sequentially distant but conformationally similar, forming an alternating pattern of hydrogen bonds (see Figure 9.4B). The β-strands may lie parallel or antiparallel to one another. In fibrous proteins, repeated amino acid sequences yield elongated α-helices, as in α-keratin (found in hair), and β-sheets, as in β-fibroin (found in silk). Globular proteins must contain amino acid sequences that break α-helix and β-sheet


Figure 9.4
(A) An α-helix. (B) A β-sheet: four parallel β-strands are shown. Hydrogen bonds exist between oxygen atoms on one strand and nitrogen atoms on the neighboring strand.


structure and cause the chain to turn back toward the center of the molecule.

Secondary structure provides a useful building block for constructing more complex protein structure (Crick, 1953; Levitt and Chothia, 1976). Proteins are usefully classified by their use of secondary structures: α/α proteins are structures dominated by α-helices (for example, myoglobin); β/β proteins are predominantly β-sheet structures (for example, plastocyanin); α/β proteins are characterized by the regular alternation of α-helices and β-strands (for example, flavodoxin); and α + β proteins are characterized by the irregular alternation of α-helices and β-strands (for example, lysozyme) (see Figure 9.5). Although the building blocks are common, the connectivity of the chain varies within these folding classes. Molecular biologists have borrowed the term "topology" (inappropriately) to describe the path that the chain takes in joining consecutive secondary structure elements. For example, many proteins contain four α-helices packed one against another to form a square four-helix bundle. With one helix taken as the reference point, the other three helices can be visited in six distinct orders. Moreover, each of these three helices can lie parallel or antiparallel to the reference helix. Thus, 6 × 2 × 2 × 2 = 48 motifs are possible. Is there any preference in the arrangements found in nature? By their general structure, α-helices have a dipole moment with partial positive charges near their N-terminus (start) and partial negative charges near their C-terminus (end). If electrostatic considerations are significant, one might expect to see antiparallel arrangements predominate (since opposite charges attract). In fact, a review of available protein structures reveals that 17 of 18 four-helix bundle structures conform to this expectation (Presnell and Cohen, 1989). Of the six possible motifs involving antiparallel arrangements, five have been observed in nature so far, and the sixth is expected to crop up as the database of protein structures grows (see Table 9.1 and Figure 9.6).
An important corollary of the study of four-helix bundles is that quite distinct sequences can adopt similar structures: the code for folding is degenerate.
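The counting argument for four-helix bundles can be checked by brute-force enumeration. The sketch below rests on two modeling assumptions that are not spelled out in the text: the six "orders" are read as the six ways of placing helices B, C, and D on the three remaining corners of the square, and a motif is called antiparallel when every pair of helices on neighboring corners points in opposite directions. The function names are hypothetical.

```python
from itertools import permutations, product


def enumerate_motifs():
    """List every (placement, directions) pair for a square four-helix bundle.

    Helix A is fixed at corner 0 pointing 'up' (+1). `placement` says which of
    helices B, C, D sits at corners 1, 2, 3 (going around the square), and
    `directions` gives the direction (+1 up, -1 down) of the helix at each of
    those corners.
    """
    motifs = []
    for placement in permutations("BCD"):
        for directions in product((+1, -1), repeat=3):
            motifs.append((placement, directions))
    return motifs


def is_all_antiparallel(directions):
    """True if helices on neighboring corners always point in opposite directions."""
    around = (+1,) + tuple(directions)  # corner 0 is the reference helix
    return all(around[k] != around[(k + 1) % 4] for k in range(4))


motifs = enumerate_motifs()
antiparallel = [m for m in motifs if is_all_antiparallel(m[1])]
print(len(motifs), "motifs in total;", len(antiparallel), "fully antiparallel")
```

Under these assumptions the enumeration reproduces both figures quoted above: 48 motifs overall, of which 6 are fully antiparallel, because the alternating up/down pattern fixes all three directions and leaves only the placement free.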

Further insight into protein structure is gained by considering the physicochemical properties of the different amino-acid side chains. Some side chains (those called hydrophilic) interact favorably with water, while others (called hydrophobic) do not. For globular proteins, one would expect (Kauzmann, 1959) that the hydrophilic side chains would tend to


Figure 9.5
Tertiary structure classes.


Table 9.1 Topologies of Currently Known Four-α-Helix Bundles

Number of overhand    All antiparallel,       All antiparallel,       Others
connection(s)         left-handed             right-handed            (right-handed)

0                     Complement C3a          Cytochrome b-562
                      Complement C5a          Cytochrome c'
                      Cytochrome b5           Methemerythrin
                      Interleukin 2           TMV coat protein
                      T4 lysozyme
1                     Ferritin                Phospholipase C (b)     Cytochrome P-450cam
2                     Human growth hormone

NOTE: There are no left-handed topologies for "other" four-α-helix bundles. TMV is the tobacco mosaic virus.

dominate the exterior of the protein (where it interacts with the aqueous environment) while hydrophobic side chains would occupy the molecule's interior. Richards devised a simple method for defining the "solvent-accessible" portion of a protein by rolling a sphere with a radius comparable to that of a water molecule along the molecular surface (Lee and Richards, 1971). When amino acid residues are categorized in this way, it is indeed found that hydrophobic residues tend to occur on the inside and hydrophilic residues tend to occur on the outside, although the correlation is far from perfect. Solvent-accessible surface area calculations have shed light on the importance of the "hydrophobic effect" in driving protein folding and have proved valuable in dissecting the stabilization of protein-protein and secondary structure-secondary structure interactions.
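The rolling-sphere idea can be approximated numerically. The sketch below is not the original procedure of Lee and Richards; it uses a point-sampling scheme of the kind commonly associated with Shrake and Rupley, in which each atom's radius is enlarged by the probe radius and test points on that enlarged sphere are counted as accessible unless they fall inside a neighboring atom's enlarged sphere. The 1.4 Å probe radius, the atom radii, the number of test points, and the function names are assumptions chosen for illustration.

```python
import math


def sphere_points(n):
    """Return n roughly evenly spaced points on the unit sphere (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def accessible_area(atoms, probe=1.4, n_points=200):
    """Approximate per-atom solvent-accessible area.

    atoms: list of (x, y, z, radius) tuples, all in the same length unit.
    Returns a list with one area value per atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # surface traced by the center of the probe sphere
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A test point is buried if the probe center would overlap another atom.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


# Hypothetical example: one isolated atom, then two overlapping atoms.
isolated = accessible_area([(0.0, 0.0, 0.0, 1.7)])
pair = accessible_area([(0.0, 0.0, 0.0, 1.7), (2.0, 0.0, 0.0, 1.7)])
print(isolated, pair)
```

An isolated atom keeps the full area of its enlarged sphere, while each atom of the overlapping pair loses the cap that faces its neighbor. Summing such per-atom values over the atoms of a residue gives the kind of per-residue accessibility used to sort residues into buried and exposed classes.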

In summary, the analysis of protein structures has produced some unassailable conclusions: packing is an important element of protein stability; secondary structure is a common component of protein structure;


Figure 9.6
(Top) Two left-handed bundles (side view). Three specific attributes fully describe the topology of a four-α-helix bundle. These are (1) the polypeptide backbone connectivity between helices, (2) the unit direction vectors of the individual helices, and (3) the bundle handedness. In the first bundle there are no overhand connections, and in the second bundle there is one overhand connection. The handedness of a particular bundle is determined using the "right-hand rule" of physics. To determine if a helix bundle is of a particular handedness, orient the thumb of one hand parallel to the first helix, or helix A, where the positive unit vector runs from the N-terminus to the C-terminus (and helices A, B, C, and D are the first, second, third, and fourth helices on the path from the N-terminus to the C-terminus). Helix B should be oriented to the left if it is a left-handed bundle and to the right if it is a right-handed bundle. In the case where helix B is diagonally opposed to helix A, the handedness is based on the position of helix C relative to helices A and B. (Bottom) Schematic representation of the possible antiparallel four-α-helix bundles (top view). Bold lines represent connections in front of the page; thin lines represent connections behind the page. Left-handed and right-handed forms of four-α-helix bundles have an equal probability of occurrence. Reprinted, by permission, from Presnell and Cohen (1989). Copyright © 1989 by S.R. Presnell and F.E. Cohen.
