individual amino acids in the protein, one can determine which amino acids lie near each other. Based on these constraints, one can use the mathematical technique of distance geometry (Crippen and Havel, 1988) or restrained molecular dynamics with simulated annealing to build a partially constrained structure. (The isotopes ¹³C and ¹⁵N can also provide additional information.) Currently, this approach requires a noncrystalline but highly concentrated protein solution and works only for relatively small proteins (the resonances broaden as the molecule grows larger and tumbles more slowly).
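One ingredient of distance geometry is that a sparse set of distance restraints also limits pairs of atoms that were never measured directly, through the triangle inequality. The following is a minimal sketch of that upper-bound tightening step only, not the full embedding procedure of Crippen and Havel. The function name `smooth_upper_bounds`, the four-atom example, and the distances (in angstroms) are illustrative assumptions.

```python
import math

def smooth_upper_bounds(n, upper):
    """Tighten upper distance bounds using the triangle inequality.

    `upper` maps an index pair (i, j) to a measured upper bound.
    Pairs without a measurement start at infinity.
    """
    u = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), d in upper.items():
        u[i][j] = u[j][i] = min(u[i][j], d)
    # Floyd-Warshall pass: d(i, j) can never exceed d(i, k) + d(k, j).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]
    return u

# Hypothetical restraints for four atoms; the values are made up.
restraints = {(0, 1): 3.8, (1, 2): 3.8, (2, 3): 3.8, (0, 3): 5.0}
bounds = smooth_upper_bounds(4, restraints)
for row in bounds:
    print(["%.1f" % value for value in row])
```

In this toy case the unmeasured pairs (0, 2) and (1, 3) acquire finite upper bounds from their measured neighbors. A real calculation would also smooth lower bounds and then embed coordinates that satisfy both sets.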
If a protein sequence contains sufficient information to code for a folded structure, it should be possible to construct a potential energy function that reflects the energetics of an assembling polypeptide chain. In principle, one would "only" need to find the minimum of this potential function to know the protein's folded state. In practice, this goal has proved elusive.
Some early workers defined molecular force fields compatible with the experimentally measured conformational preferences of small molecules (Lifson and Warshel, 1969). Unfortunately, attempts to fold a denatured chain using this approach were unsuccessful (Levitt, 1976; Hagler and Honig, 1978) because multiple local minima along the potential energy surface trapped the folding chain in unproductive conformations (see Figure 9.3). Even with improved search strategies, including molecular dynamics and Monte Carlo methods, it has not been possible to find the native structure from a random starting point (Howard and Kollman, 1988; Wilson and Doniach, 1989). This has been called the "multiple minima problem," and it remains a critical problem for the conformational analysis of complex molecules. Although proteins cannot yet be folded de novo, this approach has proved valuable for studying the behavior of proteins through small perturbations around the known structure.
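The multiple minima problem can be illustrated on a toy one-dimensional surface. The sketch below is not a protein force field: the double-well function `energy`, the step size, and the starting points are all assumptions chosen for illustration. It shows only that a strictly downhill search ends in whichever well its starting point drains into.

```python
def energy(x):
    """Toy one-dimensional 'energy surface' with two wells (illustrative only)."""
    return x**4 - 4.0 * x**2 + x

def gradient(x):
    """Analytic derivative of energy(x)."""
    return 4.0 * x**3 - 8.0 * x + 1.0

def descend(x0, step=0.01, n_steps=5000):
    """Plain steepest descent: always move downhill, never climb a barrier."""
    x = x0
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

# Three hypothetical starting conformations.
for start in (-2.0, 0.5, 2.0):
    x_min = descend(start)
    print(f"start {start:+.1f} -> minimum near x = {x_min:+.3f}, "
          f"E = {energy(x_min):+.3f}")
```

Only the start on the left reaches the deeper well. The other two settle in the shallower one and stay there, which is the behavior sketched in Figure 9.3.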
Because direct computation is difficult, one approach is to look for patterns and regularities in protein structures that might simplify the task of prediction. In fact, considerable insight can be gained by simply looking at experimentally determined protein structures. First of all, one

Figure 9.3
The multiple minima problem: a two-dimensional schematic of the energy surface of a folding protein. Different starting points lead to different metastable states. Only S2 finds the global minimum.
observes that proteins tend to employ certain stereotypical local conformations called secondary structures. The most important are called α-helices and β-sheet structures and were suggested by Pauling (Pauling et al., 1951) based on first principles. In an α-helix, the chain follows a right-handed spiral with hydrogen bonds between the amino group (NH) of one amino acid and the carbonyl group (C=O) of an amino acid a few steps further along the chain. The result is a stable structure with a sequentially local network of hydrogen bonds (see Figure 9.4A). β-sheets offer a different solution to the hydrogen bonding problem. These sheets involve segments of the chain that are sequentially distant but conformationally similar, forming an alternating pattern of hydrogen bonds (see Figure 9.4B). The β-strands may lie parallel or antiparallel to one another. In fibrous proteins, repeated amino acid sequences yield elongated α-helices like α-keratin (hair) and β-sheets like β-fibroin (silk). Globular proteins must contain amino acid sequences that break α-helix and β-sheet

Figure 9.4
(A) An α-helix. (B) A β-sheet: four parallel β-strands are shown. Hydrogen bonds exist between oxygen atoms on one strand and nitrogen atoms on the neighboring strand.
structure and cause the chain to turn back toward the center of the molecule.
Secondary structure provides a useful building block for constructing more complex protein structures (Crick, 1953; Levitt and Chothia, 1976). Proteins are usefully classified by their use of secondary structures: α/α proteins are structures dominated by α-helices (for example, myoglobin); β/β proteins are predominantly β-sheet structures (for example, plastocyanin); α/β proteins are characterized by the regular alternation of α-helices and β-strands (for example, flavodoxin); and α + β proteins are characterized by the irregular alternation of α-helices and β-strands (for example, lysozyme) (see Figure 9.5). Although the building blocks are common, the connectivity of the chain varies within these folding classes. Molecular biologists have borrowed the term "topology" (inappropriately) to describe the path that the chain takes in joining consecutive secondary structure elements. For example, many proteins contain four α-helices packed one against another to form a square four-helix bundle. With one helix taken as the reference point, the other three helices can be visited in six distinct orders (3! = 6). Moreover, each of these three helices can lie parallel or antiparallel to the reference helix (2³ = 8 combinations). Thus, 6 × 8 = 48 motifs are possible. Is there any preference in the arrangements found in nature? By their general structure, α-helices have a dipole moment with partial positive charges near their N-terminus (start) and partial negative charges near their C-terminus (end). If electrostatic considerations are significant, one might expect to see antiparallel arrangements predominate (since opposite charges attract). In fact, a review of available protein structures reveals that 17 of 18 four-helix bundle structures conform to this expectation (Presnell and Cohen, 1989). Of the six possible motifs involving antiparallel arrangements, five have been observed in nature so far, and the sixth is expected to crop up as the database of protein structures grows (see Table 9.1 and Figure 9.6).
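The count of four-helix bundle motifs can be checked by brute-force enumeration. The sketch below is illustrative: the helix labels and function names are hypothetical, and the filter encodes one assumed reading of "antiparallel arrangement," namely that each helix reverses direction relative to the one before it in sequence.

```python
from itertools import permutations, product

def enumerate_bundles():
    """List every (order, orientations) choice for helices B, C, D relative to A."""
    orders = list(permutations("BCD"))  # 3! = 6 visiting orders
    states = list(product(("parallel", "antiparallel"), repeat=3))  # 2**3 = 8
    return [(order, dict(zip("BCD", s))) for order in orders for s in states]

def alternates_along_chain(orientation):
    """Assumed reading of 'all antiparallel': A up, B down, C up, D down,
    so B and D are antiparallel to A while C is parallel to A."""
    return orientation == {"B": "antiparallel", "C": "parallel", "D": "antiparallel"}

motifs = enumerate_bundles()
antiparallel = [m for m in motifs if alternates_along_chain(m[1])]
print(len(motifs), len(antiparallel))
```

Under that assumption the orientations are fixed once the chain direction alternates, so only the six visiting orders remain, which agrees with the six antiparallel motifs mentioned in the text.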
An important corollary of the study of four-helix bundles is that quite distinct sequences can adopt similar structures: the code for folding is degenerate.
Further insight into protein structure is gained by considering the physicochemical properties of the different amino-acid side chains. Some side chains (those called hydrophilic) interact favorably with water, while others (called hydrophobic) do not. For globular proteins, one would expect (Kauzmann, 1959) that the hydrophilic side chains would tend to

Figure 9.5
Tertiary structure classes.
Table 9.1 Topologies of Currently Known Four-α-Helix Bundles

| Number of Overhand Connection(s) | All Antiparallel, Left-handed | All Antiparallel, Right-handed | Others (right-handed) |
| --- | --- | --- | --- |
| 0 | Complement C3a | Cytochrome b-562 | |
| | Complement C5a | Cytochrome c' | |
| | Cytochrome b5 | Methemerythrin | |
| | Interleukin 2 | TMV coat protein | |
| | T4 lysozyme | | |
| 1 | Ferritin | Phospholipase C (b) | Cytochrome |
| 2 | Human growth | | |

NOTE: There are no left-handed topologies for "other" four-α-helix bundles. TMV is the tobacco mosaic virus.
dominate the exterior of the protein (where it interacts with the aqueous environment) while hydrophobic side chains would occupy the molecule's interior. Richards devised a simple method for defining the "solvent-accessible" portion of a protein by rolling a sphere with a radius comparable to that of a water molecule along the molecular surface (Lee and Richards, 1971). When amino acid residues are categorized in this way, it is indeed found that hydrophobic residues tend to occur on the inside and hydrophilic residues tend to occur on the outside, although the correlation is far from perfect. Solvent-accessible surface area calculations have shed light on the importance of the "hydrophobic effect" in driving protein folding and have proved valuable in dissecting the stabilization of protein-protein and secondary structure-secondary structure interactions.
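The rolling-sphere idea can be approximated numerically. The sketch below is not the original Lee and Richards procedure. It is a point-sampling approximation in the style of Shrake and Rupley: place test points on each atom's probe-expanded sphere and count those not covered by any neighbor's expanded sphere. The 1.4 Å probe radius, the number of sample points, and the atom coordinates and radii are assumptions for illustration.

```python
import math

def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return points

def accessible_area(atoms, probe=1.4, n_points=500):
    """Estimate per-atom solvent-accessible area in square angstroms.

    `atoms` is a list of (x, y, z, radius) tuples in angstroms.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # sphere traced out by the probe center
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas

# Two hypothetical carbon-sized atoms 1.5 angstroms apart.
print(accessible_area([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]))
```

Each atom in the pair shields part of the other, so both report less area than an isolated atom would. Summing such per-atom values over the atoms of a residue gives a simple measure of how exposed that residue is.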
In summary, the analysis of protein structures has produced some unassailable conclusions: packing is an important element of protein stability; secondary structure is a common component of protein structure;

Figure 9.6
(Top) Two left-handed bundles (side view). Three specific attributes fully describe the topology of a four-α-helix bundle. These are the (1) polypeptide backbone connectivity between helices, (2) unit direction vectors of the individual helices, and (3) bundle handedness. In the first bundle there are no overhand connections, and in the second bundle there is one overhand connection. The handedness of a particular bundle is determined using the "right-hand rule" of physics. To determine the handedness of a helix bundle, orient the thumb of one hand parallel to the first helix, helix A, with the positive unit vector pointing from the N-terminus to the C-terminus (helices A, B, C, and D are the first, second, third, and fourth helices on the path from the N-terminus to the C-terminus). Helix B should be oriented to the left if it is a left-handed bundle and to the right if it is a right-handed bundle. In the case where helix B is diagonally opposed to helix A, the handedness is based on the position of helix C relative to helices A and B. (Bottom) Schematic representation of the possible antiparallel four-α-helix bundles (top view). Bold lines represent connections in front of the page; thin lines represent connections behind the page. Left-handed and right-handed forms of four-α-helix bundles have an equal probability of occurrence.
Reprinted, by permission, from Presnell and Cohen (1989).
Copyright © 1989 by S.R. Presnell and F.E. Cohen.