Research Article: Protein crystallization: Eluding the bottleneck of X-ray crystallography

Date Published: September 30, 2017


Author(s): Joshua Holcomb, Nicholas Spellmon, Yingxue Zhang, Maysaa Doughan, Chunying Li, Zhe Yang.


To date, X-ray crystallography remains the gold standard for determining macromolecular structures and protein-substrate interactions. However, the unpredictability of obtaining a protein crystal remains the limiting factor and continues to be the bottleneck in determining protein structures. Extensive research has sought to circumvent this issue, with limited success, and no single method has proven to guarantee the crystallization of all proteins. Nevertheless, techniques using antibody fragments, lipids, carrier proteins, and even mutagenesis of crystal contacts have been implemented to increase the odds of obtaining a crystal with adequate diffraction. In addition, we review a new technique that uses the scaffolding ability of PDZ domains to facilitate nucleation and crystal lattice formation. Although still in its infancy, this technology may become a valuable addition to the crystallography toolbox, improving the chances of crystallizing problematic proteins.

Partial Text

Protein crystallization was observed more than 170 years ago by Friedrich Ludwig Hünefeld, who unintentionally crystallized hemoglobin from earthworm blood. He described this accidental finding in his 1840 book Der Chemismus in der thierischen Organisation (Chemical Properties in the Animal Organization) [1, 2, 3]. Later in the 19th century, scientists began to replicate the crystallization of proteins, initially as a means of purification. In 1851, for example, Funke obtained hemoglobin crystals from red blood cells by diluting the cells with solvents and then allowing slow evaporation [2, 3, 4]. Subsequently, in the 1880s and 1890s, botanists such as Ritthausen and Osborne applied similar techniques to purify a series of plant seed proteins [2–6]. What was not realized at the time is that this accidental discovery would offer far more than a way to isolate proteins from a sample: it would become the foundation for determining high-resolution protein structures.

Protein crystallization today relies on the same basic principle discovered over 170 years ago: supersaturation. In a supersaturated solution, the protein concentration exceeds its solubility limit. In this non-equilibrium state, protein molecules are driven out of solution and can undergo a first-order phase transition known as nucleation. Supersaturation can be achieved by several methods; most commonly, a chemical known as a precipitant is added to reduce protein solubility and create the supersaturated state. The phase diagram (Figure 1A) shows how the saturation state depends on protein and precipitant concentrations. At low protein and low precipitant concentrations, the protein remains in the stable, undersaturated state. As either protein or precipitant concentration increases, the solution can move into the metastable, labile, or precipitation phase [2, 3, 12]. In the metastable phase, nuclei may form that are stable relative to the parent liquid phase but metastable relative to the crystalline phase of the protein [13]. In the labile phase, both nucleation and crystal growth may occur [14]. The precipitation phase corresponds to the highest degree of supersaturation; here, ordered nucleation does not occur and crystals do not grow. Thus, crystallization depends on both the magnitude of supersaturation and the rate at which it is achieved.
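To make the link between precipitant, solubility, and supersaturation concrete, the sketch below places a condition into one of the zones of the phase diagram using the supersaturation ratio (protein concentration divided by solubility). It assumes the empirical Cohn salting-out relation, log10(S) = β − Ks·[precipitant], to model how solubility falls as precipitant is added; that relation is strictly a description of salts, and the values of β, Ks, and the zone boundaries used here are hypothetical. Real boundaries are protein-specific and must be mapped experimentally.

```python
def solubility_mg_per_ml(precipitant_m: float, beta: float = 1.5, ks: float = 1.0) -> float:
    """Protein solubility from the empirical Cohn salting-out relation,
    log10(S) = beta - ks * [precipitant].

    beta and ks are hypothetical fitting constants; real values are
    specific to the protein and the precipitant.
    """
    return 10 ** (beta - ks * precipitant_m)


def classify_condition(protein_mg_per_ml: float, precipitant_m: float,
                       metastable_max: float = 2.0, labile_max: float = 10.0) -> str:
    """Assign a condition to a phase-diagram zone by its supersaturation ratio.

    metastable_max and labile_max are hypothetical zone boundaries expressed
    as supersaturation ratios (protein concentration / solubility).
    """
    ratio = protein_mg_per_ml / solubility_mg_per_ml(precipitant_m)
    if ratio <= 1.0:
        return "undersaturated"   # stable solution: below the solubility limit
    if ratio <= metastable_max:
        return "metastable"
    if ratio <= labile_max:
        return "labile"           # nucleation and crystal growth may occur
    return "precipitation"        # highest supersaturation: no ordered growth


if __name__ == "__main__":
    # Fixed protein concentration, increasing precipitant: the condition
    # crosses the zones of the phase diagram in order.
    for precipitant in (0.0, 0.6, 1.0, 2.0):
        print(precipitant, classify_condition(10.0, precipitant))
```

With these illustrative parameters, a 10 mg/mL protein moves from undersaturated (no precipitant) through metastable and labile to precipitation as the precipitant concentration rises, mirroring the horizontal path across Figure 1A described in the text.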

In a typical crystallization experiment, thousands of conditions are often tested for a single protein to obtain a crystal suitable for X-ray diffraction. Variables that affect crystallization include pH, temperature, and precipitant concentration. The pH is typically controlled by adding a buffering agent to the crystallization condition; commonly used buffers include Tris hydrochloride, HEPES, sodium cacodylate, MES, and sodium acetate. Precipitants are among the most variable factors and can be divided into four categories based on their properties: salts, organic solvents, long-chain polymers, and low-molecular-weight polymers and nonvolatile organic compounds [3]. Common salts include ammonium sulfate and sodium chloride, whereas common organic solvents include ethanol and isopropanol. Polyethylene glycols (PEGs) such as PEG 3350 represent the third category, whereas PEG 1000 and lower-molecular-weight PEGs, along with compounds such as methylpentanediol (MPD), represent the fourth [3].
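The number of conditions grows multiplicatively with each variable. The sketch below enumerates a full-factorial grid over the buffers and the four precipitant categories named above, plus two temperatures. The specific pH per buffer, the representative precipitant chosen for each category, the concentration series, and the temperatures are all hypothetical choices for illustration, not a published screen; practical screens often sample this space more selectively rather than testing every combination.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Condition:
    buffer: str
    ph: float
    category: str
    precipitant: str
    concentration: float
    unit: str
    temperature_c: int


# Hypothetical design: one pH per buffer, chosen within its buffering range.
BUFFERS = [("sodium acetate", 4.6), ("MES", 6.0), ("sodium cacodylate", 6.5),
           ("HEPES", 7.5), ("Tris hydrochloride", 8.5)]

# One representative precipitant per category from the text; the
# concentration series are hypothetical.
PRECIPITANTS = {
    "salt": ("ammonium sulfate", "M", [0.8, 1.2, 1.6, 2.0]),
    "organic solvent": ("isopropanol", "% v/v", [5, 10, 15, 20]),
    "long-chain polymer": ("PEG 3350", "% w/v", [10, 15, 20, 25]),
    "low-MW polymer / nonvolatile organic": ("MPD", "% v/v", [10, 20, 30, 40]),
}

TEMPERATURES_C = [4, 20]


def grid_screen(buffers, precipitants, temperatures):
    """Enumerate every combination of buffer/pH, temperature, and precipitant level."""
    screen = []
    for (buffer, ph), temp in product(buffers, temperatures):
        for category, (agent, unit, concentrations) in precipitants.items():
            for conc in concentrations:
                screen.append(Condition(buffer, ph, category, agent, conc, unit, temp))
    return screen


if __name__ == "__main__":
    screen = grid_screen(BUFFERS, PRECIPITANTS, TEMPERATURES_C)
    # 5 buffers x 2 temperatures x 4 precipitants x 4 levels
    print(len(screen))
    print(screen[0])
```

Even this deliberately small grid yields 160 conditions; adding a second precipitant per category, finer pH steps, or additives quickly pushes the count into the thousands mentioned in the text.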

Although supersaturation is the premise behind protein nucleation and crystallization, the protein itself can be a critical variable for crystal formation and subsequent growth. It has been argued that the protein, rather than the crystallization condition, may be the most important variable in the crystallization process [26]. Solubility and monodispersity of the protein are often necessary for successful crystallization. Nonspecific aggregation driven by hydrophobic amino acids or flexible protein regions can interfere with directional nucleation and overall crystal lattice formation. Therefore, protein construct optimization is often implemented in protein crystallography. During the molecular biology boom of the 1980s and 1990s, proteins that had previously been understudied because of their low abundance in the cell could be cloned, expressed, and purified in milligram quantities using bacterial expression systems [2, 3]. Molecular cloning not only paved the way for the study of previously unobtainable proteins but also allowed protein constructs to be manipulated to facilitate X-ray crystallographic studies. Standard polymerase chain reaction (PCR) and recombinant DNA technology now allow the deletion of protein regions that may interfere with crystallization. It is common practice in construct development for structural analysis to remove flexible amino acid sequences [26]. These regions can be identified by a variety of techniques, such as limited proteolysis followed by fragment analysis, comparison with structures of orthologs, and multiple sequence alignment [27, 28, 29]. Removing flexible regions can reduce the conformational heterogeneity of the protein and promote ordered formation of the crystal lattice. For example, deleting N- and C-terminal residues from the ligand-binding domain of the S. typhimurium aspartate receptor improved crystal diffraction from 3 Å to 1.85 Å [30, 31]. Similarly, deleting N-terminal residues and an internal flexible loop from S. aureus DNA gyrase improved diffraction from 3 Å to 2 Å [32].
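As a minimal sketch of the terminal-trimming step, the function below removes N- and C-terminal runs of residues whose per-residue flexibility score exceeds a cutoff and reports the 1-based boundaries of the remaining core. The function name, the scores, the 0.5 cutoff, and the toy sequence are all hypothetical; in practice the boundaries would come from the experimental and comparative analyses described above, and internal loops (as in the DNA gyrase example) would need separate handling.

```python
from typing import Sequence, Tuple


def trim_flexible_termini(sequence: str, flexibility: Sequence[float],
                          threshold: float = 0.5) -> Tuple[int, int, str]:
    """Return (start, end, core) after removing flexible N- and C-terminal runs.

    flexibility holds one hypothetical score per residue (higher = more
    flexible). start and end are 1-based, inclusive residue numbers of the
    retained construct.
    """
    if len(sequence) != len(flexibility):
        raise ValueError("need one flexibility score per residue")
    start = 0
    while start < len(sequence) and flexibility[start] > threshold:
        start += 1  # skip flexible N-terminal residues
    end = len(sequence)
    while end > start and flexibility[end - 1] > threshold:
        end -= 1    # skip flexible C-terminal residues
    if start == end:
        raise ValueError("no ordered core found at this threshold")
    return start + 1, end, sequence[start:end]


# Toy example: not a real protein, scores are invented for illustration.
TOY_SEQUENCE = "MSSGKLVEFLRGSE"
TOY_SCORES = [0.9, 0.8, 0.8, 0.7, 0.2, 0.1, 0.1, 0.2, 0.1, 0.3, 0.4, 0.6, 0.9, 0.95]

if __name__ == "__main__":
    print(trim_flexible_termini(TOY_SEQUENCE, TOY_SCORES))
```

For the toy input, the four flexible N-terminal residues and three flexible C-terminal residues are dropped, leaving residues 5 to 11 as the candidate construct.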

Besides removing problematic amino acid sequences, mutagenesis of surface residues may also be used to enhance the formation of crystal contacts. One of the first successful examples of this strategy was human ferritin, reported by Lawson in 1991, in which surface residues were mutated to promote crystal contacts analogous to those in the structure of the rat isoform [33, 34]. Subsequent studies by other groups, such as McElroy in 1992 with thymidylate synthase, Zhang in 1995 with T4 lysozyme, and Zhang in 1997 with leptin, showed that mutagenesis of surface residues can greatly influence the formation of the crystal lattice [35, 36, 37].

Unfortunately, even with construct optimization and surface modification of the target protein, crystallization success is not guaranteed. Optimized constructs may still suffer from solubility or aggregation problems caused by improper folding of the target protein in bacterial expression systems. To circumvent these issues, molecular cloning is often used to attach a solubility tag to the target protein to promote folding and stability. This is typically done by cloning the target protein into a vector encoding a tag that is known to fold well and to be highly expressed and soluble. The most common solubility tags used in crystallography experiments include Small Ubiquitin-like Modifier (SUMO), Glutathione S-transferase (GST), Thioredoxin (TRX), avidin/streptavidin tags, and Maltose Binding Protein (MBP) [28, 40–43]. Classically, these fusion tags are removed after purification and before crystallization, using a protease site engineered into the linker region between the target protein and the tag. In a subsequent purification step, the tag and protease are separated from the target protein, yielding highly pure protein suitable for crystallization.
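The effect of an engineered linker protease site can be illustrated in silico. The sketch below assumes, as one common choice not specified in the text, a tobacco etch virus (TEV) protease site with the canonical recognition sequence ENLYFQ followed by G or S; TEV cuts between the Q and that following residue, so the released target carries one extra N-terminal residue. The fusion sequence is a made-up toy string, not a real tag or protein, and the function name is hypothetical.

```python
import re

# Assumption: the linker carries a canonical TEV site, ENLYFQ followed by G or S.
# The lookahead keeps the G/S out of the match, so match.end() is the cut point.
TEV_SITE = re.compile(r"ENLYFQ(?=[GS])")


def cleave_at_tev_site(fusion: str):
    """Split a fusion-protein sequence at its single TEV site.

    Returns (tag_fragment, target_fragment). The target fragment keeps the
    G or S that follows the site as an extra N-terminal residue.
    """
    sites = list(TEV_SITE.finditer(fusion))
    if len(sites) != 1:
        raise ValueError(f"expected exactly one TEV site, found {len(sites)}")
    cut = sites[0].end()
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    # Toy construct: hypothetical tag + GSG linker + TEV site + hypothetical target.
    toy_fusion = "MSAKLEEG" + "GSG" + "ENLYFQ" + "SPQRTVLA"
    tag_part, target_part = cleave_at_tev_site(toy_fusion)
    print(tag_part, target_part)
```

Requiring exactly one site reflects a practical design check: a second, unintended site inside the target would fragment the protein during tag removal.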

As described above, large protein fusion tags are commonly used in structural biology to enhance solubility and promote proper folding of the target protein. It is common practice to remove these tags prior to crystal screening because (1) tagged proteins are less likely to form well-ordered, diffracting crystals owing to conformational heterogeneity arising from the linker region; and (2) a large fusion tag raises the possibility that the native structure of the target protein is changed or that physiologically relevant interactions are altered. However, because the tags are often responsible for enhancing solubility and structural integrity, removing them from the target protein can cause unwanted complications [28]. Common problems from tag removal include precipitation of the target protein and incomplete cleavage, both of which can reduce protein yield or quality. An alternative that avoids these issues is to leave the tag attached for crystallization trials [28]. Although previously considered undesirable, this practice, known as carrier-mediated crystallography, is now used to facilitate crystallization of proteins that have proven difficult to crystallize, including membrane proteins [50–55].

Regardless of the method used to facilitate crystallization of a protein, crystal formation is constrained by the laws of chemistry and the numerous variables involved in macromolecular interactions. Put simply, the three key components of crystallization success are nucleation, conformational stability, and ordered protein-protein contacts. Every strategy has strengths and limitations; however, technological advances continue to open alternative pathways around this barrier. Nanotechnology and nanoparticles have been explored extensively in recent years because of their wide range of practical applications in physics, optics, electronics, and even medicine [82, 83]. Nanoparticles can be defined as ordered clusters of atoms, typically of inorganic materials, with at least one dimension between 1 and 100 nanometers. They tend to be highly reactive and have been conjugated to a variety of molecules, including for applications in protein crystallography.

To expand the crystallization toolbox, our laboratory has recently begun developing additional approaches to facilitate protein nucleation and crystal formation. One approach was designed to mimic biological scaffolding: a protein-protein interaction known to mediate the scaffolding of protein complexes was exploited as a carrier for crystallization. Because nucleation is, in essence, an ordered set of protein interactions that results in a crystal lattice, using scaffolding proteins as carriers may increase the chance of forming well-ordered crystal contacts. The following describes the potential of using the scaffolding properties of PDZ domains to facilitate crystal lattice formation.

X-ray crystallography continues to be the leading method for elucidating protein structure and for rational drug design. However, the unpredictability of protein crystallization can significantly slow the rate at which such discoveries are made. Although no crystallization method guarantees success, numerous strategies have been employed to increase the probability that crystallization will occur. Adjusting crystallization components, modifying the protein construct, adding carrier molecules, or even using synthetic materials can be applied alone or in combination to improve the odds of crystallizing the target protein. Additionally, the natural scaffolding ability of PDZ domains may serve as a further strategy for inducing nucleation and facilitating the formation of crystal contacts. Together, the strategies reviewed here are viable approaches that may help evade the bottleneck of crystallography and advance the analysis of protein structures.