The complexity of an organism's proteome is in part due to the diversity of post-translational modifications that can direct the location and function of a protein. To address the growing interest in characterizing these modifications, mass spectrometry-based proteomics has emerged as one of the most essential experimental platforms for their discovery. From searches for post-translational modifications within a target set of proteins to global surveys of proteins carrying a particular modification within a given proteome, various experimental MS (mass spectrometry) and allied techniques have been developed. Of the 20 naturally encoded amino acids, lysine is essentially the most highly post-translationally modified residue. This chapter provides a succinct overview of such methods for the characterization of protein lysine modifications, broadly classified by modification type, such as methylation and ubiquitination.
Proteomics initially emerged with the aim of identifying and quantifying as many peptides as possible from a complex protein digest to grasp the functional significance of an organism's genome. It became evident from the outset that such technologies could also be applied towards identifying and quantifying protein PTMs (post-translational modifications) in a similarly global manner. PTMs are chemical moieties representing a diverse range of molecular masses and structural complexity that are enzymatically added to the side chains of polar, acidic and basic amino acids in both prokaryotic and eukaryotic cells. Growing interest in applying proteomics to characterize PTMs continues to be driven by increasing awareness of the diverse roles PTMs possess in normal and disease physiology, ranging from the addition of Ub (ubiquitin) by E3 Ub ligase, which promotes substrate degradation, to the addition of acetyl groups by HATs (histone acetyltransferases), which serve as binding sites for transcriptional regulators and activate gene expression. As lysine is essentially the most highly post-translationally modified residue in most proteins, this chapter details recent advancements in MS (mass spectrometry) and in other related disciplines that have enabled the high-throughput analysis of various lysine PTMs.
Mass spectrometric principles in brief
MS is currently the most versatile and vital experimental platform for proteomics. Before discussing how MS may be employed for studying lysine PTMs, it is informative to discuss the general properties shared among mass spectrometers. First, all mass spectrometers measure the mass-dependent behaviour of gas-phase ions in an electromagnetic field. To do this, all mass spectrometers store and isolate ions within a particular mass/charge ratio range (m/z) as they enter the instrument. During this process, the mass spectrum is collected, and the width of the scan range determines the complexity of the ion population and precursor ion peptides being collected (Figure 1). The wider the scan range, the more ions of different m/z values will enter the mass spectrometer. By isolating ion(s) within a set m/z value, either pre-determined by the experimenter or systematically determined by the instrument on the basis of the ion abundances during the particular scan time, the mass spectrometer will then fragment these ions and collect a second or tandem mass spectrum (MS/MS or MS2) of the fragments. As discussed later for peptides, these fragments yield invaluable information concerning the primary amino acid sequence and the modified residues of the peptides. In principle, the mass spectrometer can repeatedly isolate particular fragment ion(s), fragment again, and scan those subfragments in n number of cycles for MSn acquisition.
All mass spectrometers yield information on the mass-to-charge (m/z) and relative abundance of all the ions within a given scan. Yet the mechanism of how the instrument measures m/z varies fundamentally among different mass spectrometers. For instance, a TOF (time-of-flight) mass analyser measures the time required for ions to reach the mass detector, where time is proportional to the square root of the m/z of the ions . In contrast, an Orbitrap mass detector measures the frequency of axial oscillations of orbiting ions around a curved electrode, where frequency is inversely proportional to the square root of the m/z of the ions . Finally, the mass analysers accompanying a linear quadrupole ion trap detect ions axially ejected from the quadrupole with increasing radiofrequency voltage, where the resonance voltage at which ions are scanned out of the ion trap is proportional to the m/z of the ions . Owing to these fundamental operational differences, certain experimental designs are more compatible with certain mass spectrometers. As an example, peptide ions are often introduced continuously into the ion trap, whereas ions are necessarily introduced in a pulsed manner into both TOF and orbitrap mass analysers. Consequently, ion traps possess greater sensitivity by isolating particular peptide ions predetermined to be of interest by the experimenter. Other peptide ions with different m/z values are not stabilized by the quadrupole voltages, and thus do not enter the ion trap. In contrast, TOF and Orbitrap mass analysers are more suitable to assaying all ions present in a given scan, partly due to their higher scan resolution and greater ion storage capacity.
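The proportionalities described above can be made concrete with a short sketch. The Python below is illustrative only: the function names are hypothetical, the proton mass is the standard value of about 1.007276 Da, and ions are assumed to be protonated [M + zH]z+ species such as those produced in positive-mode ESI. Only ratios are computed for the TOF and Orbitrap analysers, because absolute flight times and oscillation frequencies depend on instrument geometry and voltages that are not specified here.

```python
import math

PROTON_MASS = 1.007276  # Da; charge carrier assumed for each unit of charge


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion, given the neutral monoisotopic mass in Da."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def tof_time_ratio(mz_a: float, mz_b: float) -> float:
    """Flight-time ratio t_a / t_b, using t proportional to sqrt(m/z)."""
    return math.sqrt(mz_a / mz_b)


def orbitrap_frequency_ratio(mz_a: float, mz_b: float) -> float:
    """Axial frequency ratio f_a / f_b, using f proportional to 1/sqrt(m/z)."""
    return math.sqrt(mz_b / mz_a)


# A hypothetical 1000 Da peptide observed as a doubly charged ion.
print(round(mz(1000.0, 2), 4))
# An ion with four times the m/z takes twice as long to reach a TOF detector
# and oscillates at half the frequency in an Orbitrap.
print(tof_time_ratio(400.0, 100.0), orbitrap_frequency_ratio(400.0, 100.0))
```

The same neutral peptide therefore appears at different m/z values depending on its charge state, which is why the charge must be assigned before a molecular mass can be inferred from a spectrum.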
The second characteristic shared by all mass spectrometers utilized in proteomics is actually not intrinsic to the mass spectrometer itself, but rather its interface with some form of chromatography. As most biochemical experiments occur in liquid-phase, LC (liquid chromatography) is typically used to resolve peptides from a complex sample before being introduced into the mass spectrometer in the gas phase, commonly with ESI (electrospray ionization) or MALDI (matrix-assisted laser-desorption ionization). LC provides separation of peptide ions in a dimension orthogonal to the m/z dimension, which can already be achieved with the ion-selection filters in the mass spectrometer. Common modes of separation include hydrophobicity, as in RP (reversed-phase)-LC (most commonly C18-hydrocarbon-based), hydrophilicity, as in HILIC (hydrophilic interaction LC), and pKa, as in WCX (weak cation exchange) LC. The careful application of chromatography vastly improves the dynamic range of peptides that the mass spectrometer can analyse per scan, and can provide additional important evidence to assist in the identification of PTMs. In summary, all mass spectrometers measure the m/z and relative abundance of ions in the gas phase and are typically interfaced with some form of chromatography and ionization mechanism to enable MS analysis of peptide ions in solution.
Localization of PTMs using MS
As stated previously, one obtains from MS invaluable information concerning the abundance and molecular mass of peptide ions. From the tandem mass spectrum, one obtains information concerning the sequence of those peptide ions that may be used to identify and localize particular PTMs to specific residues. Analogous to dideoxy sequencing of oligonucleotides, where one sequences from both the 5′ and 3′ direction in separate reactions using separate primers, MS sequencing operates by generating overlapping smaller peptides sharing a common N- or C-terminus. Common fragmentation methods include CID (collision-induced dissociation), ETD (electron transfer dissociation) and HCD (higher-energy C-trap dissociation). This chapter will focus primarily on CID fragmentation, owing to its more common utilization in most proteomic experiments. According to the mobile proton theory for CID fragmentation, CID is achieved by imparting vibrational energy to the isolated peptide ions by repeated collisions with an inert gas, often helium. This leads to migration of a proton from the N-terminus, the C-terminus or the side chain of a basic amino acid along the peptide backbone and results in covalent bond cleavage, typically at one of the peptide bonds in the sequence, to yield two fragment ions. The peptide bond that is cleaved within a peptide sequence is not chosen at random, and preferential cleavage at particular positions can be influenced by many factors, including the number of basic amino acids. For example, MS sequencing of the histone H4 20–23 peptide KVLR yields the fragment ions K, KV, KVL, R, LR and VLR, with the former three peptides sharing a common N-terminus and the latter three peptides sharing a common C-terminus (Figure 1). Note that a single peptide ion stored in the ion trap cannot yield more than two fragment ions, one retaining the original N-terminus and the other the original C-terminus. Thus a minimum number of precursor peptide ions must be isolated and fragmented in order to yield sufficient sequence coverage and meet the instrument detection sensitivity. The mass difference between peptides sharing either a common N- or C-terminus corresponds to the next adjacent amino acid in the peptide sequence from either the N- or C-terminus. When that mass difference does not match any of the masses for the 20 canonical amino acids, the possibility of the residue being post-translationally modified should be considered.
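The ladder logic for the KVLR example can be sketched in a few lines of Python. This is a minimal illustration, not a search engine: the function names are hypothetical, the residue masses are standard monoisotopic values restricted to the four residues of this peptide, and the fragments are treated as singly charged b-type (N-terminal) and y-type (C-terminal) ions, a nomenclature assumed here rather than taken from the text.

```python
# Monoisotopic residue masses (Da) for the residues in KVLR only.
RESIDUE_MASS = {"K": 128.09496, "V": 99.06841, "L": 113.08406, "R": 156.10111}
PROTON = 1.007276   # Da
WATER = 18.010565   # Da


def fragment_ladder(peptide):
    """Return (N-terminal fragments, C-terminal fragments) of a peptide."""
    n_terminal = [peptide[:i] for i in range(1, len(peptide))]
    c_terminal = [peptide[i:] for i in range(len(peptide) - 1, 0, -1)]
    return n_terminal, c_terminal


def residue_sum(fragment):
    return sum(RESIDUE_MASS[aa] for aa in fragment)


def b_ion_mz(fragment):
    """Singly charged N-terminal (b-type) fragment."""
    return residue_sum(fragment) + PROTON


def y_ion_mz(fragment):
    """Singly charged C-terminal (y-type) fragment; retains the C-terminal water."""
    return residue_sum(fragment) + WATER + PROTON


def match_residue(delta, tolerance=0.01):
    """Name the residue whose mass matches a ladder step, or None if no match."""
    for aa, mass in RESIDUE_MASS.items():
        if abs(delta - mass) <= tolerance:
            return aa
    return None


n_frags, c_frags = fragment_ladder("KVLR")
b_series = [b_ion_mz(f) for f in n_frags]
# Each step between adjacent N-terminal fragments reads out the next residue.
steps = [match_residue(hi - lo) for lo, hi in zip(b_series, b_series[1:])]
print(n_frags, c_frags, steps)
```

A step that returns `None`, for instance a lysine step that is about 42 Da heavier than expected, is the cue to consider a modified residue at that position.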
High mass accuracy determination of the precursor and fragment peptide ions often facilitates the assignment of a specific PTM to a specific residue (Figure 2). For instance, the nearly isobaric trimethyl (+42.046 Da) and acetyl (+42.010 Da) modifications on most tryptic peptides can generally be resolved with a mass tolerance of <50 p.p.m. root mean square error. However, even when the mass difference matches the expected mass of a modified lysine relative to an unmodified lysine perfectly, such evidence only points to PTMs with the specific elemental composition that yields the exact mass displacement. In the aforementioned example, a mass difference of 42.010 Da or 42.046 Da only points to a net addition of C2H2O or C3H6 respectively. Other orthogonal lines of evidence are required to prove the existence and localization of the suspected modification.
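A quick calculation shows why mass accuracy matters for this pair of modifications. The sketch below is illustrative: the function names and the example peptide masses are hypothetical, and the mass shifts are the standard monoisotopic values for acetylation and trimethylation written to more decimal places than in the text.

```python
ACETYL_SHIFT = 42.010565     # Da, monoisotopic mass shift of acetylation
TRIMETHYL_SHIFT = 42.046950  # Da, monoisotopic mass shift of trimethylation


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ppm_separation(unmodified_mass: float) -> float:
    """p.p.m. gap between the acetylated and trimethylated forms of one peptide."""
    acetylated = unmodified_mass + ACETYL_SHIFT
    trimethylated = unmodified_mass + TRIMETHYL_SHIFT
    return ppm_error(trimethylated, acetylated)


# Hypothetical peptide masses: the fixed ~0.036 Da gap shrinks in p.p.m. terms
# as the peptide gets heavier, so larger peptides need better mass accuracy.
for mass in (1000.0, 2000.0, 4000.0):
    print(mass, round(ppm_separation(mass), 1))
```

The instrument's mass error has to be comfortably smaller than this separation before the two assignments can be told apart on mass alone, which becomes harder for the larger peptides discussed later in the chapter.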
Perhaps the most stringent validation is the in vivo specific metabolic labelling of the modification itself. Analogous to the classic and most rigorous experiment of 32P-radiolabelling to confirm phosphorylation, one can culture cells with a heavy isotope of the relevant metabolite that provides the source for the specific modification, for instance methionine, which is metabolized into SAM (S-adenosylmethionine), the sole precursor for protein methylation by lysine and arginine methyltransferases in eukaryotic cells . In general, most forms of LC do not distinguish between 12C and 13C or 14N and 15N, and thus the putative modified peptide should not only incorporate the heavy isotope, but also co-elute with the equivalently modified peptide with the light isotopes. Additionally, when examining the fragment ions in the tandem mass spectrum, the fragments containing the modified residue should similarly be shifted with the heavy isotope mass difference [4,5]. Thus, given these considerations, peptide sequencing with MS alone does not provide sufficient identification of a putative PTM on the peptide, which can only be achieved by additional lines of evidence such as the aforementioned labelling experiments and chromatographic behaviour discussed later.
Proteomics for analysing protein PTMs
Targeted investigations and technical challenges of lysine PTMs: histones
The fundamental facets of utilizing MS to identify and quantify post-translationally modified peptides are essentially the same for all proteomic experiments. Nonetheless, each investigation on a different target set of proteins, from a select assembly of highly modified proteins to a global interrogation of all the proteins in an organelle or cell lysate, warrants careful optimization and variable adaptation of the basic mass spectrometric techniques. Among the most well-documented proteins to be highly post-translationally modified are the histone proteins H1, H2A, H2B, H3 and H4, which are highly conserved among all eukaryotic cells. Histones are principally involved in the coiling of DNA into the nucleosome and intimately regulate gene expression. Much of the transcriptional and epigenetic regulation mediated by histones depends on both the identity and localization of specific PTMs occurring generally on the N-terminal tails, notably lysine methylation and acetylation . The chemical diversity and combinatorial occurrence of histone PTMs renders traditional assays, such as Western blotting and ELISAs with PTM-specific antibodies, unreliable to interpret and quantify.
Proteomics offers the promise of high-throughput and unambiguous identification of known and novel histone PTMs. Yet MS analysis of histone modifications is not as straightforward as one may initially expect. The standard workflow for a proteomics experiment is to first reduce disulfide bonds, alkylate the free cysteine residues, digest with trypsin, desalt and analyse by MS (also known as Bottom-Up MS). Reduction and alkylation of disulfide bonds is usually not necessary for histone PTM analysis as all of the cysteine residues occur in the C-terminal portion of the protein, well removed from the majority of modifications found on the N-terminal tails. A more critical problem though is the preponderance of lysine and arginine residues in the N-terminal tail. For instance, there are eight lysine residues and seven arginine residues in the first 50 amino acids of histone H3 alone, as well as in the first 20 amino acids of histone H4 (Figure 1). Digestion with trypsin would result in peptides less than 4 amino acids in length, which would be retained poorly under most forms of LC. Furthermore, the adjacency of the lysine and arginine residues means that trypsin digestion would not reproducibly yield the same fragments. Finally, while trypsin can, in most situations, digest unmodified lysine residues, it is unable to digest acetylated lysine residues owing to both steric effects and the loss of positive charge on the ε-amine group, which means that the residue is no longer stabilized by the aspartate residue within the trypsin catalytic centre. Similar miscleavage events occur for mono-, di- and tri-methylated lysine residues in order of increasing likelihood of failed digestion. Overall, the diversity of histone PTMs leads to miscleaved histone fragments depending on the modification status, a general problem for other lysine modifications.
In response to this incompatibility, several derivatization methods have been developed, with the general approach of blocking trypsin digestion at lysine residues, thereby allowing for digestion only at arginine residues (Figure 1) [7,8]. This chapter will focus on derivatization using propionic anhydride, which transfers a propionyl moiety (+56.026 Da) to the ε-amine group of unmodified and monomethylated lysine residues, as well as the N-terminus. After propionylation at the protein level, followed by trypsin digestion and an additional propionylation to modify the newly created N-terminus at the peptide level, the propionyl groups confer additional hydrophobicity on the histone peptides and enhance their retention in RP chromatography. In addition, by allowing for trypsin digestion only at arginine residues, the same histone peptides can be reproducibly obtained across all modified lysine states. For instance, the unmodified, acetylated, mono-, di- and tri-methylated Lys9 on histone H3 can be detected on the same 9–17 peptide (KSTGGKAPR).
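The effect of blocking lysine cleavage can be illustrated with a simple in silico digest. The sketch below is an assumption-laden toy: the `digest` function is hypothetical, it cleaves after every listed residue and ignores proline effects and missed cleavages, and the input sequence is taken to be residues 1–17 of histone H3, chosen so that it ends with the 9–17 peptide KSTGGKAPR named in the text.

```python
def digest(sequence, cleave_after):
    """Cleave C-terminal to every residue listed in `cleave_after`."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep any C-terminal remainder that does not end in a cleavage site.
        peptides.append(sequence[start:])
    return peptides


H3_TAIL_1_17 = "ARTKQTARKSTGGKAPR"  # assumed histone H3 residues 1-17

# Standard trypsin specificity: cleavage after lysine and arginine.
tryptic = digest(H3_TAIL_1_17, "KR")
# Lysines blocked by propionylation: cleavage after arginine only.
arg_only = digest(H3_TAIL_1_17, "R")

print(tryptic)   # many very short peptides
print(arg_only)  # fewer, longer peptides, including KSTGGKAPR
```

With lysine cleavage blocked, the 9–17 peptide is produced intact whatever the modification state of its lysines, which is what makes the different modified forms directly comparable.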
The first advantage of derivatization is that one can still avail of the greater catalytic efficiency and substrate specificity of trypsin relative to other standard proteases in the experiment. The second advantage is that the ionization efficiencies of the various modified forms of the 9–17 peptide can be reasonably assumed to be comparable, which provides additional confidence in relative quantification of the modified histone forms to each other. By analogy, in real-time PCR, relative quantification of different cDNA levels using SYBR® Green dye is less reliable when the amplicons differ vastly in length and thus in labelling efficiency. The third advantage is that the elution order of the various modified forms of the same propionylated histone peptide can be readily predicted on the basis of the relative hydrophobicity of the modification, providing additional evidence for assignment of methylation or acetylation to the histone peptide. The fourth advantage is that one can perform the derivatization scheme using an isotopically labelled propionic anhydride reagent, for instance deuterated d10-propionic anhydride, in the N-terminal capping of the histone peptides. This creates a consistent 5.029 Da offset between all light unlabelled histone peptides and the respective heavy labelled histone peptides, and allows for more rigorous quantification between histone PTM levels of different samples in a single run (Figure 3).
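Because the modified forms share one peptide backbone, their relative levels are commonly expressed as each form's share of the summed signal. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the peak areas are invented numbers rather than measured data, and equal ionization efficiency across forms is assumed, as argued in the text.

```python
def relative_abundance(peak_areas):
    """Fraction of the total signal contributed by each modified form."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {form: area / total for form, area in peak_areas.items()}


def heavy_to_light_ratio(light_area, heavy_area):
    """Ratio of a heavy-labelled peptide to its light counterpart."""
    if light_area <= 0:
        raise ValueError("light peak area must be positive")
    return heavy_area / light_area


# Invented chromatographic peak areas for forms of the H3 9-17 peptide.
example_areas = {
    "unmodified": 2.0e6,
    "K9me1": 1.0e6,
    "K9me2": 4.0e6,
    "K9me3": 2.0e6,
    "K9ac": 1.0e6,
}
fractions = relative_abundance(example_areas)
print(fractions)
```

The heavy-to-light ratio applies the same idea across samples: each light peptide is compared with its heavy-labelled partner from the other sample within the same run.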
One may suspect that a simpler solution may be digestion with Arg-C protease which cleaves at the C-termini of arginine residues and produces the identical cleavage patterns with the derivatization schemes detailed above, which also lead to cleavage at arginine residues. One key advantage of the use of Arg-C over derivatization approaches is the documented in vivo occurrence of propionyl moieties on both histone and non-histone proteins , and the use of Arg-C could allow for the potential quantification of these modifications that would otherwise be obscured from the derivatization. Aside from the greater cost and lower enzymatic efficiency of Arg-C relative to trypsin, one important disadvantage of Arg-C is due to the higher charged states characteristic of the Arg-C generated histone peptides. Under the commonly used CID fragmentation method used for histone PTM sequencing, a more highly charged peptide leads to preferential non-random cleavage at certain residues throughout the peptide backbone and will be less informative toward PTM localization. In summary, mass spectrometric analysis of histones requires either a derivatization protocol or an alternative protease to avoid frequent and irreproducible peptide cleavages that would otherwise occur with a standard trypsin digestion.
Targeted investigations and technical challenges of lysine PTMs: heterochromatin proteins
Another example of a targeted proteomic investigation was the identification of numerous PTMs on the HP1 (heterochromatin protein 1) family members HP1α, β and γ, including lysine methylation, acetylation and formylation. As the name suggests, the HP1 members are involved in heterochromatin maintenance. For instance, HP1α recognizes and binds to trimethylated Lys9 on histone H3 (H3K9me3), and recruits SUV39H1, which is the methyltransferase that trimethylates H3K9. Consequently, via a feedback mechanism, HP1α initiates the propagation of heterochromatin. It is perhaps not surprising then that the HP1 proteins that are critical for the modification pattern of histones would themselves be highly modified.
In order to thoroughly interrogate the PTM landscape of a few target proteins such as HP1, multiple proteases with different cleavage specificities are often required for maximum sequence coverage of the target protein(s). In the case of the HP1 proteins, trypsin, chymotrypsin and Lys-C were used in separate reactions to achieve a combined sequence coverage of over 90%. As discussed previously, trypsin miscleavage often occurs for modified sterically hindered lysine residues. Other proteases targeting other residues may be appropriate depending on the target protein sequence. In addition to proteases, small chemicals with unique substrate specificities may be used, such as cyanogen bromide towards unoxidized methionine residues and N-chlorosuccinimide towards tryptophan, but these chemical cleavage methods often have lower yields and unwanted side products with respect to enzymatic approaches.
Targeted investigations of combinatorial histone PTMs
The opposite approach of interrogating the modification status of intact (also known as Top-Down MS) rather than digested proteins has also been applied with histones and high-mobility group member proteins [11–13]. A key advantage of not digesting the protein is that one maintains the connectivity of discrete modified residues within the same molecule. For instance, one can link the occurrence of H4K20 (Lys20 on histone H4) methylation with H4K5/K8/K12/K16 (Lys5/Lys8/Lys12/Lys16 on histone H4) acetylation on the same peptide, which would be impossible to determine using the propionylation protocol discussed above as both modified sites would occur on separate peptides. For a protein that contains a single modified site, this gain in information is trivial. Yet for highly modified proteins, such as histones or high-mobility group proteins, understanding the frequency and abundance of when a modification at one site is linked to another modification at a different site on the same protein could inform predictions on the regulation and function of those modifications.
MS analysis of intact proteins is often achieved from direct infusion of the purified protein sample into the mass spectrometer, which significantly reduces instrument sensitivity. A compromise between analysing small tryptic peptides or the intact protein is to digest the protein into relatively large >20 residue peptides that maintain several modified sites together, but can be sufficiently resolved using LC on the basis of the position and type of PTMs (also known as Middle-Down MS). For histones H3 and H4 this can be achieved via digestion with Glu-C and Asp-N respectively to yield the 1–50 and the 1–23 peptides respectively. For high-mobility group A1a, this can be achieved via limited trypsin proteolysis to yield the 30–54 peptide. The larger peptides/proteins generated also exist at higher charge states than tryptic peptides, and ETD fragmentation rather than CID fragmentation for tandem mass spectrum acquisition is typically used for PTM localization.
Global large-scale surveys of protein lysine PTMs: acetylation
The aforementioned examples centred on proteomic investigations targeted towards highly modified proteins that can be somewhat easily isolated and where the analytical complexity originated not from the number of unique proteins, but rather from the number of unique modified forms of a few proteins. In such cases, one already has in mind a specific set of proteins to interrogate for the presence of PTMs. With respect to the converse, namely using proteomics to globally interrogate proteins containing specific PTMs, all investigations generally start by first enriching a complex protein or peptide sample for analytes containing the particular PTM (Figure 2). The screening of the entire proteome for modified protein candidates, in contrast with analysing a known and relatively smaller set of proteins, underlies the crucial technical differences between global PTM surveys and the already described targeted approaches. As alluded to, one difference is the need for enrichment, which stems from the generally low abundance or stoichiometry of the modified form of the protein compared with the unmodified form. For instance, Zhao and co-workers used pan-lysine acetyl antibodies to identify 388 acetylation sites from 195 proteins in mammalian cells and mouse liver mitochondria. Surprisingly, they discovered that a large number of these acetylated proteins were longevity regulators and proteins involved in metabolism, implying that lysine acetylation could play a role in non-nuclear events.
Other recent surveys on global protein acetylation also utilized an anti-acetylated lysine antibody to enrich tryptic digest samples for acetylated peptides, and coupled these enrichments to higher-end MS analysis [16,17]. Such experiments were able to detect acetylation sites of low-abundance proteins, such as tumour suppressor p53, and over 700 conserved acetylated sites across three different cell lines . Interesting insights on what role acetylation may perform can be gained from applying bioinformatic analysis on the intracellular localization and binding partners of the modified proteins. For instance, acetylation was found on proteins involved in roles as diverse as DNA replication to membrane trafficking . Furthermore, a more targeted approach to investigation of acetylation of metabolic enzymes found evidence supporting a causal relationship between increased metabolic activity and increased acetylation levels of the enzymes . Thus even global large-scale catalogues on modified proteins can yield meaningful functional insights into a few target proteins, although not at a level of detail and resolution as with more targeted approaches.
Owing to the availability of a PTM-specific antibody, enrichment at the peptide level is more sensible than enrichment at the protein level. Recalling that most mass spectrometric experiments analyse peptides rather than proteins, the sample complexity is far greater in the enriched protein sample than in the enriched peptide sample. While in principle the same number of acetylated peptides will be enriched with both approaches, the background of unmodified peptides is vastly greater after protein-level enrichment than after peptide-level enrichment, because every enriched protein is subsequently digested into mostly unmodified peptides, and this background will complicate MS analysis. However, as discussed below, there are situations when enrichment at the protein level is actually more sensible than enrichment at the peptide level.
Global large-scale surveys of protein lysine PTMs: methylation and formylation
Another analogous, although unique, PTM-specific enrichment has recently been developed for identifying methylated proteins, which involves performing in nucleo reactions with purified nuclei and an alkyne-containing SAM analogue . The logic is that all proteins bound by endogenous methyltransferases will receive an alkyne rather than a methyl group, which can then be clicked to an azide-containing epitope for pullout and enrichment. Thus, in the absence of a PTM-specific antibody, one could in principle add an epitope to the PTM specifically for enrichment. While PTM enrichment at the peptide level reduces sample complexity, most global surveys utilize multiple chromatographic separations to further reduce the peptide complexity. A common approach utilizes SCX (strong cation exchange) followed by RP-LC for acetylated peptide studies . Other approaches rely on gel-free isoelectric focusing to generate ‘fractions’ of peptides according to their pI values . Resolving peptides in solution rather than on a polyacrylamide matrix improves sample recovery, as one does not have to extract the peptides from the gel. Regardless of the specific technology used, the general principle is to apply orthogonal modes of separation to reduce sample complexity and hence increase the dynamic range of the MS analysis. These separations also have a benefit for global protein PTM characterization. Recently gel fractionation coupled to large-scale MS-based proteomics was used to interrogate human chromatin isolated through various biochemical strategies and over 1900 proteins were detected from these preparations. Most interestingly, over 150 of the proteins identified were lysine modified, with many of these proteins potentially being involved in transcriptional processing .
Similar global surveys have also been performed searching for lysine formylation. This modification was found across a fairly large number of chromatin-associated proteins, such as histone and HMG proteins. However, one problem with modifications such as formylation is that they may arise from the presence of formaldehyde or formic acid potentially used during sample preparation and are thus artefacts. Another problem is the similar mass shift between a formyl and a dimethyl group, and, analogous to a trimethyl and an acetyl group, high mass accuracy is necessary to resolve this difference between both possible modifications.
Labelled isotopic quantification strategies in global large-scale proteomic surveys
Once the samples are prepared for MS, another challenge is to quantify the occurrences of modified residues. Understanding the abundance of a particular PTM in various conditions can provide important insights into the enzymatic regulation of that modification. Unlike the more targeted investigations with histone acetylation, enrichment of non-derivatized modified peptides renders it difficult to normalize any detected miscleaved modified peptides with the respective shorter unmodified peptides. One approach to circumvent this issue is to apply SILAC (stable isotope labelling with amino acids in cell culture) and compare acetylation levels between samples (Figure 3). SILAC involves culturing one sample in standard unlabelled medium and the other in medium typically depleted of unlabelled lysine and arginine and supplemented with equimolar amounts of heavy isotopes of lysine and arginine . The choice of both of these amino acids allows for every tryptic SILAC peptide to incorporate at least one heavy amino acid. However, the challenge with developing the labelled culture is to incorporate the heavier isotopes into as high a percentage of the cellular proteins as possible. Once the SILAC tissue culture is sufficiently labelled, one mixes an equal cell number of both the unlabelled and labelled samples and proceeds with MS sample preparation with the mixture, with the reasonable assumption that both the unlabelled and labelled peptides will behave equivalently during the enrichment and chromatographic separation steps. Owing to the isotopic mass difference, generally greater than or equal to 4 Da, one should be able to distinguish between the unlabelled and labelled peptide signal in the mass spectrum and perform relative quantification between both samples. 
Examples of such experiments include the already described heavy methyl SILAC experiments, where all methionine-containing and methylated peptides will contain a 4.021 Da mass shift for every methionine/methyl group.
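The spacing expected between a light and a heavy peptide pair in the mass spectrum follows directly from the isotopic mass shift and the charge state. The sketch below is a minimal illustration with hypothetical function names and invented intensities; it uses the 4.021 Da shift per methionine or methyl group quoted above.

```python
HEAVY_METHYL_SHIFT = 4.021  # Da per methionine/methyl group, as quoted in the text


def heavy_methyl_shift(n_groups: int) -> float:
    """Total mass shift for a peptide carrying n labelled methionine/methyl groups."""
    return n_groups * HEAVY_METHYL_SHIFT


def isotope_pair_spacing(mass_shift: float, charge: int) -> float:
    """Distance on the m/z axis between the light and heavy forms of one peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift / charge


def silac_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Heavy-to-light ratio for relative quantification between two samples."""
    if light_intensity <= 0:
        raise ValueError("light intensity must be positive")
    return heavy_intensity / light_intensity


# A trimethylated peptide carries three labelled methyl groups; at charge 2+
# the light and heavy peaks sit about 6 m/z units apart.
shift = heavy_methyl_shift(3)
print(round(shift, 3), round(isotope_pair_spacing(shift, 2), 4))
# Invented intensities: the heavy sample has twice the signal of the light one.
print(silac_ratio(1.0e5, 2.0e5))
```

Because the spacing scales with the number of labelled groups, it also offers an independent check on how many methyl groups a putative methylated peptide carries.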
Global large-scale surveys of protein lysine PTMs: ubiquitination and SUMOylation
In contrast with an acetyl, methyl or formyl group, both Ub and SUMO (small Ub-like modifier) groups are extremely large PTMs, well over several kilodaltons in mass, making them somewhat difficult to analyse directly by MS. The attachment of Ub via an isopeptide bond between the C-terminus of Ub and the ε-amine group of a lysine residue on the substrate generally leads to proteasomal, lysosomal or vacuolar degradation of the substrate. The attachment of SUMO also occurs via isopeptide bond formation with lysine residues on the substrate, and is believed to antagonize substrate binding to other complexes and to oppose the addition of Ub and thus its downstream consequences. In addition to their large size, a further difficulty in studying these modifications is their rapid turnover. With respect to Ub, turnover is achieved via the activities of E3 Ub ligase and deubiquitinating C-terminal hydrolases and, with respect to SUMO, the activities of E3 SUMO ligase and UBL (Ub-like protein) domain-containing desumoylating enzymes [22,23]. The net consequence of rapid turnover for these modifications is their exceptionally low stoichiometry with respect to the unmodified substrate, which mandates some form of enrichment for any proteomic interrogation of both classes of modified proteins.
Enrichment strategies used in global large-scale surveys for ubiquitination and SUMOylation
The ease of genetic manipulation in the Saccharomyces cerevisiae model system has enabled a clever alternative to enrichment for ubiquitinated and SUMOylated peptides, namely by generating strains that express Ub or SUMO with an epitope tag, for instance a histidine tag [24–26]. Since both Ub and SUMO groups are cleavable by proteases, one cannot enrich at the peptide level because any N-terminal tag originally on the Ub and SUMO proteins will be removed from the substrate. Thus, unlike the global proteomic investigations on acetylation, enrichment at the modified protein rather than peptide level is required for ubiquitination and SUMOylation studies using this approach. Following epitope pulldown, one would proceed with protease digestion and multiple dimensional chromatographic separations prior to MS analysis. The subsequent bioinformatics search for ubiquitinated and SUMOylated peptides must consider both the protease miscleavage site at the modified lysine residue, and the mass shift, not from the intact Ub or SUMO moiety, but rather from the cleaved C-terminal fragment still attached after protease digestion [27,28] (Figure 2).
In contrast with generating laboratory strains expressing tagged-Ub or -SUMO proteins, another approach is to indirectly enrich for the modified proteins using an antibody-conjugated protein domain that selectively interacts with the modification. For SUMOylation, a recent study used the 32–133 fragment of RING (really interesting new gene)-finger 4, which interacts specifically with polymerized branched SUMO groups. Such an approach allows purification of SUMOylated substrates from a wider range of cell types, including those less amenable to molecular cloning.
Because one digests away the Ub and SUMO chains to a single small fragment, one can only assay the presence and level of total ubiquitination or SUMOylation on a given residue, and the branching pattern of the modification is lost. A similar issue arises in histone PTM quantification, when one cannot link different modified sites to each other on the same original molecule. One solution for assaying the branching pattern is to introduce in-vitro-generated SUMO-branched fragments into the mass spectrometer. The lower sample complexity allows one to generate a reference tandem mass spectrum, parent ion charge state distribution and retention time for each branched fragment, which can then be matched to the actual in vivo SUMO-branched sample, thereby facilitating the bioinformatic search for SUMOylated substrates.
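As a rough illustration of matching an in vivo spectrum against such in-vitro-generated references, the following Python sketch filters candidates by charge state, parent ion m/z and retention time, and then ranks them by cosine similarity of the binned fragment spectra. Cosine similarity is used here only as a generic stand-in and is not the scoring of any particular study; the class and function names, tolerances and score threshold are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Spectrum:
    name: str
    precursor_mz: float
    charge: int
    rt_min: float   # retention time in minutes
    peaks: dict     # fragment m/z -> intensity


def binned(peaks, width=0.5):
    """Sum intensities into fixed-width m/z bins (width is an assumption)."""
    out = {}
    for mz, intensity in peaks.items():
        key = round(mz / width)
        out[key] = out.get(key, 0.0) + intensity
    return out


def cosine(peaks_a, peaks_b):
    """Cosine similarity between two binned fragment spectra."""
    va, vb = binned(peaks_a), binned(peaks_b)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(observed, library, ppm=10.0, rt_window=2.0, min_score=0.7):
    """Return (reference name, score) of the best reference, or None."""
    best = None
    for ref in library:
        if ref.charge != observed.charge:
            continue
        error_ppm = abs(ref.precursor_mz - observed.precursor_mz) / ref.precursor_mz * 1e6
        if error_ppm > ppm:
            continue
        if abs(ref.rt_min - observed.rt_min) > rt_window:
            continue
        score = cosine(observed.peaks, ref.peaks)
        if score >= min_score and (best is None or score > best[1]):
            best = (ref.name, score)
    return best
```

Filtering on charge, parent ion mass and retention time before scoring keeps the comparison restricted to plausible candidates, which mirrors the three kinds of reference information described above.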
Future prospects for proteomics for PTM analysis
Recent work has demonstrated that MS is an ideal technique for the characterization and discovery of lysine modifications, as is evident from new lysine modifications still being revealed, such as lysine succinylation. Although much progress has been made towards developing and applying MS for PTM investigations, there is still a demand for better chromatographic resolution of modified peptides, more rigorous bioinformatics platforms to analyse PTMs effectively, and more efficient biochemical methods to enrich for modified proteins. Finally, even if these particular demands are met in the coming years and one can successfully identify and quantify all of the modified proteins within a cell, proteomics as a field will always need to be complemented by more targeted assays, for instance site-directed mutagenesis or knock-down experiments, and even by other similarly global experiments, such as microarrays, in order to achieve a truly functional understanding of lysine PTMs.
• In its most basic implementation, MS analysis of post-translationally modified peptides involves fragmentation of the peptide to yield fragment ions corresponding to the individual amino acids along its sequence. When the mass difference between fragment ions corresponds not to one of the 20 canonical amino acids alone, but to an amino acid plus a PTM, this provides the first step in identifying and quantifying a modified residue.
• The initial identification of a post-translationally modified peptide or protein using tandem MS sequencing must be validated by accurate parent ion mass, tandem mass spectrum, elution order, metabolic labelling, isotopic abundances and other orthogonal lines of evidence.
• Multiple proteases must often be used to ensure as complete sequence coverage as possible for PTM interrogations on a few target proteins. Furthermore, chromatographic resolution of the peptide sample will dramatically improve the dynamic range of detecting modified analytes in a background of mostly unmodified analytes.
• Various enrichment methods for low-level modifications, such as ubiquitination and SUMOylation, have been developed and should be applied to further increase the dynamic range of detection.
• Quantification of modified peptides can be achieved label-free, in which case it is expressed relative to the unmodified peptide, or by using labelled approaches such as SILAC or d5-propionyl derivatization.
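The first summary point, reading residues and PTMs from mass differences between fragment ions, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it considers only a handful of lysine PTMs, uses a fixed tolerance of 0.01 Da, and the function names are hypothetical. The residue and PTM values are standard monoisotopic masses.

```python
# Monoisotopic residue masses (Da) of the 20 canonical amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Monoisotopic mass shifts (Da) for a few lysine PTMs (not exhaustive).
LYSINE_PTM = {
    "formyl": 27.994915,
    "methyl": 14.015650,
    "dimethyl": 28.031300,
    "trimethyl": 42.046950,
    "acetyl": 42.010565,
}


def assign(delta, tol=0.01):
    """List (residue, ptm) candidates whose mass matches delta within tol."""
    hits = [(aa, None) for aa, m in RESIDUE.items() if abs(delta - m) <= tol]
    for name, shift in LYSINE_PTM.items():
        if abs(delta - (RESIDUE["K"] + shift)) <= tol:
            hits.append(("K", name))
    return hits


def read_ladder(ion_masses, tol=0.01):
    """Assign each gap between consecutive fragment ions of one series."""
    return [assign(b - a, tol) for a, b in zip(ion_masses, ion_masses[1:])]
```

Acetylation and trimethylation differ by only about 0.036 Da, so whether they can be told apart depends on the tolerance used; this is one reason why accurate mass measurement and orthogonal evidence are needed to validate an assignment.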
We thank all members of the Garcia laboratory for constructive discussion during the preparation of the chapter. B.M.Z. is supported by the National Science Foundation Graduate Research Fellowship. B.A.G. is supported by a NJCCR (New Jersey Commission on Cancer Research) Seed grant, a National Science Foundation Early Faculty Career award, an NIH (National Institutes of Health) Innovator award [grant number DP2OD007447] from the Office of the Director, NIH, and by an NSF grant [grant number CBET-0941143].
- © The Authors Journal compilation © 2012 Biochemical Society