Advanced Review

Mass spectrometry-driven phosphoproteomics: patterning the systems biology mosaic

Martin A. Jünger1∗ and Ruedi Aebersold1,2

Protein phosphorylation is the best-studied posttranslational modification and plays a role in virtually every biological process. Phosphoproteomics is the analysis of protein phosphorylation on a proteome-wide scale, and mainly uses the same instrumentation and analogous strategies as conventional mass spectrometry (MS)-based proteomics. Measurements can be performed either in a discovery-type mode, also known as shotgun mode, or in a targeted manner which monitors a set of a priori known phosphopeptides, such as members of a signal transduction pathway, across biological samples. Here, we delineate the different experimental levels at which measures can be taken to optimize the scope, reliability, and information content of phosphoproteomic analyses. Various chromatographic and chemical protocols exist to physically enrich phosphopeptides from proteolytic digests of biological samples. Subsequent mass spectrometric analysis revolves around peptide ion fragmentation to generate sequence information and identify the backbone sequence of phosphopeptides as well as the phosphate group attachment site(s). Different modes of fragmentation, such as collision-induced dissociation (CID), electron transfer dissociation (ETD), and higher energy collisional dissociation (HCD), have been established for phosphopeptide analysis. Computational tools are important for the identification and quantification of phosphopeptides and mapping of phosphorylation sites, the deposition of large-scale phosphoproteome datasets in public databases, and the extraction of biologically meaningful information by data mining, integration with other data types, and descriptive or predictive modeling.
Finally, we discuss how orthogonal experimental approaches can be employed to validate newly identified phosphorylation sites on a biochemical, mechanistic, and physiological level.

© 2013 Wiley Periodicals, Inc.

How to cite this article: WIREs Dev Biol 2014, 3:83–112. doi: 10.1002/wdev.121

∗ Correspondence to: mjuenger@ethz.ch
1 Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
2 Faculty of Science, University of Zurich, Zurich, Switzerland
Conflict of interest: The authors declare that they have no conflicts of interest.
Volume 3, January/February 2014. wires.wiley.com/devbio

INTRODUCTION

Systems biology is an interdisciplinary effort. It aims at a more comprehensive understanding of biological processes by interrogating global cellular landscapes like genomes, transcriptomes, proteomes, interactomes, or metabolomes instead of single genes and proteins. It also attempts to increase the understanding of biological processes by considering the context of these biomolecules, i.e., their interactions in time and space in the living cell. The ambition to understand biological processes on a systems level is not new. Classical forward genetics can be thought of as systems biology in slow motion: it aims at the identification of all genetic components relevant for a specific phenotype, but mapping and characterization of the underlying genes take years of work. Modern systems biology accelerates systems level analyses by applying advanced analytical techniques that are able to capture global cellular parameters. The success of this emerging science critically depends on the combined expertise of biologists, analytical specialists, and computer scientists. This interdependent constellation is illustrated in the ‘systems biology triangle’ (Figure 1(a)). It ensures that the challenging biological questions being addressed are amenable to the biological model systems and analytical techniques available, that the analysis is performed in a way that maximizes the amount and quality of the desired data, and that downstream data processing and analysis by computational means lead to results which are the basis for a meaningful answer to the biological questions asked. In the context of large-scale protein analyses, the ‘proteomics triangle’ (Figure 1(b)) displays the more specific disciplines constituting the field of proteomics. Experimentally addressing a specific biological question by proteomics usually involves enrichment of defined populations of proteins or peptides and their subsequent analysis by MS. Computational tools are then used to identify and, if desired, quantify the proteins and peptides of interest. Furthermore, modeling in the sense of capturing the acquired information, possibly integrating it with other data types such as transcriptome or metabolome profiles or protein–protein and genetic interactions, and visualization of the results can be instrumental in arriving at a biological conclusion. Corresponding to these expertise or discipline triangles, this article gives an overview of the different aspects involved in the analysis of phosphoproteomes, specifically addressing the issues of suitable biological systems and questions, sample preparation, enrichment of phosphopeptides, MS for phosphopeptide identification and quantitation, data analysis, and the range of possible validation experiments that can be employed to consolidate new biological knowledge.

FIGURE 1 | Systems biology (a) and proteomic (b) research are interdisciplinary fields. Fruitful research in these areas depends on strong intersections and dialog between experts in biology, technology, and computation. Panel (a), The Systems Biology Triangle: Biology, Technology, Computation. Panel (b), The Proteomics Triangle: Biological question and conclusion, Sample preparation and mass spectrometry, Computation and modeling.
PROTEIN PHOSPHORYLATION IN BIOLOGICAL SYSTEMS

Reversible protein phosphorylation is one of the most widespread regulatory mechanisms found within cells, and probably the most extensively studied posttranslational protein modification. Virtually every cellular process is, in addition to other regulatory mechanisms, directly or indirectly regulated by protein phosphorylation. Initially characterized in 1955 on the metabolic enzyme glycogen phosphorylase,1 protein phosphorylation was later identified as a central mechanism in intracellular information processing: signal transduction cascades coordinate the cellular response to external cues such as hormones or nutrient conditions, and consist of consecutively acting protein kinases which phosphorylate downstream kinases and other substrate proteins such as transcription factors and adapter proteins when activated by the external stimulus. Protein phosphorylation occurs frequently on the side chains of serine, threonine, and tyrosine residues (O-phosphorylation), but can also occur on arginine, lysine, and histidine (N-phosphorylation).
The vast majority of cellular protein phosphorylation events reported are on serine and threonine residues. While Hunter and Sefton2 estimated in 1980 that the ratio of S/T/Y phosphorylation is 90:10:0.05, recent analyses based on large phosphoproteomic studies3,4 suggest that this distribution is rather 88:11:1. Phosphorylation on histidine is thought to account for up to 6% of protein phosphorylation events in eukaryotic cells, but its analysis by standard proteomic strategies is complicated by the lability of this modification under acidic conditions.5 While the true extent of phosphorylation of the proteome has not yet been experimentally mapped to completion in any species, a frequently cited approximation is that about 30% of the proteins in eukaryotic cells are phosphorylated.6,7 However, experimental phosphoproteome coverage based on several large-scale phosphoproteomic studies (refer to the coverage values for Saccharomyces cerevisiae, Drosophila, mouse, and human in Table 2) demonstrates that the percentage of phosphoproteins in eukaryotic proteomes is rather in the range of 40–45%. The true percentage is likely even higher, because even these advanced phosphoproteome mapping efforts are still incomplete. On the basis of extrapolations from existing phosphoproteomic datasets,9,10 the number of potential phosphorylation sites in a eukaryotic cell is estimated to be in the range of 500,000 to 700,000.

The Evolution and Functional Significance of Protein Phosphorylation

Probably the most severe limitation in deriving meaningful biological conclusions from large-scale phosphoproteomic datasets is that of the thousands of phosphorylation sites that can be measured with current technology (see below), only a small fraction has a described function.
This certainly does not mean that all the functionally unannotated phosphorylation events are nonfunctional. Evolutionary analysis of identified phosphoproteomes can facilitate focusing on the modifications that have a high probability of having a higher functional impact than others. It has been demonstrated that functional phosphorylation sites tend to be evolutionarily conserved; furthermore, phosphosites on low-abundance proteins that occur with a high stoichiometry seem to be more strongly conserved than others.11 An interesting theory has been formulated for activating phosphorylation sites, which have been proposed to have evolved from acidic amino acid (‘phospho-mimicking’) residues, thereby transforming a constitutively active protein to a conditionally active one during the course of evolution.12 Phosphotyrosine signaling systems that include tyrosine kinases, phosphatases, and phosphotyrosine-binding protein domains seem to be primarily regulatory and therefore display more ‘functional density’ than the rest of the phosphoproteome.13 While tyrosine phosphorylation also occurs to some extent in simpler organisms like bacteria and yeast, pTyr signaling pathways appear to be a hallmark of more complex organisms. This notion is supported by the finding that with increasing organismal complexity (cell number per organism), the number of protein tyrosine kinases increases. Conversely, genomically encoded tyrosine content significantly decreases with complexity, possibly to reduce the amount of possible nonfunctional or deleterious pTyr events and therefore reduce noise and enhance regulation in pTyr-dependent signaling systems.14

Signal Transduction Pathways as Biological Information Processing Systems

The complexity of phosphorylation-based intracellular signaling systems ranges from relatively simple, near-linear pathways to complex networks of interconnected signaling modules and feedback loops.
Figure 2 depicts three simplified signaling pathway diagrams of increasing complexity. Probably the simplest signaling pathways are bacterial two-component systems that are often mediators in virulence-related genetic regulation (Figure 2(a)). In contrast to eukaryotic cells, protein phosphorylation in bacteria was discovered relatively late, in 1979. Although S/T/Y phosphorylation has also been described in bacteria, two-component systems utilize phosphorylation of histidine and aspartate residues. As the name implies, these signaling modules consist of two components: a signal-sensing transmembrane histidine kinase (HK), which autophosphorylates on histidine residues upon activation and then transfers the phosphate groups to aspartate residues on response regulator (RR) proteins, which in turn regulate gene transcription. However, it has emerged that these simple signaling modules also exist in more complex versions that display branching and crosstalk.15 A relatively simple eukaryotic pathway is the Janus kinase/signal transducers and activators of transcription (JAK/STAT) system which is involved in cytokine signaling (Figure 2(b)). In this pathway, interferon receptors activate JAKs upon ligand binding; the JAKs in turn autophosphorylate, become activated, and phosphorylate the receptor on tyrosine residues. Through Src homology 2 (SH2) domains, STAT proteins are recruited to the active receptor complex, phosphorylated on tyrosine residues, and homo- or heterodimerize in the phosphorylated state. Subsequently, the STAT complexes translocate into the nucleus and regulate cytokine target gene expression, acting as transcription factors. A more complex eukaryotic signaling system is the insulin-Target of Rapamycin (TOR) network (Figure 2(c)). The nutrient-sensing and growth-regulating TOR pathway is conserved from unicellular organisms such as S. cerevisiae to humans.
In multicellular organisms, additional hormonal control of growth and metabolism is provided by the insulin/insulin-like growth factor (IGF) signaling (IIS) cascade which is interfaced with the TOR module at several nodes. The central molecular event in the IIS pathway is generation of the second messenger phosphatidylinositol (3,4,5)-trisphosphate (PIP3), which is catalyzed by class IA phosphoinositide 3-kinase (PI3K). Elevated PIP3 levels in the plasma membrane lead to membrane recruitment of Pleckstrin homology (PH) domain-containing protein kinases such as phosphoinositide-dependent kinase 1 (PDK1) and protein kinase B (PKB)/AKT, which in turn regulate downstream effectors like forkhead box O (FOXO) transcription factors.

FIGURE 2 | Signaling pathways and kinase–substrate relationships in biological systems. Three schematic pathway examples of increasing complexity are shown. (a) Bacterial two-component signaling consisting of histidine kinases (HK) and response regulators (RR) constitutes one of the simplest signaling systems, while eukaryotic cells feature networks such as the relatively simple JAK-STAT pathway (b) and the more complex insulin-TOR network (c). Only the core constituents of the pathways are shown for the sake of clarity. Interconnected cascades of direct kinase–substrate relationships form the functional backbone of these networks. Another important concept in pathway architecture is the role of phosphospecific protein binding modules like SH2 domains and 14-3-3 proteins in the dynamic regulation of phosphorylation-dependent protein complexes.

All these pathways are composed of kinase–substrate networks, and a particularly challenging aspect of phosphoproteomics in general and signal transduction research in particular is the identification of direct protein kinase substrates. The concepts of approaching this issue experimentally are briefly summarized in Box 1.
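One of the approaches listed in Box 1, the computational search for kinase consensus motifs, can be illustrated with a minimal Python sketch. It assumes the commonly used minimal AKT pattern R-x-R-x-x-S/T, which is consistent with the FOXO1 residues R248, R250, and S253 discussed in the crosstalk section of this review. The function name, the hard-coded acceptor offset, and the toy sequence are hypothetical and serve only as an illustration; a motif match alone does not establish a kinase–substrate relationship.

```python
import re

# Assumed minimal AKT consensus: Arg at -5 and -3 relative to the
# phospho-acceptor Ser/Thr (R-x-R-x-x-S/T). The lookahead allows
# overlapping motif occurrences to be reported.
AKT_MOTIF = re.compile(r"(?=(R.R..[ST]))")

def find_motif_sites(sequence, motif=AKT_MOTIF):
    """Return (1-based position, residue) of each putative acceptor site."""
    sites = []
    for match in motif.finditer(sequence):
        # The acceptor is the sixth residue of this particular motif.
        acceptor_index = match.start() + 5
        sites.append((acceptor_index + 1, sequence[acceptor_index]))
    return sites

# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKTRSRAASLLGRPRTYTPE"
print(find_motif_sites(toy_sequence))
```

In practice, such a scan would only prioritize candidate sites for the in vitro and in vivo experiments described in Box 1.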
Crosstalk with Other Posttranslational Modifications

Protein phosphorylation events can influence and be influenced by other types of posttranslational modifications (PTMs) occurring on the same protein machine. This kind of crosstalk between two or more modification systems can be either positive or negative, occur in cis (on the same protein) or trans (on two different proteins), and is comparatively well characterized in chromatin biology, more specifically on histone tails that are heavily modified and display a high density and diversity of PTMs.20 A modification that is closely linked to phosphorylation is the glycosylation of serine and threonine residues by β-O-linked N-acetylglucosamine (O-GlcNAc). Virtually all known O-GlcNAc-modified proteins are also phosphoproteins, and in a number of cellular responses, there is a reciprocal regulation of these two modifications.21 This dynamic interplay is not fully understood. Known mechanisms include competition of the two PTMs for the same sites, and the occupancy and cross-regulation of each other at adjacent sites or even more distant sites on the same protein.22 Furthermore, it has been reported that almost 70% of O-GlcNAc-modified proteins are also phosphorylated at tyrosine residues, suggesting that tyrosine phosphorylation might somehow impact the crosstalk between phosphorylation and glycosylation on serine and threonine residues.23

BOX 1

IDENTIFICATION OF DIRECT KINASE–SUBSTRATE PAIRS REQUIRES THE COMBINATION OF ORTHOGONAL EXPERIMENTAL APPROACHES

To generate a solid body of biological evidence about novel kinase–substrate pairs, several orthogonal approaches have to be combined. These include computational analysis of the presence of kinase consensus motifs, in vitro phosphorylation of the putative substrate by the kinase, and recapitulation of the phosphorylation events in intact cells or even tissues. In vitro and yeast two-hybrid assays have often been instrumental in determining a direct kinase–substrate interaction. Genetic experiments include overexpression and deletion of the kinase and monitoring the phosphorylation event on the putative substrate. These are often complicated by redundancy, the rather long time frame of genetic manipulations, and the resulting occurrence of indirect and compensatory effects. A specific kinase inhibitor is invaluable for the identification of direct substrates, because its action is faster than a genetic manipulation, minimizing the potential for compensation effects. Although novel substrates are not easily identified by AP-MS type experiments, some kinases bind their substrate proteins with high enough affinity to facilitate a physical co-purification, an example being the phosphorylation of FoxO1 by PKA-α.16 Therefore, epitope-tagging strategies and tools like kinobeads, which allow the physical enrichment of endogenous protein kinases,17 are useful for the elucidation of kinase interactomes and potentially the identification of new substrates. A more direct strategy makes use of genetically engineered ‘analog-sensitive’ kinases that can utilize ATP analogs which wild-type kinases cannot, and thereby attach radioactive or other labels to the substrates to be identified.18 Conventional and systems-level approaches toward linking kinases with their substrates have been reviewed by Sopko and Andrews.19 It is apparent that these types of sophisticated experiments are difficult to scale up to a high-throughput format.

Another example of PTM crosstalk which is especially relevant for cellular signaling occurs within kinase consensus sequences.24 Methylation or acetylation of arginine or lysine residues commonly found in consensus motifs can affect the phosphorylation event or vice versa.
For example, PRMT1 (Protein Arginine MethylTransferase 1)-catalyzed methylation of R248 and R250 in FOXO1 (mouse numbering), the two arginine residues that define the AKT consensus motif, inhibits phosphorylation of S253 by AKT, thereby preventing the nuclear export and inhibition of FOXO1.25 Conversely, phosphorylation of RNA polymerase II (RNAPII) at S2 and S5 by positive transcription elongation factor b (P-TEFb) and CDK activating kinase (CAK), respectively, prevents its methylation at R1810 by coactivator-associated protein arginine methyltransferase 1 (CARM1, also known as PRMT4).26

It should be apparent that mechanistic and functional issues that arise from protein phosphorylation and its interplay with other types of modifications can only be experimentally approached if suitable techniques for the reliable measurement of the respective substrates are available. In the following sections of this review, we will provide an overview of the current methods, achievements, and challenges in MS-based phosphoproteomics. To illustrate some of the concepts which are discussed, we will frequently return to the aforementioned biological context of insulin/TOR signaling, which is a relatively well-studied system and therefore suitable to provide examples for research strategies in the field of protein phosphorylation.

EXPERIMENTAL TECHNIQUES TO STUDY PHOSPHORYLATION-MEDIATED SIGNAL TRANSDUCTION

Analysis of protein phosphorylation has been an important aspect of biological research for decades, and several distinct experimental approaches have evolved to this end. Figure 3 gives an overview of commonly used protocols to investigate protein phosphorylation. Traditionally, the incorporation of radioactive phosphate into substrate proteins or peptides, typically using radiolabeled ATP and in vitro kinase reactions, was utilized to characterize phosphorylation events.
Kinase activity assays can be performed with a radioactivity-based readout, and phosphopeptide mapping of 32P-labeled proteins by gel electrophoresis, proteolytic digestion, two-dimensional thin layer chromatography, and autoradiography has been instrumental in determining many phosphorylation sites over the years. The advent of phospho-specific antibodies marked a breakthrough in protein phosphorylation research, as they are often tools with high specificity and sensitivity that allow the detection of a specific phosphorylated amino acid residue in a protein of interest. Several techniques employ the antibody-based detection of phosphoproteins.

Antibody-Based Experimental Strategies to Investigate Protein Phosphorylation

Immunofluorescence and related antibody-based approaches such as immuno-electron microscopy are the only methods which can visualize the localization of the phospho-epitope in cells or tissues. Qualitative and semi-quantitative measurements can be performed by Western blotting with phosphospecific antibodies; more quantitative results can be obtained with an ELISA (enzyme-linked immunosorbent assay) setup. Protein array-type experimental setups have been used to some extent in phosphoproteome and cell signaling research,27 but are not discussed here. With the exception of immunofluorescence, the methods discussed until now detect phosphoproteins in cell lysates or other cell-free contexts, but not in individual cells or tissues. In contrast, flow cytometry can be performed with phosphospecific antibodies to measure specific modified residues in proteins with single-cell resolution.28,29 Conventional flow cytometry is performed with fluorescently labeled antibodies. Because of the spectral overlap of the different fluorophores used, it is limited to a certain number (in the range of 10) of phosphorylation events that can be detected simultaneously.
A recent development has coupled antibody-based protein or phosphoprotein detection with a mass spectrometric readout that does not suffer from the spectral overlap issue, and is referred to as mass cytometry.30,31 This approach potentially allows the simultaneous measurement and quantification of up to 100 different epitopes. Unless the phosphorylated epitopes are extracellular, the analyses by both conventional flow cytometry and mass cytometry involve fixation and permeabilization of the cells to enable antibody access to the intracellular epitopes. Although the antibody-based methods of phosphoprotein analysis summarized here have been invaluable in advancing our understanding of phosphorylation networks and can be multiplexed to a certain degree, they have two main conceptual limitations. First, phosphospecific antibodies are available for only a small fraction of well-characterized phosphorylation sites, their generation is time-consuming, and they inherently vary considerably in terms of analytical parameters such as specificity and sensitivity. Second, one obviously cannot discover new phosphorylation sites when working with antibody-based methods, but rather analyze the response or characteristics of known phosphorylation sites within a given biological or biochemical context. For a global analysis of phosphoproteomes in discovery-type experiments, or the quantitative targeted analysis of specific phosphopeptides across sets of biological samples, mass spectrometric techniques have evolved as the state-of-the-art approach, as will be described below. In some cases, antibody-based enrichment of phosphopeptides has been interfaced with subsequent MS-based detection, most notably in the analysis of tyrosine-phosphorylated peptides.
Mass Spectrometry of Large Phosphopeptide Pools: Phosphoproteomics

The central analytical procedure in MS-based phosphoproteomic workflows is the fragmentation of isolated phosphopeptide ions in the collision cell of a tandem mass spectrometer. Several experimental and computational steps before and after the peptide fragmentation constitute the workflow which yields qualitative and in certain setups also quantitative information about the analyzed phosphopeptides. As illustrated in Figure 3, several MS data acquisition schemes can be employed, depending on the experimental goals, and either the global phosphoproteomes or defined sub-phosphoproteomes can be investigated depending on the biological question asked. In so-called shotgun or discovery-mode experiments, the mass spectrometer conventionally operates in data-dependent acquisition (DDA) mode, selecting a number of detected peptide ions, usually based on the precursor ion intensity detected in a survey scan (MS1 scan), for fragmentation in an MS2 step and subsequent identification. In this mode of operation, large catalogs of phosphopeptides identified in various cells and tissues have been generated and deposited in publicly accessible databases.

Tandem mass spectrometers which can employ the scanning and mass filtering capabilities of quadrupoles, such as triple quadrupole instruments, have been used in different setups to analyze phosphopeptides. In precursor ion scanning, the first mass analyzer scans all present precursor ions, while the second mass analyzer is set to detect product ions that are diagnostic for phosphopeptides. Upon fragmentation by collision-induced dissociation (CID), phosphopeptides usually produce a specific PO3− ion (m/z = 79). This ion can only be detected in negative ion mode, which is incompatible with usual proteomics workflows that are generally performed in positive ion mode.
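The logic of precursor ion scanning described above can be emulated in software as a minimal sketch: among a set of fragmented precursors, keep those whose product ions include the diagnostic PO3− signal. This is a conceptual illustration only and not how a triple quadrupole physically scans. The monoisotopic value 78.959 for the ion quoted as m/z = 79 in the text, the tolerance of 0.3 (chosen to mimic roughly unit-resolution mass filtering), the function name, and the example spectra are all assumptions.

```python
# Monoisotopic m/z of the PO3- marker ion (quoted as m/z = 79 in the text).
PO3_MZ = 78.959

def phospho_precursors(spectra, diagnostic_mz=PO3_MZ, tol=0.3):
    """Return precursor m/z values whose fragments contain the marker ion.

    spectra: iterable of (precursor_mz, [fragment_mz, ...]) tuples,
             assumed to be recorded in negative ion mode.
    tol:     absolute m/z tolerance (assumed value).
    """
    hits = []
    for precursor_mz, fragment_mzs in spectra:
        if any(abs(f - diagnostic_mz) <= tol for f in fragment_mzs):
            hits.append(precursor_mz)
    return hits

# Hypothetical fragment lists for three precursors.
example_spectra = [
    (650.3, [78.96, 200.1, 450.2]),   # contains the marker ion
    (720.4, [86.1, 300.2]),           # no marker ion
    (810.9, [79.0, 512.3]),           # contains the marker ion
]
print(phospho_precursors(example_spectra))
```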
A derivatization strategy based on β-elimination and Michael addition has been developed which allows precursor ion scanning geared toward detection of phosphoserine- and phosphothreonine-containing peptides in positive ion mode.32 For phosphotyrosine-containing peptides, precursor ion scanning can also be performed for monitoring the generation of a phosphotyrosine-specific immonium ion (PSI) at m/z = 216.043 in positive ion mode. A similar strategy which can be used to detect serine-, threonine-, and tyrosine-phosphorylated peptides is referred to as neutral loss scanning and is based on the characteristic of phosphopeptides to generate the neutral phosphate fragments H3PO4 (98 Da) and HPO3 (80 Da) during CID. In neutral loss scanning, both mass analyzers simultaneously scan through the respective m/z range at a fixed offset corresponding to the mass of the neutral loss fragment, thereby identifying phosphorylated precursor ions.

FIGURE 3 | Approaches to the analysis of protein phosphorylation. Experimental categories employing radioactivity, antibody-based detection, and mass spectrometry are shown. Refer to the text for more details.

Recently, data-independent acquisition (DIA) schemes have been introduced, in which the entire precursor ion population is fragmented either in one step (AIF33) or by repeated cycling through consecutive isolation windows (SWATH34). Because the link between precursor and fragment ion m/z values is lost in this type of acquisition, reliable peptide identification, in contrast to conventional database searching of DDA data, has to include chromatographic information and entails the alignment of precisely co-eluting precursor and fragment (or only fragment) masses.
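As a minimal sketch of the window-cycling scheme described above, the following Python snippet builds consecutive isolation windows and looks up the window in which a given precursor would be co-fragmented. The m/z range (400 to 1200), the window width of 25 m/z units, and the function names are illustrative assumptions and are not taken from this review; real acquisition methods may use overlapping or variable-width windows.

```python
def swath_windows(mz_start=400.0, mz_end=1200.0, width=25.0):
    """Build consecutive, non-overlapping isolation windows (assumed values)."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows

def window_index(precursor_mz, windows):
    """Return the index of the window containing the precursor, or None."""
    for i, (lower, upper) in enumerate(windows):
        if lower <= precursor_mz < upper:
            return i
    return None

windows = swath_windows()
# One acquisition cycle fragments every window once, so all precursors in a
# window share one fragment spectrum per cycle.
print(len(windows), windows[0], window_index(503.25, windows))
```

Because every precursor falling into the same window contributes to one composite fragment spectrum, the precursor assignment has to be recovered afterwards from co-elution, as noted in the text.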
Although of great potential as general proteomic workflows, these DIA approaches have not yet been routinely applied to the analysis of phosphoproteomes. In contrast to discovery-type strategies in which previously unknown phosphopeptides in the samples of interest are identified, targeted analysis by selected reaction monitoring (SRM) on triple quadrupole-type instruments employs two mass analyzers that are set to a pair of selected m/z values at a time. These two values correspond to the precursor m/z of a given peptide and a specific fragment ion, and are referred to as a transition. Several transitions in turn constitute an analytical assay for a peptide, which can be used to detect it within a complex sample with high sensitivity and specificity.35 Some examples of phosphopeptide analysis by SRM are given in the mass spectrometry section below. A novel development for targeted proteomics implemented on the Q Exactive instrument (which consists of an orbitrap mass analyzer equipped with a quadrupole mass filter) is called parallel reaction monitoring.36,32 In this approach, the quadrupole is used to select a specific precursor ion, which is then detected with high resolution and accuracy in MS1 (single ion monitoring) and MS/MS modes (parallel reaction monitoring). Like DIA, this technology has not yet been applied to phosphoproteomics, but is likely to be valuable in this type of application as well.

More Focused Analysis of Informative Phosphoprotein Sets

The phosphoproteomes of higher organisms are of such extensive complexity that it is not yet possible to achieve full coverage with current instrumentation and data analysis tools. First, eukaryotic proteomes display an enormous range of protein abundances that exceeds the dynamic range of current LC-MS/MS instrumentation (see Section ‘General Issues of Proteome Complexity’). Second, the phosphopeptidome is much more complex than the unmodified peptidome due to the very large number of measured and estimated phosphorylation sites (see Section ‘Protein Phosphorylation in Biological Systems’). Third, many biologically important phosphoproteins such as signaling pathway components are of low abundance, and many phosphorylation events are sub-stoichiometric, leading to low phosphopeptide abundances compared to the levels of the respective unmodified peptides. For these reasons, it can be of great advantage to interrogate specific subpopulations of the global phosphoproteome, which are relevant and of high information content for the biological context under investigation. The isolation of subproteomes and their selective analysis has been achieved by a number of approaches.

Subcellular Fractions

The large majority of large-scale phosphoproteomic studies describe phosphopeptidome profiles derived from whole cell extracts. During the lysis step, the information about subcellular localization of protein kinases and their phosphorylated substrates is lost. However, this kind of information is very valuable for the understanding of phosphorylation-based signaling, because spatial assembly of signaling modules in subcellular compartments is an important structural and regulatory element of these networks.
To address this issue and to generate compartment- or organelle-specific datasets, several studies have investigated the phosphoproteomes of specific subcellular compartments and structures, including the nucleus, the plasma membrane, mitochondria, endosomes, phagosomes, synaptic vesicles, and the mitotic spindle, as reviewed by Trost et al.38 Phosphoproteome analyses performed in Drosophila cells in our laboratory have focused mainly on cytoplasmic phosphorylation events, as cells were lysed in hypotonic buffer, and the supernatant of an ultracentrifugation at 100,000 g was used as starting material for the tryptic digests and phosphopeptide enrichments.39,40

AP-MS of Signal Transduction Proteins

Physical enrichment and subsequent analysis of protein complexes by affinity purification-mass spectrometry (AP-MS) has proven to be a powerful approach to shed further light on intracellular signaling cascades. The limitation of this method is the rather laborious generation of high quality antibodies that can be used for the purification, or the generation of transgenic cell lines or organisms which express an epitope-tagged variant of the bait protein. When a working experimental system is established, these studies can deliver two types of information: the composition of the protein complex which is assembled around the bait protein, and phosphorylation patterns on the bait protein itself and its interactors. In both cases, the information obtained can be qualitative or quantitative, and static or dynamic, e.g., when different cellular conditions are compared. A recent AP-MS study identified the protein complex compositions around several nodes in the Drosophila insulin/TOR signaling system,41 and described the dynamic reassembly of node complexes upon insulin stimulation.
In an investigation of the Drosophila insulin receptor substrate (IRS) homolog CHICO, affinity-purified protein complexes were subjected to an analytical strategy involving differential phosphatase treatment and quantification by iTRAQ (see Section Quantitative Phosphopeptide Measurements) to identify and quantify both the CHICO interactome and its phosphorylation patterns in response to the insulin stimulus in a single analysis.42 It has also been demonstrated that AP-MS followed by either the phosphatase-iTRAQ analysis performed by Pflieger et al., or phosphopeptide enrichment and subsequent LC-MS/MS, is capable of identifying substantially more phosphorylation sites on the bait protein than proteome-wide phosphopeptide screening. Two examples are CHICO and the transcription factor dFOXO40: in both cases, phosphorylation analysis of the affinity-purified complex yielded a higher number of identified phosphopeptides than the CHICO and dFOXO phosphopeptides identified in a global phosphoproteome analysis of TiO2, IMAC, and PAC-enriched tryptic Drosophila cell digests.39

Antibody-Based Enrichment (pTyr)
As discussed above, the phosphotyrosine proteome is a small subpopulation of the phosphoproteome. It is highly informative and central in transmembrane and intracellular signal transduction. In contrast to phosphoserine and phosphothreonine, there are several antibodies against phosphotyrosine which are highly specific and very well suited for the physical isolation of tyrosine-phosphorylated macromolecules. Although it is also possible to isolate tyrosine-phosphorylated proteins by immunoaffinity purification, the commonly used protocols for pTyr-proteomics employ an affinity purification after protease digestion, at the peptide level. A significant advantage of this procedure is that protein extraction can be performed under relatively harsh conditions that would not be compatible with protein-AP-MS experiments (e.g., 9 M urea).
Therefore, protein extraction is more efficient even for difficult populations such as membrane proteins, and subsequent coverage of the pTyr-proteome is more satisfactory. An analysis of the insulin signaling pathway in differentiated brown adipocytes employed a combination of stable isotope labeling by amino acids in cell culture (SILAC) labeling (see Section Quantitative Phosphopeptide Measurements) and protein immunopurification with anti-phosphotyrosine antibodies, identifying 40 insulin-regulated effectors, 7 of which had not been previously described.43 Boersema et al. combined anti-phosphotyrosine peptide immunoaffinity purification and stable isotope dimethyl labeling to investigate the pTyr proteome in HeLa cells. From 4 mg of starting material, more than 1,100 unique nonredundant phosphopeptides were identified, and quantitation of the cellular response to epidermal growth factor (EGF) stimulation revealed the regulation of 73 unique pTyr peptides.44

CHALLENGES AND EXPERIMENTAL STRATEGIES IN PHOSPHOPROTEOME RESEARCH

General Issues of Proteome Complexity
As discussed in our recent review article,36 proteomic complexity by far exceeds genomic complexity because of the presence of multiple splice variants per gene, alternative translation start sites, mRNA editing, and most importantly, the great range of protein abundances found in cells, tissues, and body fluids, which still cannot be detected in its entirety with current instrumentation in proteomic experiments. Very complex proteomes such as the human plasma proteome span a protein abundance range of more than 10 orders of magnitude45 (the proteome of a human cell spans about 7, yeast about 6, and prokaryotes about 4–5 orders of magnitude).
The currently most sensitive MS techniques are able to monitor an abundance range of 4–5 orders of magnitude.46 More than 200 different types of posttranslational protein modifications have been described,47 which add an additional layer to proteomic complexity. To reduce this complexity and separate the phosphoproteome from the background of unmodified or differently modified peptides, physical enrichment strategies are usually applied to improve phosphoproteome coverage. There are several established protocols for the physical enrichment of phosphopeptides from protein digests. Antibody-based enrichment is generally only used for pTyr-containing peptides, while chromatographic and chemical enrichment is efficient for all phosphopeptides (pTyr/pSer/pThr). These enrichment steps are usually performed offline, that is, not directly coupled to the analytical reversed phase LC-MS/MS system. Figure 4 depicts the individual steps in a typical phosphoproteome analysis, from the cells or tissues under investigation to the fully analyzed data. While gel electrophoresis is still performed in some cases of phosphoproteome studies, most large-scale screens follow a gel-free strategy, in which the protein samples are directly digested with a protease and subjected to subsequent phosphopeptide enrichment. Figure 4 also highlights the steps within the experimental pipeline during which stable isotope labeled standards can be introduced (metabolic labeling, protein labeling, peptide labeling), and the critical points which will impact the obtained phosphoproteome coverage and the overall quality of the experimental data.
In the context of quantitative analyses utilizing stable isotope labeling approaches (see Section Quantitative Phosphopeptide Measurements), it should be noted that performing the labeling and sample pooling steps early in the experimental pipeline (e.g., metabolic labeling by SILAC versus chemical peptide labeling) reduces individual handling errors and therefore usually leads to more accurate quantification results. As discussed above, subcellular fractionation can be utilized to shift the focus of the phosphoproteome screen to the cellular context most relevant for the biological question asked. Different protein extraction protocols also have a strong impact on the subsets of the phosphoproteome that will be detected. Transmembrane and membrane-associated proteins in particular are extracted more efficiently if denaturing agents and detergents are used during protein extraction. MS-compatible acid-labile surfactants such as RapiGest are frequently used in this context.48 After protein extraction, enrichment or separation on the protein level can be performed, e.g., by antibody-based techniques, gel electrophoresis, or size exclusion chromatography.

Use of Different Proteases
The vast majority of proteomic experiments are performed with trypsin as the protease. Its cleavage specificity, which directs enzymatic protein hydrolysis to the C-terminal side of lysine and arginine residues, confers the practical advantage that during MS, the resulting peptide ions usually carry a minimum of two positive charges, one at the protonated N-terminal amino group of the peptide, and one at the protonated amino group of the lysine or arginine side chain at the C-terminus of the peptide. High mass accuracy instruments can discriminate the higher charged peptide ions from contaminants, which are usually singly charged, and direct precursor selection for MS/MS to peptide ions by on-the-fly decision-making algorithms.
During fragmentation in the MS/MS step, it is furthermore assured that all fragments of the b- and y-ion series (which propagate throughout the length of the peptide from both termini) carry at least one charge and are therefore detectable by the instrument.

FIGURE 4 | Experimental strategies to extend phosphoproteome coverage and introduce stable isotope-labeled reference molecules. The blue boxes on the left display the typical stages in a phosphoproteomic experiment. The boxes in the central part of the figure show experimental processes either during (mauve) or between (yellow) the stages, which can impact the success (in terms of phosphoproteome coverage and quality of quantitation) of the experiment. The orange boxes on the right illustrate at which stages stable isotope labels can be introduced. Introduction of isotope labels at an early experimental stage reduces individual sample handling errors because the samples are pooled earlier, and therefore usually leads to a more accurate quantification.

Despite these advantages, limiting an analysis by default to tryptic digests will exclude a large number of phosphorylation sites, specifically those that are not located within protein sequence stretches that will generate tryptic peptides of a length suitable for MS/MS analysis. While this is currently not much of an issue for global phosphoproteomics, which is already battling with complexity issues even when considering only tryptic peptides, it does become an important consideration when analyzing selected proteins or protein complexes for novel phosphorylation events. For example, many kinase consensus sequences contain a high density of basic residues, which often leads to tryptic peptides of insufficient length. Which parts of a protein of interest are likely to be visible in a proteomic analysis of a specific proteolytic digest can be easily checked with in silico digestion tools.
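The in silico check mentioned above can be illustrated with a minimal Python sketch. It is not one of the established digestion tools; the function names, the toy sequence, and the peptide length window of 7 to 30 residues are assumptions made for illustration. Cleavage is modeled C-terminal to lysine or arginine and, as a common simplification, suppressed before proline.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0, min_len=7, max_len=30):
    """Return (start_position, peptide) pairs for a simplified tryptic digest.

    Assumptions (illustrative only): cleavage after K or R unless the next
    residue is P; peptides outside min_len..max_len are treated as not
    observable. Positions are 1-based.
    """
    # Zero-width split after K/R when not followed by P (needs Python >= 3.7).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    offset = 0  # 0-based start of fragments[i] in the protein
    for i, fragment in enumerate(fragments):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append((offset + 1, peptide))
        offset += len(fragment)
    return peptides

def site_is_covered(peptides, position):
    """True if the 1-based residue position lies in any retained peptide."""
    return any(start <= position < start + len(pep) for start, pep in peptides)

# Hypothetical sequence with a basic-rich stretch, as in many kinase motifs.
toy_protein = "AGDLKRSRRSSLPEGDKLLAESGTPVLER"
peptides = tryptic_digest(toy_protein)
for start, pep in peptides:
    print(start, pep)
print("Ser7 covered:", site_is_covered(peptides, 7))
print("Ser10 covered:", site_is_covered(peptides, 10))
```

In this toy case the serine at position 7 ends up in the dipeptide SR and falls outside the length window, whereas the serine at position 10 lies in a peptide of suitable length. Repeating the check with the cleavage rule of a second protease would show which sites that digest could recover.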
Experimentally, it has been demonstrated that the parallel digestion of a protein sample with proteases of fundamentally different specificities leads to an improved sequence coverage (approaching 100%) and PTM discovery rate.49

Chromatographic Enrichment of Phosphopeptides
Various enrichment techniques have been developed that exploit the unique physicochemical properties of the phosphate group. Because of the low pKa value of the phosphate group, ion exchange chromatography has been successfully used to enrich for phosphopeptides, namely strong cation exchange (SCX) chromatography50 as well as strong anion exchange.51 However, the enrichment is not very specific, as peptides with other acidic modifications are enriched as well. Hydrophilic interaction chromatography (HILIC), which separates analytes based on their hydrophilicity, has also successfully been used in phosphoproteomic studies52 and confers the advantage over SCX that it does not employ high concentrations of nonvolatile salts for elution, and can therefore be more easily coupled online to an LC-MS/MS system. A similar chromatographic approach using a weak anion exchange resin is electrostatic repulsion-hydrophilic interaction liquid chromatography (ERLIC), which has been used to analyze phosphorylated and glycosylated peptides in the same experiment.53 Several studies have used combined chromatographic setups employing subsequent purifications to increase phosphoproteome coverage.54 Metal-based chromatography approaches to enrich phosphopeptides rely on the interaction of the negatively charged phosphate with positively charged or polarized metal atoms. Immobilized metal affinity chromatography (IMAC) uses metal ions such as Fe3+, Ti4+, or Zr4+ that are immobilized to the resin by chelating agents.
Metal oxide affinity chromatography (MOAC) works similarly and involves metal oxides like TiO2 or ZrO2. Several variations of the IMAC protocol have been described, including one employing two consecutive IMAC steps55 to enhance phosphoproteome coverage, and sequential elution IMAC (SIMAC)56 for the separation of monophosphorylated peptides and multiply phosphorylated peptides from complex samples. Hydroxyapatite (HAP), a naturally occurring calcium-containing phosphate compound, has also been used to enrich phosphopeptides. The electrostatic attraction between the alkaline earth metal Ca2+ and phosphate enables this purification, similarly to the other described metal-based chromatography approaches. In contrast to those, phosphopeptide purification with an HAP matrix seems to have the advantage of reduced co-enrichment of acidic peptides.57

Chemical Derivatization
Although recently somewhat displaced by the metal-based affinity enrichment protocols, considerable effort has been invested in the design of chemical strategies to capture phosphopeptides. The common principle of all these approaches is the specific chemical modification of the phosphate group, followed by the attachment of an affinity handle that facilitates the physical enrichment of the phosphopeptides. As there is no single chemical reaction that is exclusively specific for the phosphate group, several approaches have been developed.58 To minimize side reactions with, e.g., acidic groups on the C-terminus or side chains of acidic amino acid residues, protection steps such as carboxylic acid methylation are usually employed prior to the chemical targeting of the phosphate groups. A widely used chemical derivatization protocol is based on β-elimination (BE) under strongly basic conditions, followed by a Michael addition which attaches an affinity tag and optionally a stable isotope label for quantification.
The drawback of the β-elimination approach is that it is only able to capture serine- and threonine- (not tyrosine-) phosphorylated peptides and displays a substantial amount of side reactions.59 The other chemical approaches are applicable to pS-, pT-, and pY-containing peptides. The second well-established chemical strategy is based on phosphoramidate chemistry (PAC), which entails reaction of the phosphate group in a phosphopeptide with primary amines to form a phosphoramidate. Affinity handles and isotope labels can be attached according to this strategy in various ways. One example is the protection of carboxyl groups by methylation and concomitant isotope labeling, followed by the carbodiimide-catalyzed reaction of the phosphate groups with cystamine and subsequent reduction to generate free thiol groups on every phosphorylation site, which are then physically captured with maleimide-functionalized glass beads.40 Phosphopeptides were then eluted with trifluoroacetic acid (TFA) and analyzed by LC-MS/MS. Other chemical methods that are not based on BE or PAC, such as diazo chemistry60 or oxidation-reduction condensation,61 have also been described but are so far not widely used in phosphoproteomic research.

Gas Phase Ion Separation
Phosphoproteome coverage and phosphopeptide identification can be improved by an additional separation technique which is applied to the ionized peptides in the gas phase, at the interface between the LC and MS steps in the analytical pipeline. A conceptual weakness of shotgun-type LC-MS/MS-based phosphoproteomics concerns the analysis of phosphoisomeric peptides. These peptides of identical sequence, which carry a phosphorylation on different residues, are often difficult to separate in the LC step. In the MS1 step, they cannot be resolved either, because they are isobaric, and they are therefore fragmented together during CID.
This results in MS2 spectra derived from a mixed population of precursor ions, which complicates unambiguous phosphorylation site assignment. These phosphoisomers are estimated to account for 3–6% of the total detected phosphoproteome in large-scale studies, and their identification can be substantially improved by targeted LC-MS/MS analysis.62 Ion mobility separation (IMS) can be employed to physically separate these phosphoisomeric peptide ions in an analytical device operating between the ionization and MS steps in the LC-MS/MS setup (LC-IMS-MS/MS). It furthermore facilitates the removal of chemical contaminants from multiply charged peptide ions, enhancing spectral quality in MS/MS. IMS separates ions in the gas phase based on their differential mobility in a range of physical devices. Specifically, differential or high field asymmetric waveform IMS (FAIMS) has been established as a post-ionization separation technique that is orthogonal to MS in the context of peptide analysis. In FAIMS (also called differential ion mobility spectrometry, DMS), peptide ions travel through a carrier gas (or drift gas) at atmospheric pressure within a drift tube between two electrodes, across which a high-voltage asymmetric waveform at radio frequency (RF) is applied. This electric field is called the dispersion field and oscillates between high and low electric field strength because of its asymmetric waveform. Only ions with a specific mobility will be balanced and move through the drift tube without hitting one of the electrodes and being neutralized, while ions with substantially different mobility in high and low electric fields will drift toward one of the electrodes during their travel through the device.
This ion drift toward an electrode can be counteracted by a small DC voltage applied to this electrode (and superimposed on the RF field), which is referred to as the compensation voltage (CV). The CV required to keep a specific ion stabilized is dependent on the mobility of this ion and therefore characteristic of it. When analyzing ion mixtures, the CV is scanned, and the mixture thereby separated into peaks of ions of distinct mobility, because each ion species can only travel through the space between the FAIMS plates at a specific CV which stabilizes its flight. The resulting spectrum is called a CV spectrum. In phosphopeptide analysis, FAIMS was demonstrated to be successful in separating phosphoisomeric peptides even in cases in which the phosphorylation site was shifted by only one residue, which is usually very challenging with conventional LC-MS/MS setups.63 For more complex phosphopeptide mixtures like TiO2-enriched cellular phosphoproteomes, FAIMS can also be instrumental in resolving chromatographically coeluting phosphoisomers and substantially enhancing phosphoproteome coverage. Bridon et al. used a FAIMS interface coupled to an LTQ-Orbitrap instrument to analyze the phosphoproteome of Drosophila S2 cells. A combination of FAIMS separation and decision tree fragmentation (discussed below) resulted in a 50% increase in the number of identified unique phosphopeptide species compared to conventional LC-MS/MS analysis.64 In the same workflow, the authors performed label-free quantification to identify insulin-regulated phosphopeptides. As discussed above, several different MS data acquisition strategies have been developed which are suitable for phosphoproteomic experiments, and these can be categorized into discovery-type (like DDA) and targeted approaches (like SRM).
In the section about MS of phosphopeptides below, the advantages and limitations of various peptide fragmentation techniques will be addressed, while post-MS computational data processing is presented in a dedicated Computational Tools Section below.

Normalization to Protein Levels
Large-scale phosphoproteomic studies suffer from the conceptual limitation that in the absence of proteome-wide quantitative data about protein levels, quantitative phosphoproteome measurements cannot distinguish between changes in phosphorylation level and protein level, simply because they exclusively interrogate phosphopeptides. If a certain phosphopeptide increases or decreases in abundance upon a certain stimulus or cellular condition, this change may be caused by an alteration of the degree of phosphorylation at the respective site, but also by changes in the abundance of the phosphorylated protein. The group of Steven Gygi investigated this problem in a systematic manner. Quantitative proteome and phosphoproteome analysis of FUS3 or STE7 mutant yeast strains compared to wild-type strains revealed that 25% of the observed phosphorylation changes were in fact attributable to changes in protein expression levels and not to the regulation of the phosphorylation events.65 This limitation has to be kept in mind when interpreting large-scale phosphoproteomic datasets. It is often assumed that in contrast to the comparison of different cell types, the analysis of short-term manipulations such as stimulation with hormones or brief treatment with kinase inhibitors will minimize the impact of protein levels on the phosphoproteome profiles. However, if novel phosphorylation sites are later selected for validation by further experiments, the issue of protein versus phosphorylation level should certainly be addressed.
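The distinction between a change in phosphorylation and a change in protein abundance can be made concrete with a small Python sketch. It assumes that a matched protein-level ratio is available for the same pair of conditions from a parallel proteome measurement; the function name and the example ratios are hypothetical and do not reproduce the analysis of the cited study.

```python
import math

def protein_corrected_log2_change(phosphopeptide_ratio, protein_ratio):
    """Log2 change of a phosphopeptide after subtracting the protein-level change.

    Both arguments are abundance ratios (condition B / condition A).
    A result near 0 suggests that the phosphopeptide change merely tracks
    protein abundance; this is a sketch, not a statistical test.
    """
    return math.log2(phosphopeptide_ratio) - math.log2(protein_ratio)

# Hypothetical case 1: phosphopeptide and protein both double.
print(protein_corrected_log2_change(2.0, 2.0))  # 0.0, no change in phosphorylation level

# Hypothetical case 2: phosphopeptide rises fourfold, protein is unchanged.
print(protein_corrected_log2_change(4.0, 1.0))  # 2.0, change is at the phosphorylation level
```

Working in log2 space keeps increases and decreases symmetric around zero, which is why the correction is written as a difference of logarithms instead of a ratio of ratios.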
MASS SPECTROMETRY OF PHOSPHOPEPTIDES
Expanding on the experimental workflow for a typical phosphoproteomic experiment outlined in Figure 4, Figure 5 zooms into the central part of mass spectrometric analysis and presents a more detailed view of this critical step. In many respects, the mass spectrometric analysis of phosphopeptides follows the same patterns as conventional MS-based proteomics with unmodified peptides. The tandem MS measurements detect intact peptide precursor ions, fragment them by various means, and subsequently measure the masses of the resulting product ions. The available acquisition schemes correspond to those used in all proteomic disciplines, with the exception of specialized scanning techniques such as neutral loss scanning or precursor ion scans for phosphoamino acid-specific immonium ions.

FIGURE 5 | Mass spectrometry of phosphopeptides. The different elements of mass spectrometric analysis of (phospho-) peptides are shown. The central element is the fragmentation of peptides which generates sequence information and is therefore instrumental in identifying peptides in a biological sample. Different fragmentation strategies used in phosphoproteomics are displayed, as are the diverse acquisition schemes that have already been introduced in Figure 3.
DDA: Strategies and Fragmentation Techniques
Most phosphoproteomic studies that were geared toward the identification (and sometimes quantification) of large numbers of phosphopeptides have been performed on instruments like the LTQ-FT or the LTQ-Orbitrap, in which precursor masses are measured in a high resolution and accuracy analyzer such as a Fourier transform ion cyclotron resonance (FT-ICR) cell or an orbital trap, fragmentation is performed by CID inside the linear ion trap, and the resulting fragment masses are measured either with relatively low resolution in the trap itself or in the high resolution analyzer. The use of high accuracy measurements in the MS1 step has been demonstrated to significantly improve the quality and reliability of phosphopeptide identifications.66 An important issue for the MS2 step in phosphoproteomics is the lability of the phosphoester bond in peptides phosphorylated on serine or threonine residues. Upon fragmentation by CID, these phosphopeptides (and not tyrosine-phosphorylated peptides) easily lose the phosphate group as a neutral fragment, a phenomenon that is exploited by the neutral loss scanning strategy mentioned earlier. This behavior is problematic, because it leads to fragmentation spectra that display a very prominent fragment ion peak that represents the neutral loss of phosphate, and b- and y-sequence ion series of much lower intensity. This decreases the information content of the fragmentation spectrum, which is needed for reliable peptide identification and phosphorylation site assignment. Different measurement techniques have been developed to alleviate this issue while still performing CID in the linear ion trap. Two examples of such strategies are MS3 and multistage activation (MSA).
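Where the neutral loss peak is expected in a CID spectrum follows from simple arithmetic, sketched below in Python. The sketch assumes that the loss corresponds to phosphoric acid (H3PO4, monoisotopic mass about 97.977 Da) and that the charge state is unchanged by the loss; the function names, the example m/z values, and the 0.5 m/z tolerance are illustrative assumptions, not instrument settings taken from the text.

```python
# Monoisotopic mass of H3PO4 in Da (3 H + P + 4 O); assumed neutral loss species.
H3PO4_MASS = 97.976896

def neutral_loss_mz(precursor_mz, charge, losses=1):
    """m/z of the precursor after losing `losses` H3PO4 molecules at fixed charge."""
    return precursor_mz - losses * H3PO4_MASS / charge

def is_neutral_loss_peak(peak_mz, precursor_mz, charge, tolerance=0.5):
    """True if a fragment peak matches the single neutral loss position.

    A check of this kind is the sort of criterion that could trigger a
    further activation step; the tolerance is a placeholder for a
    low-resolution ion trap measurement.
    """
    return abs(peak_mz - neutral_loss_mz(precursor_mz, charge)) <= tolerance

# A doubly charged precursor at m/z 800.0 would show the loss about 49 m/z lower.
print(round(neutral_loss_mz(800.0, 2), 3))
# For a triply charged precursor the spacing shrinks to about 32.7 m/z.
print(round(800.0 - neutral_loss_mz(800.0, 3), 3))
print(is_neutral_loss_peak(751.0, 800.0, 2))
```

Because the offset scales with 1/z, the charge state of the precursor has to be known before a dominant fragment can be interpreted as a neutral loss peak.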
In MS3 (in this context more specifically referred to as data-dependent neutral loss MS3, DDNLMS3), the neutral loss peptide ion is fragmented again to induce a higher degree of backbone fragmentation and sequence information.67 The drawback is that sequence ions which were generated in the first fragmentation step are lost when isolating the neutral loss peak for the second step. MSA (or pseudo MSn) addresses this issue by omitting the ion isolation step between MS/MS and MS3. The neutral loss fragment is collisionally activated while the fragment ions from the MS/MS step are still present in the trap. This procedure results in a hybrid fragmentation spectrum that contains a higher number and intensity of structurally informative ions than the individual spectra generated in the MS3 approach.68 Acquisition speed is enhanced as well, because the trap does not need to be refilled for a second isolation step. Another fragmentation issue is not directly related to phosphorylation, but to the fact that phosphoproteomic datasets contain a significant population of proline-containing peptides due to the activity of proline-directed kinases.10 The peptide bond amino-terminal to proline is particularly labile in CID (‘proline effect’), complicating peptide identification.69 A distinct fragmentation mode can be performed in the multipole collision cell of LTQ-Orbitrap instruments.
In contrast to the routine setup in which high-resolution MS1 scans are performed in the orbitrap analyzer and low-resolution MS2 scans in the linear ion trap (‘high–low’ strategy), phosphopeptide measurements have been shown to improve when a ‘high–high’ strategy is applied, in which each MS1 scan is followed by several higher energy collisional dissociation (HCD) events and subsequent measurement of the fragment ions in the orbitrap with high resolution.70,71 Database search engines need to be adapted to this kind of fragmentation data, as some, but not yet all, engines actually exploit the high accuracy MS2 information to generate more robust peptide identifications.72 When analyzing samples of moderate complexity such as protein complexes, where acquisition speed to improve coverage is less of an issue, it can be beneficial to consecutively perform several different fragmentation steps on phosphopeptide precursor ions. Przybylski et al. applied a combination of iTRAQ labeling, discovery and subsequent targeted (inclusion list-based) measurements, and a set of different fragmentations (CID/MS2, MSA, and HCD) on all targeted precursors to optimize phosphopeptide identification and quantification on proteins of interest.73 The FragMixer software tool was specifically designed to optimize peptide identification and localization of phosphorylation sites based on the combined analysis of spectra generated by different fragmentation modes on the same phosphopeptide precursor ions.74 Finally, it was recently reported that HCD fragmentation of phosphopeptides also features a distinct advantage in the determination of phosphorylation sites.
On an LTQ-Orbitrap, a newly discovered fragmentation mechanism during HCD was shown to generate a neutral loss-derived x-fragment ion which directly pinpoints the modified residue.75 Electron-driven methods to dissociate peptide ions represent an alternative fragmentation strategy for phosphoproteomics, one that is complementary to CID and especially well suited for the analysis of highly charged phosphopeptide ions with low m/z values (high charge density). Electron capture dissociation (ECD) and electron transfer dissociation (ETD) employ direct electron capture or electron transfer from a singly charged radical donor anion (such as anthracene, fluoranthene, and azobenzene), respectively, to dissociate the precursor ion. Unlike CID, which generates b- and y-fragment ions, ECD and ETD result in the formation of c- and z-type fragment ions, and the phosphate group is usually retained on the fragment without the occurrence of a neutral loss. This greatly facilitates peptide identification and the precise determination of phosphorylation sites.76 A recently described strategy termed decision tree fragmentation combines the two complementary fragmentation methods CID and ETD on a single instrument in a way that seeks to maximize phosphoproteome coverage in complex samples (where acquisition speed is a crucial issue). On a modified LTQ-Orbitrap machine equipped with an ETD module, the decision tree algorithm makes on-the-fly decisions of how to fragment peptide precursor ions, based on charge state and m/z, which are determined during the high resolution MS1 scans in the orbital trap analyzer. Precursors with lower charge states and higher m/z values are fragmented by CID, while more highly charged precursors with lower m/z are dissociated by ETD.
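The logic of such a decision tree can be sketched in a few lines of Python. Only the general rule is taken from the description above (lower charge and higher m/z go to CID, higher charge and lower m/z go to ETD); the per-charge m/z cut-offs in the table and the function name are illustrative placeholders and should not be read as the published instrument settings.

```python
# Hypothetical m/z cut-offs per charge state: below the cut-off the precursor
# has a high charge density and is sent to ETD, otherwise to CID.
ETD_MZ_CUTOFF = {3: 650.0, 4: 900.0, 5: 950.0}

def choose_fragmentation(charge, mz):
    """Pick 'CID' or 'ETD' for a precursor from its charge state and m/z."""
    if charge <= 2:
        # Low charge states are assumed to fragment well by CID.
        return "CID"
    cutoff = ETD_MZ_CUTOFF.get(charge)
    if cutoff is None:
        # Charge states above the table are always treated as ETD candidates.
        return "ETD"
    return "ETD" if mz < cutoff else "CID"

# Example precursors as (charge, m/z) pairs.
for charge, mz in [(2, 500.0), (3, 500.0), (3, 800.0), (6, 1200.0)]:
    print(charge, mz, choose_fragmentation(charge, mz))
```

Keeping the cut-offs in a lookup table separates the rule from its parameters, so the same function could be reused with values tuned for a particular instrument or sample type.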
The authors demonstrated that this decision tree fragmentation scheme leads to the identification of 7,422 phosphopeptides in human embryonic stem cell phosphopeptide samples, compared to either 2,801 (CID) or 5,874 (ETD) when using only one fragmentation method.77 Other modes of fragmentation that have been applied to phosphopeptide analysis and are displayed in Figure 5 include the electron-driven methods electron detachment dissociation (EDD) and metastable atom-activated dissociation (MAD), photodissociation approaches, and MALDI post-source decay (PSD). For further reading, we refer to a recent review article about MS/MS strategies for phosphoproteomics.76

Quantitative Phosphopeptide Measurements
With the exception of a few specific issues which are unique to the field of phosphoproteomics, the quantification strategies for phosphopeptide measurements largely correspond to those employed in conventional peptide-centric (bottom-up) proteomics. As these have been recently reviewed extensively,78 we confine this section to delineating the general concepts of quantification without referring to specific examples of application. Figure 6 displays different approaches to the quantification of peptides in MS-based proteomics and phosphoproteomics.

FIGURE 6 | Quantitation strategies in phosphoproteomics. The triple color coding represents the three distinct analytical stages of LC (blue), MS1 (green), and MS2 (orange) during LC-MS/MS. The different quantitation schemes are colored according to which kind of information they exploit for quantitation. The yellow boxes on the left indicate categories such as label-free methods or stable isotope labeling that are commonly used for the different quantitation strategies. Note that in contrast to spectral counting, SIL techniques can also be combined with strategies that are also used for label-free quantification. For example, chemical tags with a mass shift such as ICAT or ICPL, or sample preparation methods employing metabolic labeling or synthetic heavy peptides can be used in conjunction with MS1-based quantification or SRM, and isobaric tags like iTRAQ or TMT can be combined with SRM.

They can be divided into two broad categories: strategies utilizing stable isotope labeling (SIL) and label-free quantification methods. SIL can be performed either by spiking labeled synthetic reference peptides into the biological samples to be quantified, by metabolic protein labeling, or by chemical labeling of proteins or peptides. Label-free approaches either use the number of MS/MS spectra identifying a given peptide, which correlates with its abundance (spectral counting), or use the precursor ion signal intensity in combination with chromatographic information. SRM uses two physical mass filters and utilizes information on the LC, MS1, and MS2 level. As MS1- and SRM-based quantification strategies rather correspond to general acquisition schemes, they can also be combined with SIL approaches to generate a compound quantification strategy. In general, quantification employing SIL is more accurate than label-free quantification. The color code shown in Figure 6 illustrates which levels of information are exploited by which method to generate quantitative evidence on peptide levels. SRM is the only method which uses information on all three levels (as do DIA schemes such as SWATH-MS, which are not included in this figure because they have not been applied to phosphoproteome profiling yet). MS1-based quantification and the SIL approaches employing synthetic reference peptides, metabolic protein labeling, and chemical non-isobaric tags utilize LC and MS1 information.
The most commonly used metabolic labeling strategy is SILAC, which utilizes isotopically labeled amino acids in the cell culture medium to generate ‘light’ and ‘heavy’ proteome samples for relative quantification.79 Non-isobaric chemical tags include isotope-coded protein labels (ICPL)80 and mTRAQ reagents.81 Isotope-coded affinity tag (ICAT)82 approaches are not considered compatible with phosphoproteomic analyses, as they interrogate cysteine-containing peptides, which are relatively rare and therefore do not deliver the protein sequence coverage needed for PTM analysis. Quantification based on the alignment of MS1 features typically uses MS2-derived peptide identifications to annotate the chromatographic peaks with the corresponding peptide identities, but does not use MS2 information for quantitative purposes. The only methods which utilize only MS2 information are spectral counting and isobaric tagging approaches; the latter generate quantitative information based on reporter ions produced upon fragmentation in the MS2 step. Isobaric reagents include isobaric tags for relative and absolute quantification (iTRAQ)83 and tandem mass tags (TMT).84 A recent study suggests that iTRAQ labeling is superior to mTRAQ for the quantitative profiling of global phosphoproteomes.85

An important issue for which quantitative information in phosphoproteomics is crucial is the determination of phosphorylation stoichiometries. For single proteins, this has been achieved in several cases with SRM-type strategies.86,87 Wu et al. have established a strategy based on phosphatase treatment and SIL to determine phosphorylation stoichiometries on a global scale. The utility of the approach was demonstrated by determining the stoichiometries for 5,033 phosphorylation sites in yeast.88

Targeted Mass Spectrometry of Phosphopeptides (SRM)

A number of studies have described the targeted measurement of phosphopeptides by SRM/MRM.
Being by definition a non-discovery method in which the mass spectrometer simply records signals according to coordinates that have been set a priori by the user, SRM is usually performed to quantify sets of known phosphopeptides in a sensitive and reproducible manner across many biological samples. For small numbers of targets and single proteins, SRM can also be applied as a hypothesis-driven, discovery-type approach, in which transitions are designed to monitor putative phosphopeptides on a specific protein and thus to exhaustively explore the space of possible phosphorylation events on it. Several studies have utilized this kind of measurement to screen for phosphorylation sites with an SRM-MS2 strategy, in which acquisition of a signal with phosphopeptide-specific transitions triggers the acquisition of a full MS2 spectrum to reliably identify the phosphopeptide.89,90 Zappacosta et al. used a combination of isotope labeling, precursor ion scanning for PO3− in negative ion mode, and SRM to analyze phosphorylation of the yeast transcription factor Pho4.91 Glinski and Weckwerth performed phosphopeptide SRM to monitor phosphorylation of a small synthetic peptide library corresponding to trehalose-6-phosphate synthase (TPS) isoform peptides by Arabidopsis thaliana leaf protein extracts.92 In human breast cancer tissue samples, SRM was employed to validate 15 regulated phosphopeptides selected from a quantitative phosphoproteome comparison of low- and high-risk recurrence groups.93 A study addressing the suitability of archival clinical cancer samples reported the successful SRM measurement of 18 phosphopeptides derived from fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) cancer tissue samples.94 On a larger scale, Wolf-Yadlin et al. have used SRM to monitor 222 distinct phosphotyrosine-peptides across seven time points following EGF treatment of cells.95 Sherrod et al.
have described a pseudo-SRM approach implemented for phosphopeptide analysis on a linear ion trap, which entails acquisition of full MS2 spectra of peptides specified in an inclusion list, and subsequent computational extraction of SRM-like traces specific for the peptides of interest with the Skyline software tool. The authors report the quantification of six phosphopeptides derived from immunoprecipitated EGF receptor (EGFR).96 While certainly a viable strategy for targeted quantification, this ‘pseudo’-SRM, which relies on a posteriori computational extraction of SRM-like information from full-scan data, most likely does not provide the excellent sensitivity of ‘real’ SRM. The latter applies actual physical mass filters on the precursor and fragment ion levels, which are instrumental for achieving high-sensitivity measurements in complex backgrounds.

COMPUTATIONAL TOOLS IN PHOSPHOPROTEOMICS

Like conventional proteomics, the analysis of phosphoproteomes is heavily dependent on a range of computational tools. A representative collection of these tools is summarized in Table 1. For a detailed explanation of computational analysis of phosphoproteomic datasets, please refer to a recent review article by Ren et al.138 Computational analysis is required at several points in the data analysis workflow. The earliest point at which specialized software is required (with certain exceptions like the decision tree algorithm governing fragmentation decisions during MS/MS) is database searching and the subsequent assignment of peptide identification probabilities. Two-step search strategies such as the refinement search option in X! Tandem, in which a limited database is created from the proteins identified in the first round and is then searched with more variable modifications being considered, can help to identify modified peptides without leading to an explosion of the search space.
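How quickly variable modifications inflate the search space can be illustrated with simple counting. The Python sketch below is not a description of how X! Tandem or any other search engine works internally. It assumes, for illustration only, that every serine, threonine, and tyrosine in a peptide is a potential phosphorylation site and that a search allows at most `max_sites` simultaneous phosphorylations per peptide; the function name, the parameter, and the example sequence are hypothetical.

```python
from math import comb

# Residues treated as potential phosphorylation sites (assumption: S, T, Y only).
PHOSPHO_RESIDUES = frozenset("STY")


def phospho_isoform_count(peptide, max_sites):
    """Count candidate forms of `peptide`, including the unmodified one.

    With n candidate residues and at most `max_sites` phosphorylations,
    the count is the sum of C(n, k) for k = 0 .. min(n, max_sites).
    """
    n = sum(1 for residue in peptide if residue in PHOSPHO_RESIDUES)
    return sum(comb(n, k) for k in range(min(n, max_sites) + 1))


if __name__ == "__main__":
    peptide = "ASPTLYSEK"  # hypothetical tryptic peptide with four S/T/Y residues
    for limit in range(0, 5):
        print(limit, phospho_isoform_count(peptide, limit))
```

A peptide with three candidate residues and no limit on the number of phosphorylations already yields eight candidate forms instead of one, and this multiplier applies to every peptide in the database. Restricting the second-round database to the proteins identified in the first round limits the number of peptides to which the multiplier is applied.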
However, care must be taken with respect to different false discovery rates in comparison to one-step database searches.139 The Ascore algorithm operates after the database search and assigns a probability that a given phosphorylation site assignment is correct. The SLoMo algorithm is an adaptation of this site assignment tool for ETD data.

TABLE 1 Computational Tools for Phosphoproteomics

Name | Description | PubMed ID (Reference)

Phosphorylation site identification and assignment
Ascore | Probability-based algorithm for phosphorylation site localization in high-throughput proteomic datasets | 16964243 (97)
SLoMo | Adaptation of the Ascore for electron transfer dissociation (ETD) data | 19275241 (98)
FragMixer | Automated identification of phosphopeptides and phosphorylation sites based on multiple fragmentation modes in MS/MS | 23094866 (74)
PTMProphet | Localization of PTMs in modified peptides, integrated into the Trans-Proteomic Pipeline (TPP) | n/a (99)
Mascot Delta Score | Localization of phosphorylation sites based on peptide identifications generated by the Mascot search engine | 21057138 (100)
LuciPHOr | Phosphosite localization based on TPP-processed MS/MS data, providing estimates for the false localization rate (FLR) | n/a (in review)
PhosphoScore | Phosphorylation site assignment tool compatible with data from multiple MS levels (MSn) | 18543960 (101)
MSQuant | Comprises phosphorylation site scoring feature compatible with Mascot search results | 17081983 (3)
Inspect | Contains scoring function to improve phosphopeptide identification from unassigned MS/MS spectra | 18563926 (102)
Phosm | Contains PhosphoSiteScore feature for site assignment, suited for in-depth analysis of small datasets, also works for unassigned MS/MS spectra | 17718535 (103)

Publicly accessible databases for phosphorylation sites
PhosphoPep | MS-based phosphopeptide database for Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens | 21082442 (104)
PHOSIDA | Posttranslational modification database containing phosphorylated, acetylated, and N-glycosylated peptides | 18039369 (105)
Phospho.ELM | Database of experimentally verified phosphorylation sites in eukaryotic proteins | 21062810 (106)
PhosphoPOINT | Integrates human phosphoproteome datasets with human kinase interactome networks | 18689816 (107)
PhosPhAt | Depository for Arabidopsis thaliana phosphorylation sites | 19880383 (108)
P3DB | Plant Protein Phosphorylation DataBase, contains phosphorylation sites for six plant species | 18931372 (109)
SysPTM | Comprehensive posttranslational modification database containing datasets covering nearly 50 different PTMs | 19366988 (110)
HPRD | Human Protein Reference Database, contains data about domain architecture, PTMs, interaction networks and disease association | 18988627 (111)
PhosphoSitePlus | Posttranslational modification database from Cell Signaling Technology, primarily human and mouse proteins | 22135298 (112)
PhosphoNET | Phosphorylation site database from Kinexus for human proteins, contains experimentally validated as well as predicted sites | n/a (corporate)

Sequence pattern recognition in phosphoproteomic datasets
Motif-X | Extracts overrepresented patterns from any sequence dataset through iterative comparison to a dynamic statistical background | 16273072 (113)
Scan-X | Detects phosphorylation motifs (identified by Motif-X) within any sequence dataset | 18974045 (114)
HPRD PhosphoMotif Finder | Reports the presence of known phosphorylation-based substrate and binding motifs curated from the literature | 17344875 (115)
MoDL | Motif Description Length, discovery of motif mixtures for uncharacterized kinases and phosphatases in phosphoproteomic datasets | 18996944 (116)
SMALI | Scoring Matrix-Assisted Ligand Identification, identification of phosphopeptide ligands that are likely to bind to SH2 domains | 17956856 (117), 18424801 (118)

Prediction of phosphorylation sites and kinase–substrate relationships
NetPhorest | Atlas of linear kinase consensus motifs and phosphorylation-dependent binding domains, interfaces to Scansite, Phospho.ELM, and PhosphoSite | 18765831 (119)
NetworKIN | Prediction of in vivo kinase–substrate relationships, integrating cellular and molecular contexts for kinases and phosphoproteins | 17570479 (120)
NetPhosK | Kinase-specific prediction of eukaryotic protein phosphorylation sites | 15174133 (121)
Predikin | Prediction of substrate specificities of protein kinases, suitable for proteome-wide predictions | 21829434 (122)
Scansite | Prediction of kinase–substrate and cell signaling interactions based on short sequence motifs | 12824383 (123)
GPS | Group-based Prediction System, kinase-specific phosphorylation site predictor | 18463090 (124)
Minimotif Miner | Detection of approximately 300,000 distinct short sequence motifs in protein sequence queries | 22146221 (125)
KEA | Kinase Enrichment Analysis, draws upon data from NetworKIN, Phospho.ELM, MINT, HPRD, PhosphoPOINT, SwissProt, and manually curated data | 19176546 (126)
KID | Kinase Interaction Database for yeast proteins, literature-curated depository of kinase–substrate pairs | 21492431 (127)

Network analysis and protein classification
Cytoscape | Modular software package for visualizing molecular interaction networks and integrating them with other data types | 21149340 (128)
STRING | Search Tool for the Retrieval of Interacting Genes, interaction database for more than 1100 sequenced organisms | 21045058 (129)
BioGRID | Biological General Repository for Interaction Datasets, contains protein–protein and genetic interactions from large- and small-scale studies | 21071413 (130)
MINT | Molecular INTeraction Database, contains roughly 235,000 binary protein–protein interactions captured from over 4750 publications | 22096227 (131)
IntAct | Molecular interaction database containing interaction data from the literature or direct data depositions | 22121220 (132)
PANTHER | Protein ANalysis THrough Evolutionary Relationships, protein classification based on gene families, GO classes and pathways | 12520017 (133)

Additional resources
PhosSNP | Database of nonsynonymous single-nucleotide polymorphisms (nsSNPs) that potentially influence protein phosphorylation in human cells | 19995808 (134)
PepCyber: P∼PEP | Database of protein–protein interactions mediated by phosphoprotein-binding domains (PPBDs) in human cells | 18160410 (135)
OpenMS | Modular software package for the analysis of mass spectrometry-based proteomic data | 21063960 (136)
Skyline | Software tool for performing quantitative proteomic experiments either in selected reaction monitoring (SRM) or full-scan mode | 20147306 (137)

The tools displayed in this table tackle various challenges in the analysis and dissemination of phosphoproteomic data. See main text for details.

To make the results of large-scale phosphoproteome studies publicly available and to facilitate data mining, repositories for protein phosphorylation datasets are an important part of the computational curation in phosphoproteomics. Large and widely used databases include PhosphoPep, PhosphoSitePlus, PHOSIDA, and Phospho.ELM. These resources differ substantially in the amount of data they contain for each given phosphorylation event. In PhosphoPep, consensus fragmentation spectra can be accessed, but the raw MS data or mzXML files are usually not accessible in the public databases, unless they have been deposited in, e.g., PRIDE or Tranche upon publication of the respective research paper.
The value of accessible raw data files is illustrated by the recently reported identification of ADP-ribosylation sites via re-searching of a published phosphoproteomic dataset.140

Furthermore, a number of specialized tools assign putative upstream kinases to experimentally identified phosphorylation sites, or predict phosphorylation sites based on consensus motifs in an input protein sequence. These tools comprise NetPhorest, NetworKIN, Scansite, and others. Related algorithms such as Motif-X and Scan-X can be used to recognize overrepresented sequence motifs surrounding phosphorylation sites in phosphoproteomic datasets, facilitating the identification of putative novel kinase recognition sequences involved in a specific biological context. In addition, there are several computational resources which are not specific to phosphoproteomics, but are more generic tools that can be used for visualization of large-scale networks derived from various datasets. These include Cytoscape, for mapping of phosphoproteome profiles onto pre-existing (mostly protein–protein interaction) networks, or for structuring phosphoproteome datasets according to gene ontology (GO) annotations. Hyperlinks to the individual database and software tool Web sites, and the respective literature references in PubMed are provided in Table 1.

PHOSPHOPROTEOMICS IN THE CONTEXT OF SYSTEMS BIOLOGY

Choice of Appropriate Biological Questions

The success of a phosphoproteomic analysis strongly depends on the experimental system used and the biological question, if any, that is asked. Pure cataloging projects usually do not address a specific biological question, but still, the outcome of the investigation depends on parameters like available amounts of starting material, protein extraction conditions, phosphopeptide enrichment, and downstream experimental protocols.
As delineated above, the focus on a specific subsection of the phosphoproteome may provide more biological insight than a global profiling. Usually cell culture models provide an accessible entry point before moving to more challenging sample types like tissues or more complex organisms. Furthermore, model organisms with a less complex proteome allow for higher coverage in global profiling experiments compared to cells from higher organisms. Several examples of how proteomic analyses of protein expression and PTM profiles have been applied to investigate the molecular basis of developmental biology processes are presented in a recent review article by Alexey Veraksa in WIREs Developmental Biology.141

Modeling and Integration with Other Data Types

While phosphoproteome measurements contain very valuable information about the studied biological systems independently of other experimental observations, the combined analysis of phosphoproteomic data with other cellular parameters, and the generation of models based on the experimental data can certainly yield novel insights. Figure 7 illustrates which kind of datasets may be combined with phosphoproteome profiles, and which categories of modeling approaches are available. The overall structure of Figure 7 demonstrates that various types of experimental data can be acquired and analyzed in combination with the phosphoproteome data. One example for this is the normalization of phosphopeptide to protein abundances described above. Furthermore, phosphoproteome (or combined systems-level) datasets can be mapped onto and combined with sets of biological information stored in public databases such as genomes, interaction networks, signaling pathways, phosphorylation site repositories, or individual research papers.
The resulting compound body of experimental evidence is then used to facilitate the generation of computational models to visualize and describe the biological system under investigation, or even generate models with predictive features. Following the review article by Terfve and Saez-Rodriguez,142 we divide the modeling approaches into descriptive and predictive ones. It is beyond the scope of this review to comprehensively discuss the types of different datasets which have been combined with phosphoproteome profiles, and the various modeling approaches. An example of the integration of different datasets is the combined computational analysis of the yeast phosphoproteome and protein–protein interactome. Mapping of the phosphoproteome onto the yeast interactome and related large-scale datasets revealed that in general, phosphorylated proteins have more interaction partners than nonphosphorylated proteins, implying that phosphorylation plays an important role in the regulation of protein–protein interactions.143 Data integration and modeling are of high importance to phosphoproteomics and systems biology, because they can reveal regulatory connections and subnetworks that would not be apparent from the analysis of isolated datasets alone. More information about the modeling options for phosphoproteome data can be found in the recent review article mentioned above.142

FIGURE 7 | Phosphoproteomics in the framework of systems biology. Phosphoproteome profiles can be either analyzed as stand-alone datasets, or combined with other experimental data (yellow ovals) to generate further biological knowledge. In addition, experimental datasets of different sources which are stored in public databases or retrieved from literature can be integrated with the acquired phosphoproteome profiles in various ways. The generation of descriptive or predictive models of processes involving phosphorylation events depends on high-quality experimental data describing different cellular parameters.

Biochemical and Functional Validation of Phosphorylation Events

The constantly growing body of accumulated phosphoproteomic datasets in publicly available databases creates a discrepancy between the many phosphorylation events that have been identified by large-scale experiments, but are otherwise completely uncharacterized, and the relatively small population of phosphorylation sites for which the actual mechanistic and/or biological relevance is known. Conversely, many well-characterized phosphorylation events are invisible to current standard phosphoproteomic screens, either because of abundance issues, or because they are not located in a protein sequence context that produces adequately sized tryptic peptides for MS analysis. This gap between uncharacterized and characterized phosphorylation events will be challenging to close, simply because of the large amount of time that has to be invested in the mechanistic and functional validation of phosphorylation events by classical low-throughput biochemical and genetic experiments. There is a tradeoff between the investment of time and manpower and the depth or ‘biological resolution’ of knowledge that can be acquired about a specific number of newly discovered phosphorylation events. These different levels of validation (biochemical, mechanistic, physiological) are discussed in this section and are summarized in Figure 8.

FIGURE 8 | Different levels of validation for newly discovered phosphorylation sites. We divide the various types of validation experiments into three categories of varying biological information content, namely characterization of the phosphorylation event itself, investigation of the mechanistic relevance of the modification for the protein on which it occurs, and assessment of in vivo relevance by genetic means and structure–function analyses.

The first level of validation is a biochemical one, in which the goal is to show that the phosphorylation observed by MS on the peptide level is indeed occurring on the protein in question. For this purpose, so-called PhosTag gels, which facilitate the visualization of a phosphorylation-induced bandshift of a protein in immunoblotting, are a useful tool, as applied, e.g., in a study investigating rapamycin-sensitive phosphorylation events in yeast.144 In the case of a phosphorylation that occurs under a specific condition, a bandshift may be seen in this condition compared to a control (e.g., untreated) condition, and this should be sensitive to phosphatase treatment of the protein extracts. Constitutive phosphorylation events will display a bandshift relative to a phosphatase-treated control sample, or a mutated allele of the protein of interest carrying a non-phosphorylatable amino acid in place of the given phosphorylation site. A more detailed and much more laborious biochemical validation entails the identification of the protein kinase which is catalyzing the phosphorylation of interest. Similar to the identification of novel substrates for a given kinase, this is a nontrivial task and can be approached by a combination of computational (identification of kinase consensus motifs surrounding the phosphorylated residue), biochemical (in vitro and in vivo enzymatic assays with candidate kinases) and genetic (using the phosphorylation event as a readout in kinase knockdown or inhibition experiments) procedures.
A chemical genetic approach which may be used in this context utilizes engineered analog-sensitive protein kinases.145

Once a new phosphorylation event has been confirmed on the protein level, a central question is how the activity, complex composition, subcellular localization or other cellular parameters of the modified protein are altered by the phosphorylation. This second level of validation addresses the issue of the mechanistic relevance of a phosphorylation event on the protein on which it occurs. An example of this kind of validation is the seminal discovery that phosphorylation of FOXO transcription factors by AKT leads to binding to 14-3-3 proteins, sequestration in the cytoplasm and thereby inhibition of transcription factor activity.146

Even if the phosphorylation event is well characterized on a biochemical and mechanistic level, and even if the upstream kinase has been identified, this evidence is not sufficient to judge the physiological relevance of a phosphorylation for the protein function in vivo. This can be assessed by investigating the capability of phosphorylation site mutant proteins to exert the function of the wild-type protein, either by knock-in or genetic rescue strategies. An example of how these different levels of validation have been performed for a specific set of phosphorylation sites on a protein is described in Box 2. For the sake of completeness, it should be noted that these rescue experiments are usually performed in vivo and convey a powerful statement about physiological relevance in an actual organism, but can also be done with less effort in cell culture. Depending on the experimental system, either the endogenous gene can be replaced with the phosphosite mutant by homologous recombination, or a rescue construct can be combined with a null allele or RNA interference (RNAi).
In the case of RNAi, measures have to be taken to direct the knockdown only to the endogenous gene but not the transgene encoded by the rescue construct. This can be achieved by targeting the untranslated regions (UTRs) of the endogenous messenger RNA by RNAi to achieve the knockdown. The transgene usually contains exogenous UTRs and will therefore not be affected by the RNAi-mediated silencing. The introduction of silent mutations that render the rescue construct-encoded gene refractory to RNAi is only suitable for short interfering RNA (siRNA)- or short hairpin RNA (shRNA)-mediated knockdown, and not for procedures involving long double-stranded RNAs (dsRNAs), as in most Drosophila cell culture RNAi screens. These complementation-based assays for assessing phosphorylation site relevance are time consuming and difficult to extend to high-throughput experimental scales. As a compromise between effort and biological insight, newly identified proteins from phosphoproteomic screens can be screened with RNAi knockdown for phenotypes related to the biological context under investigation. For example, hits from the analysis of the insulin- or rapamycin-sensitive phosphoproteome could be screened for cell growth phenotypes with RNAi in cell culture or in vivo. The proteins which elicit a growth-related phenotype upon silencing would be strong candidates for novel effectors or pathway components, even if no direct evidence about the phosphorylation event is generated by this type of experiment.

BOX 2
PHYSIOLOGICAL RELEVANCE OF PHOSPHORYLATION SITES: THE EXAMPLE OF TSC2 PHOSPHORYLATION BY AKT/PKB

An example of such a well-characterized molecular event is the phosphorylation of TSC2 by AKT. It had been shown that AKT directly phosphorylates the protein in vitro and in vivo at several specific sites, thereby inactivating the TSC1/TSC2 complex.
In this way, AKT was proposed to exert an activating stimulus on TOR activity by relieving the inhibition by the upstream TSC protein complex. However, it was then demonstrated by genetic means involving non-phosphorylatable as well as phospho-mimicking TSC2 mutants that this phosphorylation was irrelevant for normal TSC2 protein function in the developing organism. Phosphosite mutant alleles of TSC2 fully rescued the lethality of TSC2 mutants, demonstrating that the phosphorylation events are dispensable for TSC2 protein function under normal conditions.147 In a similar study, this finding was confirmed and it was additionally demonstrated that although AKT also phosphorylates TSC1 in Drosophila, this modification is dispensable for AKT-dependent growth regulation as well.148 While these experiments show that AKT-dependent TSC phosphorylation is non-essential during organismal development under normal conditions, they do not rule out the possibility that it may be functionally required for specific processes such as cellular transformation triggered by deregulated insulin signaling. In a similar genetic setup in S. cerevisiae, the impact of specific phosphorylation events on metabolic enzyme activity was investigated using metabolic fluxes as a phenotypic readout.149 In summary, these genetic approaches illustrate how the in vivo relevance of specific phosphorylation events can be assessed.

Current State of Phosphoproteome Analysis in Different Organisms

Table 2 summarizes the current state of experimental phosphoproteome coverage in a range of organisms belonging to the three domains of life: archaea, bacteria, and eukarya. When looking at the numbers in the table, two issues should be kept in mind. First, it is not possible to make a general statement about data quality and false discovery rates regarding these reported numbers.
The underlying data sources are very heterogeneous in terms of experimental and statistical procedures, as well as the availability of raw data such as fragmentation spectra. Even when using the large databases, it is usually not possible for the user to really recapitulate the quality of the underlying data and the calculation of confidence scores, if available at all. Second, the displayed coverage values are mainly a measure of how much effort has been invested in the phosphoproteome analysis of a specific organism. Because basically all experimental proteome and phosphoproteome inventories must still be considered incomplete to varying extents, the coverage values are likely not very accurate, but rather rough estimates. Seminal large-scale MS-based phosphoproteomic studies that each identified thousands of phosphorylation sites in model organisms and human cells include phosphoproteome analyses in Caenorhabditis elegans,150 Drosophila cultured cells151 and embryos,152 a mouse liver cell line,153 liver extracts4 and mouse embryonic stem cells,154 and the human cancer cell lines HeLa and K562.155

TABLE 2 Phosphoproteome Coverage in Different Organisms

Organism | Phosphoproteins | Phosphopeptides | Phosphorylation sites | Data source | Annotated protein coding genes | Percent phosphorylated

Archaea
Halobacterium salinarum | 62 | 100 | 75 | P | 2,749 | 2.3
Haloferax volcanii | 8 | n/a | 9 | P | 4,015 | 0.2

Bacteria
Mycoplasma pneumoniae | 63 | n/a | 16 | P | 707 | 8.9
Trypanosoma cruzi | 753 | 1,494 | 2,572 | P | 19,615 (P) | 3.8
Bacillus subtilis (strain 168) | 78 | 102 | 76 | P | 4,176 | 1.9
Escherichia coli (K12 substr. MG1655) | 79 | 104 | 81 | P | 4,146 | 1.9
Lactococcus lactis | 63 | 99 | 73 | P | 2,321 | 2.7
Klebsiella pneumoniae | 81 | n/a | 117 | P | 5,779 | 1.4
Pseudomonas aeruginosa | 23 | n/a | 55 | P | 5,571 | 0.4
Pseudomonas putida | 40 | n/a | 53 | P | 5,350 | 0.7
Streptomyces coelicolor | 40 | n/a | 46 | P | 8,153 | 0.5
Mycobacterium tuberculosis | 301 | n/a | 516 | P | 4,003 | 7.5
Streptococcus pneumoniae | 84 | n/a | 163 | P | 2,148 | 3.9

Eukarya
Arabidopsis thaliana | 4,078 | n/a | 13,899 | DB (P3DB) | 37,761 (AtGDB171) | 10.8
Brassica napus | 325 | n/a | 818 | P | 101,620 | 0.3
Glycine max | 1,451 | n/a | 2,739 | DB (P3DB) | 62,442 (GMGDB163) | 2.3
Medicago truncatula | 980 | n/a | 3,351 | P | 45,888 (Phytozome) | 2.1
Nicotiana tabacum | 10 | n/a | 10 | P | 116,964 | < 0.1
Oryza sativa | 4,829 | n/a | 12,317 | P | 66,338 (Phytozome) | 7.3
Solanum tuberosum | 2 | n/a | 3 | P | 39,031 | < 0.1
Zea mays | 86 | n/a | 115 | P | 136,522 (ZmGDB181) | 0.1
Saccharomyces cerevisiae | 3,006 | 24,190 | n/a | DB (PhosphoPep)¹ | 6,692 (ENSEMBL) | 44.9
Caenorhabditis elegans | 2,373 | 6,926 | 6,780 | DB (PHOSIDA) | 20,517 (ENSEMBL) | 11.6
Drosophila melanogaster | 5,786 | 23,301 | n/a | DB (PhosphoPep)¹ | 13,937 (ENSEMBL) | 41.5
Mus musculus | 9,234 | 24,604 | 25,085 | DB (PHOSIDA) | 23,158 (ENSEMBL) | 39.9
Homo sapiens | 8,283 | 23,130 | 24,262 | DB (PHOSIDA) | 20,848 (ENSEMBL) | 39.7
Homo sapiens and Mus musculus | 18,768 | n/a | 177,945 | DB (PhosphoSitePlus) | 44,006 (ENSEMBL) | 42.6

Experimental phosphoproteome coverage is shown in terms of identified phosphorylated proteins, phosphopeptides, and phosphorylation sites. The distinction between identified phosphopeptides and phosphorylation sites was only made where this was clearly stated in the data source. Data sources are either research publications (P) or databases (DB). For organisms with several phosphoproteomic studies, database results are provided. For organisms that are represented in several phosphorylation site databases (yeast, C. elegans, Drosophila, mouse, human), the datasets containing the highest number of phosphopeptides or phosphorylation sites are displayed. PhosphoSitePlus, which does not feature an organism-specific proteome-wide search function, contains the highest number of phosphorylation sites, 90% of which are from human and mouse. Total predicted proteome size is expressed as the number of annotated protein coding genes and was retrieved, unless indicated otherwise, from the NCBI Genome database. The numbers regarding phosphoproteome coverage for most of the prokaryotes are taken from a review article about bacterial phosphoproteomics by Mijakovic and Macek.8
¹ PeptideProphet cutoff set at 0.9.

CONCLUSION

Mass spectrometry has emerged as the method of choice for the large-scale analysis of protein phosphorylation events. Although impressive progress has been made in terms of proteome coverage and refinement of analytical techniques regarding enrichment, fragmentation, and detection of phosphopeptides, as well as computational curation of the acquired datasets, current phosphoproteomic studies must still be considered incomplete because of the inability to capture the whole phosphoproteome. Several steps in the experimental pipeline can be manipulated and optimized to enhance phosphoproteome coverage. A fruitful and goal-oriented strategy is the targeted analysis of a defined subphosphoproteome which contains a high degree of information with respect to the biological context under investigation. This can be
Furthermore, targeted MS techniques like SRM enable the sensitive, specific, and quantitative detection of intermediate numbers of phosphorylated peptides and proteins in the samples of interest. As MS instrumentation becomes more powerful and enables ever-increasing coverage of the phosphoproteome by conventional DDA-type or DIA-type acquisition schemes, the need for targeted detection of phosphopeptides may decrease in the future. Probably the most challenging aspect of phosphoproteomics for decades to come will be the task of assigning biological significance to all the newly discovered phosphorylation sites, a challenge that will only be met by the concerted efforts of proteomics, classical biology, and computational resources to integrate the experimental results from different disciplines.

REFERENCES

1. Cohen P. The origins of protein phosphorylation. Nat Cell Biol 2002, 4:E127–E130.
2. Hunter T, Sefton BM. Transforming gene product of Rous sarcoma virus phosphorylates tyrosine. Proc Natl Acad Sci USA 1980, 77:1311–1315.
3. Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127:635–648.
4. Villen J, Beausoleil SA, Gerber SA, Gygi SP. Large-scale phosphorylation analysis of mouse liver. Proc Natl Acad Sci USA 2007, 104:1488–1493.
5. Hohenester UM, Ludwig K, Krieglstein J, Konig S. Stepchild phosphohistidine: acid-labile phosphorylation becomes accessible by functional proteomics. Anal Bioanal Chem 2010, 397:3209–3212.
6. Cohen P. The regulation of protein function by multisite phosphorylation—a 25 year update. Trends Biochem Sci 2000, 25:596–601.
7. Holt LJ, Tuch BB, Villen J, Johnson AD, Gygi SP, Morgan DO. Global analysis of Cdk1 substrate phosphorylation sites provides insights into evolution. Science 2009, 325:1682–1686.
8. Mijakovic I, Macek B. Impact of phosphoproteomics on studies of bacterial physiology. FEMS Microbiol Rev 2012, 36:877–892.
9. Lemeer S, Heck AJ. The phosphoproteomics data explosion. Curr Opin Chem Biol 2009, 13:414–420.
10. Ubersax JA, Ferrell JE Jr. Mechanisms of specificity in protein phosphorylation. Nat Rev Mol Cell Biol 2007, 8:530–541.
11. Levy ED, Michnick SW, Landry CR. Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philos Trans R Soc Lond B Biol Sci 2012, 367:2594–2606.
12. Pearlman SM, Serber Z, Ferrell JE Jr. A mechanism for the evolution of phosphorylation sites. Cell 2011, 147:934–946.
13. Hunter T. Tyrosine phosphorylation: thirty years and counting. Curr Opin Cell Biol 2009, 21:140–146.
14. Tan CS, Pasculescu A, Lim WA, Pawson T, Bader GD, Linding R. Positive selection of tyrosine loss in metazoan evolution. Science 2009, 325:1686–1688.
15. Goulian M. Two-component signaling circuit structure and properties. Curr Opin Microbiol 2010, 13:184–189.
16. Lee JW, Chen H, Pullikotil P, Quon MJ. Protein kinase A-alpha directly phosphorylates FoxO1 in vascular endothelial cells to regulate expression of vascular cellular adhesion molecule-1 mRNA. J Biol Chem 2011, 286:6423–6432.
17. Bantscheff M, Eberhard D, Abraham Y, Bastuck S, Boesche M, Hobson S, Mathieson T, Perrin J, Raida M, Rau C, et al. Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors. Nat Biotechnol 2007, 25:1035–1044.
18. Dephoure N, Howson RW, Blethrow JD, Shokat KM, O’Shea EK. Combining chemical genetics and proteomics to identify protein kinase substrates. Proc Natl Acad Sci USA 2005, 102:17940–17945.
19. Sopko R, Andrews BJ. Linking the kinome and phosphorylome—a comprehensive review of approaches to find kinase targets. Mol Biosyst 2008, 4:920–933.
20. Hunter T. The age of crosstalk: phosphorylation, ubiquitination, and beyond. Mol Cell 2007, 28:730–738.
21. Kamemura K, Hart GW. Dynamic interplay between O-glycosylation and O-phosphorylation of nucleocytoplasmic proteins: a new paradigm for metabolic control of signal transduction and transcription. Prog Nucleic Acid Res Mol Biol 2003, 73:107–136.
22. Hart GW, Housley MP, Slawson C. Cycling of O-linked beta-N-acetylglucosamine on nucleocytoplasmic proteins. Nature 2007, 446:1017–1022.
23. Mishra S, Ande SR, Salter NW. O-GlcNAc modification: why so intimately associated with phosphorylation? Cell Commun Signal 2011, 9:1.
24. Rust HL, Thompson PR. Kinase consensus sequences: a breeding ground for crosstalk. ACS Chem Biol 2011, 6:881–892.
25. Yamagata K, Daitoku H, Takahashi Y, Namiki K, Hisatake K, Kako K, Mukai H, Kasuya Y, Fukamizu A. Arginine methylation of FOXO transcription factors inhibits their phosphorylation by Akt. Mol Cell 2008, 32:221–231.
26. Sims RJ 3rd, Rojas LA, Beck D, Bonasio R, Schuller R, Drury WJ 3rd, Eick D, Reinberg D. The C-terminal domain of RNA polymerase II is modified by site-specific methylation. Science 2011, 332:99–103.
27. Zhang H, Pelech S. Using protein microarrays to study phosphorylation-mediated signal transduction. Semin Cell Dev Biol 2012, 23:872–882.
28. Perez OD, Nolan GP. Simultaneous measurement of multiple active kinase states using polychromatic flow cytometry. Nat Biotechnol 2002, 20:155–162.
29. Perl AE, Kasner MT, Shank D, Luger SM, Carroll M. Single-cell pharmacodynamic monitoring of S6 ribosomal protein phosphorylation in AML blasts during a clinical trial combining the mTOR inhibitor sirolimus and intensive chemotherapy. Clin Cancer Res 2012, 18:1716–1725.
30. Ornatsky O, Bandura D, Baranov V, Nitz M, Winnik MA, Tanner S. Highly multiparametric analysis by mass cytometry. J Immunol Methods 2010, 361:1–20.
31. Bendall SC, Simonds EF, Qiu P, Amir el AD, Krutzik PO, Finck R, Bruggner RV, Melamed R, Trejo A, Ornatsky OI, et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 2011, 332:687–696.
32. Steen H, Mann M. A new derivatization strategy for the analysis of phosphopeptides by precursor ion scanning in positive ion mode. J Am Soc Mass Spectrom 2002, 13:996–1003.
33. Geiger T, Cox J, Mann M. Proteomics on an Orbitrap benchtop mass spectrometer using all-ion fragmentation. Mol Cell Proteomics 2010, 9:2252–2261.
34. Gillet LC, Navarro P, Tate S, Rost H, Selevsek N, Reiter L, Bonner R, Aebersold R. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics 2012, 11:O111 016717.
35. Maiolica A, Junger MA, Ezkurdia I, Aebersold R. Targeted proteome investigation via selected reaction monitoring mass spectrometry. J Proteomics 2012, 75:3495–3513.
36. Gallien S, Duriez E, Crone C, Kellmann M, Moehring T, Domon B. Targeted proteomic quantification on quadrupole-orbitrap mass spectrometer. Mol Cell Proteomics 2012, 11:1709–1723.
37. Peterson AC, Russell JD, Bailey DJ, Westphall MS, Coon JJ. Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol Cell Proteomics 2012, 11:1475–1488.
38. Trost M, Bridon G, Desjardins M, Thibault P. Subcellular phosphoproteomics. Mass Spectrom Rev 2010, 29:962–990.
39. Bodenmiller B, Mueller LN, Mueller M, Domon B, Aebersold R. Reproducible isolation of distinct, overlapping segments of the phosphoproteome. Nat Methods 2007, 4:231–237.
40. Bodenmiller B, Mueller LN, Pedrioli PG, Pflieger D, Junger MA, Eng JK, Aebersold R, Tao WA. An integrated chemical, mass spectrometric and computational strategy for (quantitative) phosphoproteomics: application to Drosophila melanogaster Kc167 cells. Mol Biosyst 2007, 3:275–286.
41. Glatter T, Schittenhelm RB, Rinner O, Roguska K, Wepf A, Junger MA, Kohler K, Jevtov I, Choi H, Schmidt A, et al. Modularity and hormone sensitivity of the Drosophila melanogaster insulin receptor/target of rapamycin interaction proteome. Mol Syst Biol 2011, 7:547.
42. Pflieger D, Junger MA, Muller M, Rinner O, Lee H, Gehrig PM, Gstaiger M, Aebersold R. Quantitative proteomic analysis of protein complexes: concurrent identification of interactors and their state of phosphorylation. Mol Cell Proteomics 2008, 7:326–346.
43. Kruger M, Kratchmarova I, Blagoev B, Tseng YH, Kahn CR, Mann M. Dissection of the insulin signaling pathway via quantitative phosphoproteomics. Proc Natl Acad Sci USA 2008, 105:2451–2456.
44. Boersema PJ, Foong LY, Ding VM, Lemeer S, van Breukelen B, Philp R, Boekhorst J, Snel B, den Hertog J, Choo AB, et al. In-depth qualitative and quantitative profiling of tyrosine phosphorylation using a combination of phosphopeptide immunoaffinity purification and stable isotope dimethyl labeling. Mol Cell Proteomics 2010, 9:84–99.
45. Anderson NL, Anderson NG. The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 2002, 1:845–867.
46. Picotti P, Bodenmiller B, Mueller LN, Domon B, Aebersold R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 2009, 138:795–806.
47. Walsh CT. Posttranslational modification of proteins: expanding nature’s inventory. Greenwood Village, CO: Roberts and Co. Publishers; 2006.
48. Chen EI, Cociorva D, Norris JL, Yates JR 3rd. Optimization of mass spectrometry-compatible surfactants for shotgun proteomics. J Proteome Res 2007, 6:2529–2538.
49. MacCoss MJ, McDonald WH, Saraf A, Sadygov R, Clark JM, Tasto JJ, Gould KL, Wolters D, Washburn M, Weiss A, et al. Shotgun identification of protein modifications from protein complexes and lens tissue. Proc Natl Acad Sci USA 2002, 99:7900–7905.
50. Beausoleil SA, Jedrychowski M, Schwartz D, Elias JE, Villen J, Li J, Cohn MA, Cantley LC, Gygi SP. Large-scale characterization of HeLa cell nuclear phosphoproteins. Proc Natl Acad Sci USA 2004, 101:12130–12135.
51. Han G, Ye M, Zhou H, Jiang X, Feng S, Jiang X, Tian R, Wan D, Zou H, Gu J. Large-scale phosphoproteome analysis of human liver tissue by enrichment and fractionation of phosphopeptides with strong anion exchange chromatography. Proteomics 2008, 8:1346–1361.
52. Singer D, Kuhlmann J, Muschket M, Hoffmann R. Separation of multiphosphorylated peptide isomers by hydrophilic interaction chromatography on an aminopropyl phase. Anal Chem 2010, 82:6409–6414.
53. Hao P, Guo T, Sze SK. Simultaneous analysis of proteome, phospho- and glycoproteome of rat kidney tissue with electrostatic repulsion hydrophilic interaction chromatography. PLoS One 2011, 6:e16884.
54. Nilsson CL. Advances in quantitative phosphoproteomics. Anal Chem 2012, 84:735–746.
55. Ye J, Zhang X, Young C, Zhao X, Hao Q, Cheng L, Jensen ON. Optimized IMAC-IMAC protocol for phosphopeptide recovery from complex biological samples. J Proteome Res 2010, 9:3561–3573.
56. Thingholm TE, Jensen ON, Robinson PJ, Larsen MR. SIMAC (sequential elution from IMAC), a phosphoproteomics strategy for the rapid separation of monophosphorylated from multiply phosphorylated peptides. Mol Cell Proteomics 2008, 7:661–671.
57. Mamone G, Picariello G, Ferranti P, Addeo F. Hydroxyapatite affinity chromatography for the highly selective enrichment of mono- and multi-phosphorylated peptides in phosphoproteome analysis. Proteomics 2010, 10:380–393.
58. Leitner A, Lindner W. Chemical tagging strategies for mass spectrometry-based phospho-proteomics. Methods Mol Biol 2009, 527:229–243.
59. McLachlin DT, Chait BT. Improved beta-elimination-based affinity purification strategy for enrichment of phosphopeptides. Anal Chem 2003, 75:6826–6836.
60. Warthaka M, Karwowska-Desaulniers P, Pflum MK. Phosphopeptide modification and enrichment by oxidation-reduction condensation. ACS Chem Biol 2006, 1:697–701.
61. Lansdell TA, Tepe JJ. Isolation of phosphopeptides using solid phase enrichment. Tetrahedron Lett 2004, 45:91–93.
62. Courcelles M, Bridon G, Lemieux S, Thibault P. Occurrence and detection of phosphopeptide isomers in large-scale phosphoproteomics experiments. J Proteome Res 2012, 11:3753–3765.
63. Shvartsburg AA, Singer D, Smith RD, Hoffmann R. Ion mobility separation of isomeric phosphopeptides from a protein with variant modification of adjacent residues. Anal Chem 2011, 83:5078–5085.
64. Bridon G, Bonneil E, Muratore-Schroeder T, Caron-Lizotte O, Thibault P. Improvement of phosphoproteome analyses using FAIMS and decision tree fragmentation. application to the insulin signaling pathway in Drosophila melanogaster S2 cells. J Proteome Res 2012, 11:927–940.
65. Wu R, Dephoure N, Haas W, Huttlin EL, Zhai B, Sowa ME, Gygi SP. Correct interpretation of comprehensive phosphorylation dynamics requires normalization by protein expression changes. Mol Cell Proteomics 2011, 10:M111 009654.
66. Haas W, Faherty BK, Gerber SA, Elias JE, Beausoleil SA, Bakalarski CE, Li X, Villen J, Gygi SP. Optimization and use of peptide mass measurement accuracy in shotgun proteomics. Mol Cell Proteomics 2006, 5:1326–1337.
67. Olsen JV, Mann M. Improved peptide identification in proteomics by two consecutive stages of mass spectrometric fragmentation. Proc Natl Acad Sci USA 2004, 101:13417–13422.
68. Schroeder MJ, Shabanowitz J, Schwartz JC, Hunt DF, Coon JJ. A neutral loss activation method for improved phosphopeptide sequence analysis by quadrupole ion trap mass spectrometry. Anal Chem 2004, 76:3590–3598.
69. Steen H, Mann M. The ABC’s (and XYZ’s) of peptide sequencing. Nat Rev Mol Cell Biol 2004, 5:699–711.
70. Nagaraj N, D’Souza RC, Cox J, Olsen JV, Mann M. Feasibility of large-scale phosphoproteomics with higher energy collisional dissociation fragmentation. J Proteome Res 2010, 9:6786–6794.
71. Zhang Y, Ficarro SB, Li S, Marto JA. Optimized Orbitrap HCD for quantitative analysis of phosphopeptides. J Am Soc Mass Spectrom 2009, 20:1425–1434.
72. Mann M, Kelleher NL. Precision proteomics: the case for high resolution and high mass accuracy. Proc Natl Acad Sci USA 2008, 105:18132–18138.
73. Przybylski C, Junger MA, Aubertin J, Radvanyi F, Aebersold R, Pflieger D. Quantitative analysis of protein complex constituents and their phosphorylation states on a LTQ-Orbitrap instrument. J Proteome Res 2010, 9:5118–5132.
74. Vandenbogaert M, Hourdel V, Jardin-Mathe O, Bigeard J, Bonhomme L, Legros V, Hirt H, Schwikowski B, Pflieger D. Automated Phosphopeptide Identification Using Multiple MS/MS Fragmentation Modes. J Proteome Res 2012, 11:5695–5703.
75. Kelstrup CD, Hekmat O, Francavilla C, Olsen JV. Pinpointing phosphorylation sites: Quantitative filtering and a novel site-specific x-ion fragment. J Proteome Res 2011, 10:2937–2948.
76. Palumbo AM, Smith SA, Kalcic CL, Dantus M, Stemmer PM, Reid GE. Tandem mass spectrometry strategies for phosphoproteome analysis. Mass Spectrom Rev 2011, 30:600–625.
77. Swaney DL, McAlister GC, Coon JJ. Decision tree-driven tandem mass spectrometry for shotgun proteomics. Nat Methods 2008, 5:959–964.
78. Bantscheff M, Lemeer S, Savitski MM, Kuster B. Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present. Anal Bioanal Chem 2012, 404:939–965.
79. Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 2002, 1:376–386.
80. Schmidt A, Kellermann J, Lottspeich F. A novel strategy for quantitative proteomics using isotope-coded protein labels. Proteomics 2005, 5:4–15.
81. DeSouza LV, Taylor AM, Li W, Minkoff MS, Romaschin AD, Colgan TJ, Siu KW. Multiple reaction monitoring of mTRAQ-labeled peptides enables absolute quantification of endogenous levels of a potential cancer marker in cancerous and normal endometrial tissues. J Proteome Res 2008, 7:3525–3534.
82. Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 1999, 17:994–999.
83. Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, et al. Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 2004, 3:1154–1169.
84. Thompson A, Schafer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, Neumann T, Johnstone R, Mohammed AK, Hamon C. Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem 2003, 75:1895–1904.
85. Mertins P, Udeshi ND, Clauser KR, Mani DR, Patel J, Ong SE, Jaffe JD, Carr SA. iTRAQ labeling is superior to mTRAQ for quantitative global proteomics and phosphoproteomics. Mol Cell Proteomics 2012, 11:M111 014423.
86. Balasubramaniam D, Eissler CL, Stauffacher CV, Hall MC. Use of selected reaction monitoring data for label-free quantification of protein modification stoichiometry. Proteomics 2010, 10:4301–4305.
87. Jin LL, Tong J, Prakash A, Peterman SM, St-Germain JR, Taylor P, Trudel S, Moran MF. Measurement of protein phosphorylation stoichiometry by selected reaction monitoring mass spectrometry. J Proteome Res 2010, 9:2752–2761.
88. Wu R, Haas W, Dephoure N, Huttlin EL, Zhai B, Sowa ME, Gygi SP. A large-scale method to measure absolute protein phosphorylation stoichiometries. Nat Methods 2011, 8:677–683.
89. Cox DM, Zhong F, Du M, Duchoslav E, Sakuma T, McDermott JC. Multiple reaction monitoring as a method for identifying protein posttranslational modifications. J Biomol Tech 2005, 16:83–90.
90. Unwin RD, Griffiths JR, Leverentz MK, Grallert A, Hagan IM, Whetton AD. Multiple reaction monitoring to identify sites of protein phosphorylation with high sensitivity. Mol Cell Proteomics 2005, 4:1134–1144.
91. Zappacosta F, Collingwood TS, Huddleston MJ, Annan RS. A quantitative results-driven approach to analyzing multisite protein phosphorylation: the phosphate-dependent phosphorylation profile of the transcription factor Pho4. Mol Cell Proteomics 2006, 5:2019–2030.
92. Glinski M, Weckwerth W. Differential multisite phosphorylation of the trehalose-6-phosphate synthase gene family in Arabidopsis thaliana: a mass spectrometry-based process for multiparallel peptide library phosphorylation analysis. Mol Cell Proteomics 2005, 4:1614–1625.
93. Narumi R, Murakami T, Kuga T, Adachi J, Shiromizu T, Muraoka S, Kume H, Kodera Y, Matsumoto M, Nakayama K, et al. A strategy for large-scale phosphoproteomics and SRM-based validation of human breast cancer tissue samples. J Proteome Res 2012, 11:5311–5322.
94. Gamez-Pozo A, Sanchez-Navarro I, Calvo E, Diaz E, Miguel-Martin M, Lopez R, Agullo T, Camafeita E, Espinosa E, Lopez JA, et al. Protein phosphorylation analysis in archival clinical cancer samples by shotgun and targeted proteomics approaches. Mol Biosyst 2011, 7:2368–2374.
95. Wolf-Yadlin A, Hautaniemi S, Lauffenburger DA, White FM. Multiple reaction monitoring for robust quantitative proteomic analysis of cellular signaling networks. Proc Natl Acad Sci USA 2007, 104:5860–5865.
96. Sherrod SD, Myers MV, Li M, Myers JS, Carpenter KL, Maclean B, Maccoss MJ, Liebler DC, Ham AJ. Label-free quantitation of protein modifications by pseudo selected reaction monitoring with internal reference peptides. J Proteome Res 2012, 11:3467–3479.
97. Beausoleil SA, Villen J, Gerber SA, Rush J, Gygi SP. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol 2006, 24:1285–1292.
98. Bailey CM, Sweet SM, Cunningham DL, Zeller M, Heath JK, Cooper HJ. SLoMo: automated site localization of modifications from ETD/ECD mass spectra. J Proteome Res 2009, 8:1965–1971.
99. Shteynberg D, Deutsch E, Mendoza L, Slagel J, Lam HH, Nesvizhskii A, Moritz R. PTMProphet: TPP Software for Validation of Modified Site Locations on Post-Translationally Modified Peptides. 60th ASMS (American Society for Mass Spectrometry) Conference, 2012.
100. Savitski MM, Lemeer S, Boesche M, Lang M, Mathieson T, Bantscheff M, Kuster B. Confident phosphorylation site localization using the Mascot Delta Score. Mol Cell Proteomics 2011, 10:M110 003830.
101. Ruttenberg BE, Pisitkun T, Knepper MA, Hoffert JD. PhosphoScore: an open-source phosphorylation site assignment tool for MSn data. J Proteome Res 2008, 7:3054–3059.
102. Payne SH, Yau M, Smolka MB, Tanner S, Zhou H, Bafna V. Phosphorylation-specific MS/MS scoring for rapid and accurate phosphoproteome analysis. J Proteome Res 2008, 7:3373–3381.
103. Schlosser A, Vanselow JT, Kramer A. Comprehensive phosphorylation site analysis of individual phosphoproteins applying scoring schemes for MS/MS data. Anal Chem 2007, 79:7439–7449.
104. Bodenmiller B, Aebersold R. Phosphoproteome resource for systems biology research. Methods Mol Biol 2011, 694:307–322.
105. Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M. PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol 2007, 8:R250.
106. Dinkel H, Chica C, Via A, Gould CM, Jensen LJ, Gibson TJ, Diella F. Phospho.ELM: a database of phosphorylation sites—update 2011. Nucleic Acids Res 2011, 39:D261–D267.
107. Yang CY, Chang CH, Yu YL, Lin TC, Lee SA, Yen CC, Yang JM, Lai JM, Hong YR, Tseng TL, et al. PhosphoPOINT: a comprehensive human kinase interactome and phospho-protein database. Bioinformatics 2008, 24:i14–i20.
108. Durek P, Schmidt R, Heazlewood JL, Jones A, MacLean D, Nagel A, Kersten B, Schulze WX. PhosPhAt: the Arabidopsis thaliana phosphorylation site database. An update. Nucleic Acids Res 2010, 38:D828–834.
109. Gao J, Agrawal GK, Thelen JJ, Xu D. P3DB: a plant protein phosphorylation database. Nucleic Acids Res 2009, 37:D960–D962.
110. Li H, Xing X, Ding G, Li Q, Wang C, Xie L, Zeng R, Li Y. SysPTM: a systematic resource for proteomic research on post-translational modifications. Mol Cell Proteomics 2009, 8:1839–1849.
111. Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, Telikicherla D, Raju R, Shafreen B, Venugopal A, et al. Human Protein Reference Database—2009 update. Nucleic Acids Res 2009, 37:D767–D772.
112. Hornbeck PV, Kornhauser JM, Tkachev S, Zhang B, Skrzypek E, Murray B, Latham V, Sullivan M. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res 2012, 40:D261–D270.
113. Schwartz D, Gygi SP. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat Biotechnol 2005, 23:1391–1398.
114. Schwartz D, Chou MF, Church GM. Predicting protein post-translational modifications using meta-analysis of proteome scale data sets. Mol Cell Proteomics 2009, 8:365–379.
115. Amanchy R, Periaswamy B, Mathivanan S, Reddy R, Tattikota SG, Pandey A. A curated compendium of phosphorylation motifs. Nat Biotechnol 2007, 25:285–286.
116. Ritz A, Shakhnarovich G, Salomon AR, Raphael BJ. Discovery of phosphorylation motif mixtures in phosphoproteomics data. Bioinformatics 2009, 25:14–21.
117. Huang H, Li L, Wu C, Schibli D, Colwill K, Ma S, Li C, Roy P, Ho K, Songyang Z, et al. Defining the specificity space of the human SRC homology 2 domain. Mol Cell Proteomics 2008, 7:768–784.
118. Li L, Wu C, Huang H, Zhang K, Gan J, Li SS. Prediction of phosphotyrosine signaling networks using a scoring matrix-assisted ligand identification approach. Nucleic Acids Res 2008, 36:3263–3273.
119. Miller ML, Jensen LJ, Diella F, Jorgensen C, Tinti M, Li L, Hsiung M, Parker SA, Bordeaux J, Sicheritz-Ponten T, et al. Linear motif atlas for phosphorylation-dependent signaling. Sci Signal 2008, 1:ra2.
120. Linding R, Jensen LJ, Ostheimer GJ, van Vugt MA, Jorgensen C, Miron IM, Diella F, Colwill K, Taylor L, Elder K, et al. Systematic discovery of in vivo phosphorylation networks. Cell 2007, 129:1415–1426.
121. Blom N, Sicheritz-Ponten T, Gupta R, Gammeltoft S, Brunak S. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 2004, 4:1633–1649.
122. Ellis JJ, Kobe B. Predicting protein kinase specificity: Predikin update and performance in the DREAM4 challenge. PLoS One 2011, 6:e21169.
123. Obenauer JC, Cantley LC, Yaffe MB. Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 2003, 31:3635–3641.
124. Xue Y, Ren J, Gao X, Jin C, Wen L, Yao X. GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy. Mol Cell Proteomics 2008, 7:1598–1608.
125. Mi T, Merlin JC, Deverasetty S, Gryk MR, Bill TJ, Brooks AW, Lee LY, Rathnayake V, Ross CA, Sargeant DP, et al. Minimotif Miner 3.0: database expansion and significantly improved reduction of false-positive predictions from consensus sequences. Nucleic Acids Res 2012, 40:D252–D260.
126. Lachmann A, Ma’ayan A. KEA: kinase enrichment analysis. Bioinformatics 2009, 25:684–686.
127. Sharifpoor S, Nguyen Ba AN, Youn JY, van Dyk D, Friesen H, Douglas AC, Kurat CF, Chong YT, Founk K, Moses AM, et al. A quantitative literature-curated gold standard for kinase-substrate pairs. Genome Biol 2011, 12:R39.
128. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2011, 27:431–432.
129. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 2011, 39:D561–568.
130. Stark C, Breitkreutz BJ, Chatr-Aryamontri A, Boucher L, Oughtred R, Livstone MS, Nixon J, Van Auken K, Wang X, Shi X, et al. The BioGRID interaction database: 2011 update. Nucleic Acids Res 2011, 39:D698–704.
131. Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, Sacco F, Palma A, Nardozza AP, Santonico E, et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res 2012, 40:D857–861.
132. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res 2012, 40:D841–846.
133. Thomas PD, Kejariwal A, Campbell MJ, Mi H, Diemer K, Guo N, Ladunga I, Ulitsky-Lazareva B, Muruganujan A, Rabkin S, et al. PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification. Nucleic Acids Res 2003, 31:334–341.
134. Ren J, Jiang C, Gao X, Liu Z, Yuan Z, Jin C, Wen L, Zhang Z, Xue Y, Yao X. PhosSNP for systematic analysis of genetic polymorphisms that influence protein phosphorylation. Mol Cell Proteomics 2010, 9:623–634.
135. Gong W, Zhou D, Ren Y, Wang Y, Zuo Z, Shen Y, Xiao F, Zhu Q, Hong A, Zhou X, et al. PepCyber:PPEP: a database of human protein protein interactions mediated by phosphoprotein-binding domains. Nucleic Acids Res 2008, 36:D679–D683.
136. Bertsch A, Gropl C, Reinert K, Kohlbacher O. OpenMS and TOPP: open source software for LC-MS data analysis. Methods Mol Biol 2011, 696:353–367.
137. MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, Kern R, Tabb DL, Liebler DC, MacCoss MJ. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26:966–968.
138. Ren J, Gao X, Liu Z, Cao J, Ma Q, Xue Y. Computational analysis of phosphoproteomics: progresses and perspectives. Curr Protein Pept Sci 2011, 12:591–601.
139. Eng JK, Searle BC, Clauser KR, Tabb DL. A face in the crowd: recognizing peptides through database search. Mol Cell Proteomics 2011, 10:R111 009522.
140. Matic I, Ahel I, Hay RT. Reanalysis of phosphoproteomics data uncovers ADP-ribosylation sites. Nat Methods 2012, 9:771–772.
141. Veraksa A. Regulation of developmental processes: insights from mass spectrometry-based proteomics. WIREs Dev Biol 2012. doi:10.1002/WDEV.102.
142. Terfve C, Saez-Rodriguez J. Modeling signaling networks using high-throughput phospho-proteomics. Adv Exp Med Biol 2012, 736:19–57.
143. Yachie N, Saito R, Sugiyama N, Tomita M, Ishihama Y. Integrative features of the yeast phosphoproteome and protein-protein interaction map. PLoS Comput Biol 2011, 7:e1001064.
144. Huber A, Bodenmiller B, Uotila A, Stahl M, Wanka S, Gerrits B, Aebersold R, Loewith R. Characterization of the rapamycin-sensitive phosphoproteome reveals that Sch9 is a central coordinator of protein synthesis. Genes Dev 2009, 23:1929–1943.
145. Bishop AC, Buzko O, Shokat KM. Magic bullets for protein kinases. Trends Cell Biol 2001, 11:167–172.
146. Brunet A, Bonni A, Zigmond MJ, Lin MZ, Juo P, Hu LS, Anderson MJ, Arden KC, Blenis J, Greenberg ME. Akt promotes cell survival by phosphorylating and inhibiting a Forkhead transcription factor. Cell 1999, 96:857–868.
147. Dong J, Pan D. Tsc2 is not a critical target of Akt during normal Drosophila development. Genes Dev 2004, 18:2479–2484.
148. Schleich S, Teleman AA. Akt phosphorylates both Tsc1 and Tsc2 in Drosophila, but neither phosphorylation is required for normal animal growth. PLoS One 2009, 4:e6305.
149. Oliveira AP, Ludwig C, Picotti P, Kogadeeva M, Aebersold R, Sauer U. Regulation of yeast central metabolism by enzyme phosphorylation. Mol Syst Biol 2012, 8:623.
150. Zielinska DF, Gnad F, Jedrusik-Bode M, Wisniewski JR, Mann M. Caenorhabditis elegans has a phosphoproteome atypical for metazoans that is enriched in developmental and sex determination proteins. J Proteome Res 2009, 8:4039–4049.
151. Bodenmiller B, Malmstrom J, Gerrits B, Campbell D, Lam H, Schmidt A, Rinner O, Mueller LN, Shannon PT, Pedrioli PG, et al. PhosphoPep—a phosphoproteome resource for systems biology research in Drosophila Kc167 cells. Mol Syst Biol 2007, 3:139.
152. Zhai B, Villen J, Beausoleil SA, Mintseris J, Gygi SP. Phosphoproteome analysis of Drosophila melanogaster embryos. J Proteome Res 2008, 7:1675–1682.
153. Pan C, Gnad F, Olsen JV, Mann M. Quantitative phosphoproteome analysis of a mouse liver cell line reveals specificity of phosphatase inhibitors. Proteomics 2008, 8:4534–4546.
154. Li QR, Xing XB, Chen TT, Li RX, Dai J, Sheng QH, Xin SM, Zhu LL, Jin Y, Pei G, et al. Large scale phosphoproteome profiles comprehensive features of mouse embryonic stem cells. Mol Cell Proteomics 2011, 10:M110 001750.
155. Zhou H, Di Palma S, Preisinger C, Peng M, Polat AN, Heck AJ, Mohammed S. Toward a comprehensive characterization of a human cancer cell phosphoproteome. J Proteome Res 2013, 12:260–271.