Putting engineering back into protein engineering:
bioinformatic approaches to catalyst design
Claes Gustafsson, Sridhar Govindarajan and Jeremy Minshull
Complex multivariate engineering problems are commonplace
and not unique to protein engineering. Mathematical and data-mining tools developed in other fields of engineering have now
been applied to analyze sequence–activity relationships of
peptides and proteins and to assist in the design of proteins
and peptides with specified properties. Decreasing costs of
DNA sequencing in conjunction with methods to quickly
synthesize statistically representative sets of proteins allow
modern heuristic statistics to be applied to protein
engineering. This provides an alternative approach to
expensive assays or unreliable high-throughput surrogate
screens.
Addresses
DNA 2.0, Inc., 1455 Adams Drive, Menlo Park, CA 94025, USA
e-mail: cgustafsson@dnatwopointo.com
Current Opinion in Biotechnology 2003, 14:366–370
This review comes from a themed issue on
Protein technologies and commercial enzymes
Edited by Gjalt Huisman and Stephen Sligar
0958-1669/$ – see front matter
© 2003 Elsevier Ltd. All rights reserved.
DOI 10.1016/S0958-1669(03)00101-0
Abbreviations
NK1 neurokinin 1
NP nondeterministic polynomial time
PLS partial least squares
Introduction
Protein engineering has classically been approached from
two diametrically opposed directions: rational design and
directed evolution. Rational design, in the tradition of
Descartes and Leibniz, attempts to understand protein
structure and function at a complete mechanistic level so
that any desired change can be effected by calculation
from first principles. Directed evolution, in the tradition
of John Locke and other empiricists, attempts to find
a desired solution by testing many different variants,
typically using various evolutionary based algorithms.
Both rational design and directed evolution in their
many alternative formats have shortcomings and advantages that have been discussed and compared elsewhere [1–3].
Modern heuristics applied to protein engineering is a
synthesis of empirical data and a rational analysis of that
information. The very first paper describing chemical
synthesis of a gene proposed that systematic variation
of amino acids would enable an understanding of the
relationships between the sequence of a protein and its
structure, physical behavior and activity [4]. Soon after
that, Svante Wold’s group developed and applied multivariate data analysis techniques to peptide design and
suggested that ‘the rapid development of protein engineering may then make it possible to produce designed
sets of mature proteins and enzymes for QSAR studies’
[5,6]. This review summarizes recent publications in
which modern heuristics have been applied to protein
engineering and describes technological advances that are
enabling Wold’s vision.
Protein optimization from an engineering
perspective
When faced with solving a difficult problem it can be
enlightening to see if a similar type of problem has been
solved before. Many disciplines and industries face the
same challenges of high system complexity and abundant variables that confront protein engineering [7]. In
some industries increasing complexity is intentional, as
in the addition of new control parameters for a car’s
combustion engine. Sometimes it is inherent to the
system itself, for example, in clinical drug trials. The
common challenge in car manufacturing, clinical trials
and protein engineering is to account for as much of this
complexity as possible when describing the relationship
between input variables (e.g. piston angle and temperature for car engines, age and medical history for patients
or amino acid residues available at each position for protein engineering [8]) and output variables (e.g. exhaust
levels and fuel efficiency for cars, side effects and survival rate for patients or the desired commercial properties such as catalytic activity, thermostability, substrate
specificity and immunogenicity for protein engineering).
Measured output variables may in turn result from combinations of properties that are not explicitly measured;
for protein engineering, these may include expression
levels and protein solubility [9]. Like small-molecule
quantitative structure–activity relationships (QSAR),
which have enjoyed much success in pharmaceutical
development, heuristic protein engineering aims to
identify the relationship between input and output variables to create biological macromolecules with defined
properties. For reasons described below, more work
has been published optimizing peptides than proteins
using engineering concepts. We therefore use peptide
examples to describe some of the principles before
describing how the same engineering tools are used to
optimize proteins.
Navigating in protein sequence space
Protein engineering can be divided into two subtasks:
defining the solution space and defining the search
algorithm.
Multivariate design of improved polypeptides
Figure 1 shows a procedure for peptide optimization
derived from the one used by Norinder et al. [32] to
design analogs of the neuropeptide substance P with
increased affinity for the neurokinin 1 (NK1) receptor.
These authors used partial least squares (PLS) regression
[33,34] to correlate the sequences of 36 substance P
analogs with their activities. They used this model to
identify the positions and amino acid properties in substance P that had the largest effects on NK1 binding. The
authors designed, synthesized and tested six new peptides that the model predicted to be improved NK1
binders. All six were shown to be highly active. Their
sequence–activity data was added to the first 36 peptides
to build a second generation PLS model, which was used
to design a further three variants. One of these had an
IC50 of 5 pM, 300-fold better than the wild-type peptide
and 45-fold better than the best of the original 36 variants
[32]. It is striking that extremely small numbers of variants (45) were made and tested to achieve very significant improvements in the desired function.
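The modeling step in this cycle can be illustrated with a minimal sketch. It assumes a simple one-hot encoding of residues and the standard NIPALS form of single-response PLS; the peptides, activities and function names are hypothetical, and the descriptors and variable selection used by Norinder et al. [32] are not reproduced here.

```python
from math import sqrt

# Hypothetical toy data: (peptide, measured activity). Not from the review.
TRAIN = [("AKL", 1.0), ("AKV", 1.4), ("GKL", 0.6),
         ("GRV", 1.9), ("ARL", 1.7), ("GRL", 1.2)]
ALPHABET = "AGKRLV"  # assumed residue alphabet for the toy example


def one_hot(seq):
    """Encode every position as 0/1 indicators over ALPHABET."""
    return [1.0 if aa == ref else 0.0 for aa in seq for ref in ALPHABET]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_pls1(X, y, n_components):
    """NIPALS PLS1. Returns (column means, y mean, [(w, p, q), ...])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                 # centred y
    comps = []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(m)]  # X^T y
        norm = sqrt(dot(w, w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                          # scores
        tt = dot(t, t)
        p = [dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = dot(f, t) / tt                                      # y loading
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict(model, seq):
    """Predict activity by replaying the deflation steps on one sequence."""
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(one_hot(seq), x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


X = [one_hot(s) for s, _ in TRAIN]
y = [a for _, a in TRAIN]


def training_rss(n_components):
    """Residual sum of squares on the training peptides."""
    model = fit_pls1(X, y, n_components)
    return sum((predict(model, s) - a) ** 2 for s, a in TRAIN)


model = fit_pls1(X, y, n_components=2)
candidates = ["ARV", "GKV", "AKL"]  # hypothetical untested designs
ranked = sorted(candidates, key=lambda s: predict(model, s), reverse=True)
print("predicted ranking:", ranked)
print("training RSS with 0, 1, 2 components:",
      [round(training_rss(k), 4) for k in range(3)])
```

In a real campaign the top-ranked candidates would be synthesized and assayed, and their measurements appended to the training set before refitting, as in the iterative procedure described above.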
Define the solution space
The total possible number of proteins encoded by a 1 kb
gene is $20^{333}$ (20 alternative amino acids at each position
in a string of 333 residues) $\approx 10^{430}$. This is an unfeasibly
large number of variants to screen. Fortunately, not all
possible sequences need be considered as naturally occurring proteins can usually be relied on to provide a starting
point for engineering efforts. Active point-mutants [10],
phylogenetic substitutions [11], structural modeling
[12,13] and known immunogenic constraints [14] are
well-explored methods of targeting specific regions of a
protein for change.
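The size of this sequence space is easy to check numerically. The short sketch below (function name is illustrative) works in logarithms to avoid handling the huge integer directly; the exact exponent comes out slightly above the rounded order of magnitude quoted in the text.

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """Return log10 of the number of distinct sequences of a given length."""
    return length * math.log10(alphabet_size)


residues = 1000 // 3  # a 1 kb gene encodes about 333 codons
exponent = log10_sequence_space(residues)
print(f"20^{residues} is about 10^{exponent:.0f}")
```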
Define the search algorithm
Protein engineering is an NP-complete (nondeterministic polynomial-time complete)
problem [15,16], meaning that no known algorithm is guaranteed to find the optimal solution
in time that scales polynomially with the size of the problem.
Empirical protein engineers have largely limited themselves to addressing this NP-complete problem with exhaustive searches
using ultra-high-throughput phage and ribosome display
screens [17,18] or evolutionary methods [1–3,19]. By
contrast, the wider engineering community has exploited
genetic algorithms as well as regression-based algorithms,
neural nets, clustering, and several other tools as alternative techniques to address NP-complete problems [20].
Statistical targeting of amino acid changes
Comparisons of natural protein and DNA sequences,
particularly those using the powerful technique of principal component analysis, can be used to identify residues
that are important for specific functionality within a
protein [21,22,23,24,25]. Natural substitution patterns
can also be used to infer which changes are likely to be
acceptable within functional proteins. For example, a
recent study of subtilisin variants found that all 52 of
the amino acid variations found in 15 homologs were
active within the context of at least one backbone; their
incorporation produced proteases with varying catalytic
properties [26]. In another set of experiments, all of the
active-site residues from one fungal phytase were
replaced with those from another; again, the result was
an active protein with altered catalytic properties [11].
By incorporating small numbers of changes identified
from alignments of naturally occurring sequences, it
has also been possible to increase the thermostability
of a fungal phytase by over 30 °C [27]. Substitution
matrices derived from synonymous and non-synonymous
substitution rates can also be used to choose reasonable
amino acid changes if there is insufficient phylogenetic
data to use sequence alignments [28–30,31].
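One simple way to harvest candidate substitutions from an alignment is to compare a chosen backbone with the per-position consensus of its homologs. The sketch below assumes a gap-free alignment; the sequences and the helper names are hypothetical and do not come from the studies cited above.

```python
from collections import Counter

# Hypothetical gap-free alignment of five homologs (not real phytase data).
alignment = ["MKTAYIAK", "MKSAYLAK", "MRTAYIAK", "MKTAFIAK", "MKTAYIGK"]
backbone = alignment[1]  # the sequence chosen for engineering


def consensus(seqs):
    """Most frequent residue in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def consensus_substitutions(parent, seqs):
    """List substitutions (e.g. 'S3T') that move the parent toward consensus."""
    cons = consensus(seqs)
    return [f"{old}{pos}{new}"
            for pos, (old, new) in enumerate(zip(parent, cons), start=1)
            if old != new]


suggestions = consensus_substitutions(backbone, alignment)
print(consensus(alignment), suggestions)
```

Whether consensus residues actually improve a property such as thermostability remains an empirical question; the list is only a reduced solution space to test.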
Figure 1. Polypeptide optimization using mathematical models. The process is
that used by Norinder et al. [32] for the optimization of the neuropeptide
substance P. The cycle runs as follows: create an initial set of variants and measure the desired phenotype; build a sequence–activity model; design new variants based on model predictions for high-performing sequences; synthesize and test the new variants; add the new data to refine the sequence–activity model; repeat.
The same sequence–activity modeling techniques have also been applied to proteins.
In one particularly informative example, Bucht and colleagues optimized a complex protein phenotype: the
activity of acetylcholinesterase expressed on the surface
of human COS-1 cells. Display of acetylcholinesterase on
the cell surface occurs as a result of glycosyl phosphatidylinositol modification at the C terminus of the protein.
The authors identified two amino acids in the signal
peptide region of the protein, the identity of which
affected cell-surface localization of the protein. They
synthesized eight variant genes, tested the surface expression of the eight encoded proteins and used PLS to
model the sequence–activity relationship. The authors
then constructed an additional 27 variants in this same
region of the protein, using them to test and refine the
model, thereby identifying the optimal sequence for cell-surface expression of acetylcholinesterase [35]. Modeling sequence–activity relationships to identify optimal
protein variants has not been limited to amino acids
localized to a small region of a protein. Statistical analysis
of mutations distributed throughout several enzymes has
been used to identify the contributions of those changes
to function of the protein [36] and to predict the sequence with best function [37]. Mathematical sequence–
activity modeling has thus been validated at many scales
of complexity: from small molecules to peptides to localized regions of proteins to changes spread throughout
entire proteins.
Although there is a growing body of work in which
sequence–activity relationships are used to design improved peptides [5,6,38,39], application of the same
methods to protein/biocatalyst engineering is still in its
infancy. One reason for this has been the difficulty in
producing large numbers of modified molecules [40]; in
contrast to peptides, proteins cannot easily be synthesized
directly. As technology improves, the synthesis of individually designed genes becomes increasingly cost-effective [41,42]. Testing variants taken from libraries
that are even cheaper to produce is also likely to produce
useful sequence–activity relationships [43].
Experimental design of maximally
informative datasets
Another useful statistical tool with its origins in other
engineering disciplines is that of experimental design.
This is a technique by which a variant set is designed to
contain the maximum amount of information for subsequent analysis of sequence–activity data [44]. Using
D-optimal design, Mee et al. [45] designed, synthesized
and tested a training set of 60 analogs of a 15 amino acid
antibacterial peptide. A regression-based model derived
from the sequence–activity correlation of the 60 data points was used to design and synthesize 39 new peptides
predicted to have improved activity. The best designed
peptide was twice as potent as the best one in the training
set. In their selection of acetylcholinesterase variants,
Bucht et al. [35] also used experimental design to choose
the eight gene variants that would best represent the
sequence variation they were exploring.
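The D-optimality criterion selects the set of variants that maximizes the determinant of $X^{T}X$, where $X$ is the model matrix of the chosen variants. A minimal sketch, assuming three sites with two alternative residues each (coded -1/+1), an intercept-plus-main-effects model and an exhaustive search that is only feasible for toy problems; it is not the procedure of Mee et al. [45] or Bucht et al. [35], and all names are illustrative.

```python
from itertools import combinations, product


def det(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < 1e-12:
            return 0.0  # singular matrix
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return d


def information_det(runs):
    """det(X^T X) for an intercept-plus-main-effects model."""
    X = [[1.0, *run] for run in runs]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    return det(xtx)


# All 2^3 = 8 possible variants; choose the 4 most informative to synthesize.
candidates = list(product((-1.0, 1.0), repeat=3))
best_design = max(combinations(candidates, 4), key=information_det)
best_det = information_det(best_design)
print(best_design, best_det)
```

For realistic numbers of positions and residues the candidate set is far too large to enumerate, so exchange-type heuristics are normally used in place of the exhaustive `max` shown here.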
Accounting for amino acid interactions
If an amino acid change at one position affects the
functional consequences of changing other amino acids
in a protein, predictive sequence–function models must
account for this. A model that incorporates amino acid
interactions requires more data to achieve the same quality of model than one that assumes that
the amino acids act independently
[40,46]. In studies of antigen–antibody binding [40]
and ligand–receptor binding [47], researchers found
that very few interaction terms (and thus very little
additional data) were needed to produce accurate descriptions of the sequence–activity relationship.
Recent work from Husimi’s group suggests that this result
is also true for proteins. Individual amino acid changes
contributing to specific properties of dihydrofolate reductase [36], thermolysin and prolyl endopeptidase [37] are
approximately independent. Of particular interest is a
recent study in which only two of 14 randomly generated
mutations that increased prolyl endopeptidase thermostability appeared to be interdependent. The authors’
model contained a single interaction term to account for
this residue pair. A gene variant containing the pair predicted to interact was synthesized and tested; its activity
was shown to be as predicted by the model. Only 45 gene
variants were needed to accurately model the activities of
16 384 possible sequence combinations [46].
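The economy of such a model is easy to see in a sketch: 14 additive terms plus one pairwise term are enough to score all $2^{14} = 16\,384$ combinations. The effect sizes, the interacting pair and the function name below are hypothetical and are not the values fitted in [46].

```python
from itertools import product

# Hypothetical additive contribution of each of 14 mutations (arbitrary units).
effects = [-0.1, 0.2, 0.3, -0.4, 0.5, 0.6, -0.7,
           0.8, 0.9, -1.0, 1.1, 1.2, -1.3, 1.4]
pair = (2, 7)        # hypothetical pair of interacting sites (0-based)
pair_effect = -1.5   # extra term applied only when both mutations are present


def predict(variant):
    """Additive model with a single interaction term."""
    score = sum(e for e, present in zip(effects, variant) if present)
    if variant[pair[0]] and variant[pair[1]]:
        score += pair_effect
    return score


# Every presence/absence combination of the 14 mutations.
variants = list(product((0, 1), repeat=len(effects)))
best = max(variants, key=predict)
print(len(variants), best, round(predict(best), 2))
```

Because of the negative interaction term, the best predicted variant carries only one member of the interacting pair even though both mutations are individually beneficial, which is exactly the kind of case a purely additive model would miss.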
Heuristic methods are becoming more
widespread
Other successful examples of heuristic approaches to
analyze and optimize biological systems include the
optimization of peptidase I using neural networks [48],
calculations of individual amino acid contributions to
serine protease inhibitor activity [49], PLS-based prediction of the determinants of protein localization [50,51],
and protein contact map and interaction site prediction
using neural networks [52]. In work complementing
modeling to assess the contributions of small numbers
of changes at many positions, sequence–activity relationships have been derived using PLS to quantitate the
effects of multiple amino acid substitutions at single
positions in haloalkane dehalogenase, T4 lysozyme, subtilisin and tryptophan synthase. These methods have also
been used to determine the physicochemical properties
required at identified positions to confer specific enzyme
properties [53]. Furthermore, the same tools have been
used to systematically characterize the substrates for a set
of haloalkane dehalogenase variants to determine the
effects of amino acid changes on substrate specificity of
the enzyme [54].
Conclusions: drivers for change
By casting the protein engineering problem as an optimization problem common to other engineering disciplines, we are able to exploit many different problem-solving
algorithms. Gone are the technological barriers
to synthesizing statistically representative datasets. As
Wold predicted in 1986, the capture of protein sequence–
activity relationships now permits the design of optimized
proteins.
There are several drivers for applying modern engineering tools to protein engineering. Firstly, the human genome project, microarrays and other recent large scientific
endeavours have changed biology from a ‘one variable at a
time’ science to a science engulfed in variables. Secondly,
statistical tools developed and deployed in a variety of
engineering areas can now be operated by non-statisticians
from any desktop computer. Finally, the cost of generating
and sequencing statistically representative sets of genes is
continuously decreasing.
It is striking that by measuring the contribution of amino
acid variations to the function of a protein, sequence–activity
modeling requires orders of magnitude fewer
variants to be tested to design improved sequences than
the numbers screened using widespread directed evolution techniques. This is important, because methodologies
that rely upon screening large sample sets are vulnerable
to the weakness that high-throughput screens often turn
out to have limited ability to measure the protein properties that are really important [2,19,40]. Heuristic methodologies may therefore permit protein engineers to test
fewer variants under conditions that more closely approximate their final intended applications and reduce the time
and resources that are often spent in building and implementing imprecise high-throughput screens.
Acknowledgements
One of us (CG) began this manuscript while employed at Maxygen Inc.
We thank Maxygen for their support.
References and recommended reading
Papers of particular interest, published within the annual period of
review, have been highlighted as:
• of special interest
•• of outstanding interest
1. Tobin MB, Gustafsson C, Huisman GW: Directed evolution: the
‘rational’ basis for ‘irrational’ design. Curr Opin Struct Biol 2000,
10:421-427.
2. van Regenmortel MH: Are there two distinct research strategies
for developing biologically active molecules: rational design
and empirical selection? J Mol Recognit 2000, 13:1-4.
3. Ryu DD, Nam DH: Recent progress in biomolecular engineering.
Biotechnol Prog 2000, 16:2-16.
4. Nambiar KP, Stackhouse J, Stauffer DM, Kennedy WP, Eldredge
JK, Benner SA: Total synthesis and cloning of a gene coding for
the ribonuclease S protein. Science 1984, 223:1299-1301.
5. Hellberg S: A Multivariate Approach to QSAR. PhD thesis. Umea,
Sweden: University of Umea: 1986.
6. Hellberg S, Sjostrom M, Skagerberg B, Wold S: Peptide
quantitative structure-activity relationships, a multivariate
approach. J Med Chem 1987, 30:1126-1135.
7. Gustafsson C, Govindarajan S, Emig R: Exploration of sequence
space for protein engineering. J Mol Recognit 2001, 14:308-314.
8. Sandberg M, Eriksson L, Jonsson J, Sjostrom M, Wold S: New
chemical descriptors relevant for the design of biologically
active peptides. A multivariate characterization of 87 amino
acids. J Med Chem 1998, 41:2481-2491.
9. Lin Z, Thorsen T, Arnold FH: Functional expression of
horseradish peroxidase in E. coli by directed evolution.
Biotechnol Prog 1999, 15:467-471.
10. Glieder A, Farinas ET, Arnold FH: Laboratory evolution of a
soluble, self-sufficient, highly active alkane hydroxylase.
Nat Biotechnol 2002, 20:1135-1139.
11. Lehmann M, Lopez-Ulibarri R, Loch C, Viarouge C, Wyss M,
van Loon AP: Exchanging the active site between phytases for
altering the functional properties of the enzyme. Protein Sci
2000, 9:1866-1872.
Demonstration that residues identified as functionally important (in this
case the entire active site) can be moved from one protein backbone to
another, leading to functionally novel catalysts.
12. Looger LL, Dwyer MA, Smith JJ, Hellinga HW: Computational
design of receptor and sensor proteins with novel functions.
Nature 2003, 423:185-190.
13. Kwasigroch JM, Gilis D, Dehouck Y, Rooman M: PoPMuSiC,
rationally designing point mutations in protein structures.
Bioinformatics 2002, 18:1701-1702.
14. Tangri S, LiCalsi C, Sidney J, Sette A: Rationally engineered
proteins or antibodies with absent or reduced immunogenicity.
Curr Med Chem 2002, 9:2191-2199.
15. Pierce NA, Winfree E: Protein design is NP-hard. Protein Eng
2002, 15:779-782.
16. Lathrop RH: The protein threading problem with sequence
amino acid interaction preferences is NP-complete. Protein Eng
1994, 7:1059-1068.
17. Hanes J, Pluckthun A: In vitro selection and evolution of
functional proteins by using ribosome display. Proc Natl Acad
Sci USA 1997, 94:4937-4942.
18. Wells JA, Lowman HB: Rapid evolution of peptide and protein
binding properties in vitro. Curr Opin Biotechnol 1992,
3:355-362.
19. Ness JE, del Cardayre SB, Minshull J, Stemmer WP: Molecular
breeding: the natural approach to protein design. Adv Protein
Chem 2000, 55:261-292.
20. Johnson DS, McGeoch LA: The traveling salesman problem: a
case study in local optimization. In Local Search in Combinatorial
Optimization. Edited by Aarts EHL, Lenstra JK, Aarts EL: John Wiley
& Sons Ltd; 1997:215-310.
21. Casari G, Sander C, Valencia A: A method to predict functional
residues in proteins. Nat Struct Biol 1995, 2:171-178.
22. del Sol Mesa A, Pazos F, Valencia A: Automatic methods for
predicting functionally important residues. J Mol Biol 2003,
326:1289-1302.
Excellent comparison of methods available to identify residues that
contribute to protein function.
23. Gogos A, Jantz D, Senturker S, Richardson D, Dizdaroglu M,
Clarke ND: Assignment of enzyme substrate specificity by
principal component analysis of aligned protein sequences: an
experimental test using DNA glycosylase homologs.
Proteins 2000, 40:98-105.
Principal component analysis of small numbers of proteins used to
identify residues likely to be involved in substrate specificity determination.
24. Suzuki Y, Gojobori T: A method for detecting positive
selection at single amino acid sites. Mol Biol Evol 1999,
16:1315-1328.
25. Jonsson J, Norberg T, Carlsson L, Gustafsson C, Wold S:
Quantitative sequence-activity models (QSAM) — tools for
sequence design. Nucleic Acids Res 1993, 21:733-739.
26. Govindarajan S, Ness JE, Kim S, Mundorff EC, Minshull J,
Gustafsson C: Systematic variation of amino acid substitutions
for stringent assessment of pairwise covariation. J Mol Biol
2003, 328:1061-1069.
Fifty-two phylogenetically identified substitutions in subtilisins are
accepted into one enzyme backbone, modifying its activity. Most natural
changes that occur together are shown to be a result of descent from a
common ancestor and not a result of functional constraints.
27. Lehmann M, Loch C, Middendorf A, Studer D, Lassen SF,
Pasamontes L, van Loon AP, Wyss M: The consensus concept for
thermostability engineering of proteins: further proof of
concept. Protein Eng 2002, 15:403-411.
28. Benner SA, Cohen MA, Gonnet GH: Amino acid substitution
during functionally constrained divergent evolution of protein
sequences. Protein Eng 1994, 7:1323-1332.
29. Wu TD, Brutlag DL: Discovering empirically conserved amino
acid substitution groups in databases of protein families.
Proc Int Conf Intell Syst Mol Biol 1996, 4:230-240.
30. Adenot M, Sarrauste de Menthiere C, Chavanieu A, Calas B,
Grassy G: Peptides quantitative structure-function
relationships: an automated mutation strategy to design
peptides and pseudopeptides from substitution matrices. J Mol
Graph Model 1999, 17:292-309.
31. Dimmic MW, Rest JS, Mindell DP, Goldstein RA: rtREV: an amino
acid substitution matrix for inference of retrovirus and reverse
transcriptase phylogeny. J Mol Evol 2002, 55:65-73.
A substitution matrix for maximum likelihood phylogenetic analysis is
developed that is optimized on a subset of sequences. Substitution
matrices are unique for each sequence subset.
32. Norinder U, Rivera C, Unden A: A quantitative structure-activity
relationship study of some substance P-related peptides. A
multivariate approach using PLS and variable selection.
J Pept Res 1997, 49:155-162.
33. Sandberg M: Deciphering Sequence Data, a Multivariate Approach.
PhD thesis. Umea: Umea University: 1997.
34. Geladi P, Kowalski BR: Partial least squares regression: a
tutorial. Anal Chim Acta 1986, 186:1-17.
35. Bucht G, Wikstrom P, Hjalmarsson K: Optimising the signal
peptide for glycosyl phosphatidylinositol modification of
human acetylcholinesterase using mutational analysis and
peptide-quantitative structure-activity relationships.
Biochim Biophys Acta 1999, 1431:471-482.
PLS and experimental design are used to optimize acetylcholinesterase,
increasing its surface expression on cells threefold.
36. Aita T, Iwakura M, Husimi Y: A cross-section of the fitness
landscape of dihydrofolate reductase. Protein Eng 2001,
14:633-638.
37. Aita T, Uchiyama H, Inaoka T, Nakajima M, Kokubo T, Husimi Y:
Analysis of a local fitness landscape with a model of the rough
Mt. Fuji-type landscape: application to prolyl endopeptidase
and thermolysin. Biopolymers 2000, 54:64-79.
38. Strom MB, Haug BE, Rekdal O, Skar ML, Stensen W, Svendsen JS:
Important structural features of 15-residue lactoferricin
derivatives and methods for improvement of antimicrobial
activity. Biochem Cell Biol 2002, 80:65-74.
39. Eriksson L, Jonsson J, Hellberg S, Lindgren F, Skagerberg B,
Sjostrom M, Wold S: Peptide QSAR on substance P analogues,
enkephalins and bradykinins containing L- and D-amino acids.
Acta Chem Scand A 1990, 44:50-55.
40. Choulier L, Andersson K, Hamalainen MD, van Regenmortel MH,
Malmqvist M, Altschuh D: QSAR studies applied to the prediction
of antigen-antibody interaction kinetics as measured by
BIAcore. Protein Eng 2002, 15:373-382.
Multivariate analysis applied to sequence optimization and reaction
conditions.
41. Hoover DM, Lubkowski J: DNAWorks: an automated method for
designing oligonucleotides for PCR-based gene synthesis.
Nucleic Acids Res 2002, 30:e43.
The shape of things to come. Gene synthesis gets cheaper and easier.
42. Holowachuk EW, Ruhoff MS: Efficient gene synthesis by Klenow
assembly/extension-Pfu polymerase amplification (KAPPA) of
overlapping oligonucleotides. PCR Methods Appl 1995,
4:299-302.
43. Abecassis V, Pompon D, Truan G: High efficiency family shuffling
based on multi-step PCR and in vivo DNA recombination in
yeast: statistical and functional analysis of a combinatorial
library between human cytochrome P450 1A1 and 1A2.
Nucleic Acids Res 2000, 28:E88.
One of many library synthesis methods. Interesting analysis of variants in
which hybridization signals instead of known sequence changes are used
as input variables for modeling.
44. Hellberg S, Eriksson L, Jonsson J, Lindgren F, Sjöström M,
Skagerberg B, Wold S, Andrews P: Minimum analogue peptide
sets (MAPS) for quantitative structure-activity relationships.
Int J Pept Protein Res 1991, 37:414-424.
45. Mee RP, Auton TR, Morgan PJ: Design of active analogues of a
15-residue peptide using D-optimal design, QSAR and a
combinatorial search algorithm. J Pept Res 1997, 49:89-102.
46. Aita T, Hamamatsu N, Nomiya Y, Uchiyama H, Shibanaka Y,
Husimi Y: Surveying a local fitness landscape of a protein with
epistatic sites for the study of directed evolution. Biopolymers
2002, 64:95-105.
A model of only 45 prolyl endopeptidase variants accurately predicts the
activities of combinations of 14 different mutations. Only one interaction
term is required in the model.
47. Prusis P, Lundstedt T, Wikberg JE: Proteo-chemometrics
analysis of MSH peptide binding to melanocortin receptors.
Protein Eng 2002, 15:305-311.
Statistically representative sets of melanocortin peptide and chimeric
receptors were analyzed. Models incorporated linear and interaction
terms; predictions were externally validated.
48. Schneider G, Schrödl W, Wallukat G, Muller J, Nissen E,
Rönspeck W, Wrede P, Kunze R: Peptide design by artificial
neural networks and computer-based evolutionary search.
Proc Natl Acad Sci USA 1998, 95:12179-12184.
49. Lu SM, Lu W, Qasim MA, Anderson S, Apostol I, Ardelt W, Bigler T,
Chiang YW, Cook J, James MN et al.: Predicting the reactivity of
proteins from their sequence alone: Kazal family of protein
inhibitors of serine proteinases. Proc Natl Acad Sci USA 2001,
98:1410-1415.
The conclusion of an heroic 20 year study. By synthesizing and testing
<200 variants, activities of many natural proteinases can be accurately
predicted.
50. Sjöström M, Wold S, Wieslander A, Rilfors L: Signal peptide amino
acid sequences in Escherichia coli contain information related
to final protein localization. A multivariate data analysis.
EMBO J 1987, 6:823-831.
51. Schein AI, Kissinger JC, Ungar LH: Chloroplast transit peptide
prediction: a peek inside the black box. Nucleic Acids Res 2001,
29:E82.
52. Fariselli P, Pazos F, Valencia A, Casadio R: Prediction of protein–
protein interaction sites in heterocomplexes with neural
networks. Eur J Biochem 2002, 269:1356-1361.
53. Damborsky J: Quantitative structure-function and structure-stability relationships of purposely modified proteins.
Protein Eng 1998, 11:21-30.
54. Marvanova S, Nagata Y, Wimmerova M, Sykorova J, Hynkova K,
Damborsky J: Biochemical characterization of broad-specificity enzymes using multivariate experimental design
and a colorimetric microplate assay: characterization of the
haloalkane dehalogenase mutants. J Microbiol Methods 2001,
44:149-157.