ScienceWeek
Crossing Barriers Since 1997

GENOME BIOLOGY: ON LOGIC AND NETWORKS OF PROTEIN INTERACTIONS

The following points are made by P.M. Bowers et al (Science 2004 306:2246):

1) The sequencing of multiple genomes from diverse species has tremendous potential to impact our understanding of biology, both by providing a census of all proteins and by enabling subsequent analysis of their functions [1-5]. Various patterns across multiple complete genomes have been used to infer biological interactions and functional linkages between proteins. These include observations of two distinct proteins from one organism being genetically fused into a single protein in another organism, and the tendency of two proteins to occur in chromosomal proximity across multiple organisms.

2) Once a sufficiently large number of genomes had been fully sequenced, the phylogenetic profile approach made it possible to detect functional relationships between proteins that exhibit statistically similar patterns of presence or absence. Because sequenced genomes allow us to catalog all of the proteins encoded in each organism, we can determine the pattern describing a protein's presence or absence by searching for its homologs across N organisms. The result is an N-dimensional vector of ones (present) and zeros (absent), referred to as the protein's "phylogenetic profile".
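The profile construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes the homolog search has already been run with some sequence-similarity tool and reduced to a best-hit E-value per genome, and the genome names, the hit values, and the 1e-10 cutoff are all hypothetical choices.

```python
from typing import Dict, List, Optional


def phylogenetic_profile(
    best_evalues: Dict[str, Optional[float]],
    genomes: List[str],
    cutoff: float = 1e-10,  # hypothetical significance threshold
) -> List[int]:
    """Return an N-dimensional 0/1 vector, one entry per genome in `genomes`.

    An entry is 1 (present) if the best homolog hit in that genome has an
    E-value at or below `cutoff`, and 0 (absent) if the hit is weaker or missing.
    """
    profile = []
    for genome in genomes:
        evalue = best_evalues.get(genome)
        profile.append(1 if evalue is not None and evalue <= cutoff else 0)
    return profile


# Hypothetical best-hit E-values for one query protein; g3 and g5 had no hit.
genomes = ["g1", "g2", "g3", "g4", "g5"]
hits = {"g1": 1e-50, "g2": 0.5, "g4": 1e-12}
print(phylogenetic_profile(hits, genomes))  # [1, 0, 0, 1, 0]
```

The order of `genomes` has to be fixed for every protein so that entry i always refers to the same organism; otherwise two profiles cannot be compared position by position.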

3) Original implementations of the phylogenetic profile method sought to infer "links" between pairs of proteins with similar profiles. A subsequent variation on that idea linked proteins if their profiles represented the negation of each other. These ideas are consistent with the simplest notion of how two proteins might be related in a cell, with the presence of one protein implying the presence or absence of another. Such simple patterns might be expected when two proteins are required to form a structural complex or when two proteins carry out sequential steps in an unbranched metabolic pathway. However, such simple relationships cannot adequately describe the full complexity of cellular networks that involve branching, parallel, and alternate pathways.
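The two pairwise linking rules above (matching profiles and mutually negated profiles) can be illustrated with a simple mismatch count. This sketch is a deliberate simplification: published implementations score profile similarity statistically rather than by counting disagreements, and the protein names, profiles, and the `max_mismatch` tolerance below are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = List[int]


def hamming(a: Profile, b: Profile) -> int:
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def pairwise_links(
    profiles: Dict[str, Profile], max_mismatch: int = 0
) -> List[Tuple[str, str, str]]:
    """Link protein pairs whose profiles match, or are negations of each other,
    in all but at most `max_mismatch` genomes."""
    links = []
    for (p, a), (q, b) in combinations(sorted(profiles.items()), 2):
        d = hamming(a, b)
        if d <= max_mismatch:
            links.append((p, q, "similar"))
        # len(a) - d is the Hamming distance between a and the negation of b.
        elif len(a) - d <= max_mismatch:
            links.append((p, q, "complementary"))
    return links


profiles = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 1],
    "C": [0, 0, 1, 1, 0],
    "D": [1, 0, 1, 0, 1],
}
print(pairwise_links(profiles))
# [('A', 'B', 'similar'), ('A', 'C', 'complementary'), ('B', 'C', 'complementary')]
```

Protein D shows the limitation noted above: its profile neither matches nor negates any other profile, so no pairwise rule links it to anything.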

4) The observed complexity of cellular networks leads one to expect the existence of higher order logic relationships involving a pattern of presence or absence of multiple proteins. Furthermore, evolutionary divergence, convergence, and horizontal transfer events lead us to expect relationships between multiple gene families that are more complex than can be described by pairwise phylogenetic similarity. Analysis of cellular pathways and networks in terms of logic relations has attracted recent interest, and the growing number of sequenced genomes now makes it possible to search for logic relations.

5) In summary: A major focus of genome research is to decipher the networks of molecular interactions that underlie cellular function. The authors describe a computational approach for identifying detailed relationships between proteins on the basis of genomic data. Logic analysis of phylogenetic profiles identifies triplets of proteins whose patterns of presence or absence obey certain logic relationships. For example, protein C may be present in a genome only if proteins A and B are both present. The method reveals many previously unidentified higher order relationships. These relationships illustrate the complexities that arise in cellular networks because of branching and alternate pathways, and they also facilitate assignment of cellular functions to uncharacterized proteins.
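The triplet relationships described above can be sketched as an exhaustive search over profile triplets. This is an illustrative exact-match version under stated assumptions, not the authors' method: it tests only the stricter equalities C = A AND B and C = A OR B, requires the relation to hold in every genome, and skips triplets that a pairwise comparison would already explain. The statistical scoring needed for noisy real profiles is not reproduced, and the profiles are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Profile = List[int]

# Two-input Boolean functions to test; an illustrative subset only.
LOGIC: Dict[str, Callable[[int, int], int]] = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}


def logic_triplets(profiles: Dict[str, Profile]) -> List[Tuple[str, str, str, str]]:
    """Return (C, A, op, B) tuples where profile C equals op(A, B) in every genome."""
    found = []
    names = sorted(profiles)
    for c in names:
        others = [n for n in names if n != c]
        for a, b in combinations(others, 2):
            pa, pb, pc = profiles[a], profiles[b], profiles[c]
            # Skip cases that a pairwise comparison would already explain.
            if pc == pa or pc == pb:
                continue
            for op, f in LOGIC.items():
                if all(f(x, y) == z for x, y, z in zip(pa, pb, pc)):
                    found.append((c, a, op, b))
    return found


# Hypothetical profiles in which C is present exactly when A and B both are.
profiles = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 0, 1, 0, 1, 0],
    "C": [1, 0, 0, 0, 1, 0],
}
print(logic_triplets(profiles))  # [('C', 'A', 'AND', 'B')]
```

Because every protein is tried as C against every pair of the others, the search grows roughly with the cube of the number of proteins, so a genome-scale version would need to prune candidate triplets.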

References (abridged):

1. S. Li et al., Science 303, 540 (2004)

2. M. Strong, P. Mallick, M. Pellegrini, M. J. Thompson, D. Eisenberg, Genome Biol. 4, R59 (2003)

3. L. Giot et al., Science 302, 1727 (2003)

4. P. M. Bowers et al., Genome Biol. 5, R35 (2004)

5. C. von Mering et al., Nucleic Acids Res. 31, 258 (2003)

Science http://www.sciencemag.org

--------------------------------

Related Material:

GENOMIC BIOLOGY: ON THE YEAST PROTEOME

The following points are made by J.A. Wohlschlegel and J.R. Yates (Nature 2003 425:671):

1) Saccharomyces cerevisiae was the first eukaryote (the type of organism, which includes humans, whose cells have a nucleus and membrane-bound organelles) to have its genome sequenced(3). Work with this organism has since led the way in functional genomics. Experiments pioneered in yeast have set the standard for the global analysis of cellular processes and paved the way for similar approaches in other organisms. They have also generated genome-wide collections of reagents that have proved tremendously valuable.

2) Open reading frames (ORFs) are commonly the center of attention in genome biology. These are stretches of DNA that have the characteristics of protein-coding sequences; that is, they may be genes. Collections of yeast strains now exist in which the expected ORFs have been either deleted or fused to various protein tags(4,5). Arrays have been created using yeast strains that express proteins carrying so-called affinity tags, which allow large numbers of proteins to be rapidly purified and then immobilized on a solid support. Large-scale studies using techniques such as protein arrays, yeast two-hybrid assays, and co-immunoprecipitation have revealed the identities of proteins that interact with individual proteins, with large macromolecular complexes, or even with specific small molecules.
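The working definition of an ORF given above can be made concrete with a small scanner. This minimal sketch assumes the simplest textbook definition (an ATG start codon followed by an in-frame TAA, TAG, or TGA stop codon), searches only the forward strand, reports the longest ORF ending at each stop codon, and uses a hypothetical `min_codons` length cutoff; real genome annotation also scans the reverse strand and weighs additional evidence.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon (stop included).
    `min_codons` counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop before the sequence ends
            i += 3
    return sorted(orfs)


# Toy sequence: ATG AAA TTT TAG starts at index 2 in the third reading frame.
print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14)]
```

Skipping past each stop codon means an internal ATG inside an ORF is not reported separately, which is one common convention among several.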

3) All in all, yeast biologists have led the charge in developing approaches to understanding eukaryotic genomes. Huh et al(1) and Ghaemmaghami et al(2) continue that tradition. Their goal was to tag and study the gene products of all recognized ORFs in the yeast genome. A key component of these studies was the tagging method, which kept each protein under its native expression control. This matters because artificially altering a protein's expression level can lead to results, such as mislocalization, that do not reflect its characteristics when it is expressed normally.

4) In technical terms, Huh et al and Ghaemmaghami et al used homologous recombination to integrate a DNA sequence, encoding either a tandem affinity purification (TAP) tag or green fluorescent protein (GFP), in-frame with the 3'-end of the coding sequence of each gene in its original chromosomal location. Because a gene's promoter and upstream regulatory sequences are not affected in this approach, the behavior of these fusion genes is likely to be nearly identical to that of their normal counterparts.
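The requirement that the tag be fused in frame at the 3' end, with the promoter left untouched, can be illustrated at the level of sequence arithmetic. This sketch is hypothetical throughout: the short sequences stand in for a real gene and for the much longer TAP or GFP coding sequences, it assumes the tag supplies its own stop codon, and it models only the outcome of the edit, not homologous recombination itself or the selectable marker a real tagging cassette carries.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_terminal_fusion(cds: str, tag: str) -> str:
    """Replace the stop codon of `cds` with `tag`, keeping the reading frame."""
    cds, tag = cds.upper(), tag.upper()
    if len(cds) % 3 != 0 or cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must be whole codons ending in a stop codon")
    if not tag or len(tag) % 3 != 0:
        raise ValueError("tag length must be a nonzero multiple of 3 to stay in frame")
    fusion = cds[:-3] + tag
    codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("fusion contains a premature in-frame stop codon")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("tag is assumed to end with its own stop codon")
    return fusion


def tag_at_native_locus(locus: str, cds_start: int, cds_end: int, tag: str) -> str:
    """Fuse `tag` to the 3' end of the gene occupying locus[cds_start:cds_end].

    Everything upstream of cds_start (promoter and regulatory sequence) is
    copied unchanged, mirroring the point that native regulation is preserved.
    """
    fused = c_terminal_fusion(locus[cds_start:cds_end], tag)
    return locus[:cds_start] + fused + locus[cds_end:]


# Hypothetical locus: 4-nt "promoter", a 3-codon gene (ATG AAA TAA), 3-nt downstream.
locus = "TTTT" + "ATGAAATAA" + "GGG"
print(tag_at_native_locus(locus, 4, 13, "CACCACTGA"))  # TTTTATGAAACACCACTGAGGG
```

A tag whose length is not a multiple of three would shift every downstream codon, which is why the length check comes before the fusion is built.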

5) These studies have achieved three major results. First, we now have data on protein abundance and localization for 75% of the predicted yeast ORFs. Second, we now have an estimate of the number of proteins present in a yeast cell during normal growth. Previously, a fun game to play with yeast biologists was to ask how many proteins they thought should be present under a given set of conditions. Numbers ranged between 2500 and 5000. It appears that the higher number was correct. Last, and most important, reagents have been developed for tracking a large majority of yeast genes while keeping them under native regulatory control. The reagents will be tools for further studies.

References (abridged):

1. Huh, W-K et al. Nature 425, 686-691 (2003)

2. Ghaemmaghami, S. et al. Nature 425, 737-741 (2003)

3. Goffeau, A. et al. Science 274, 546, 563-567 (1996)

4. Winzeler, E. A. et al. Science 285, 901-906 (1999)

5. Martzen, M. R. et al. Science 286, 1153-1155 (1999)

Nature http://www.nature.com/nature

--------------------------------

Related Material:

PLANT BIOLOGY: ON THE CHLOROPLAST PROTEOME

The following points are made by T. Kleffmann et al (Current Biology 2004 14:354):

1) Chloroplasts are typical plant cell organelles that develop and differentiate from proplastids in a tissue-specific and signal-dependent manner. They are of central importance for cellular metabolism and have many unique roles in processes of global significance, including photosynthesis and amino acid biosynthesis. Chloroplasts are of cyanobacterial origin, but during evolution they lost their autonomy and transferred most of their genes to the cell nucleus [1].

2) To date, only limited information is available on the proteome that constitutes the chloroplast and its metabolic functions. First attempts to estimate the protein complement of Arabidopsis thaliana plastids by using prediction tools such as TargetP or ChloroP [2,3], combined with a genome-wide search for genes of cyanobacterial origin [4,5], resulted in more than 3000 candidate proteins. Computer-assisted predictions based on transit peptides are unlikely to reveal the full chloroplast proteome, however, because import pathways with currently unknown mechanisms might exist that available prediction algorithms cannot recognize. Several known mitochondrial and chloroplast proteins have already been identified that do not follow the canonical import pathways.

3) Proteomics is a powerful tool for revealing the protein complement of cell organelles and for obtaining new insights into intracellular protein sorting and biochemical pathways. Progress has been made in the proteome analysis of plant mitochondria, peroxisomes, amyloplasts, and chloroplasts, but most of these studies focused on specific organelle compartments. Protein identification from the chloroplast thylakoid lumen and envelope has greatly improved the prediction of suborganelle protein localization.

4) In summary: By tandem mass spectrometry, the authors identified 690 different proteins from purified Arabidopsis chloroplasts. Most proteins could be assigned to known protein complexes and metabolic pathways, but more than 30% of the proteins have unknown functions, and many are not predicted to localize to the chloroplast. Novel structure and function prediction methods provided more informative annotations for proteins of unknown function. While near-complete protein coverage was achieved for key chloroplast pathways such as carbon fixation and photosynthesis, fewer proteins were identified from pathways that are downregulated in the light. Parallel RNA profiling revealed a pathway-dependent correlation between transcript and relative protein abundance, suggesting gene regulation at different levels. The authors conclude: The chloroplast proteome contains many proteins that are of unknown function and not predicted to localize to the chloroplast. Expression of nuclear-encoded chloroplast genes is regulated at multiple levels in a pathway-dependent context. The combined shotgun proteomics and RNA profiling approach has high potential value for predicting metabolic pathway prevalence and for defining regulatory levels of gene expression on a pathway scale.
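A pathway-dependent correlation between transcript and protein abundance amounts to computing one correlation coefficient per pathway. The sketch below assumes Pearson correlation on log2-transformed values and a minimum of three genes per pathway; the summary does not specify the authors' statistics, and the gene names, pathway labels, and abundance values are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float, float]  # (gene, pathway, mRNA level, protein level)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def per_pathway_correlation(records: Iterable[Record], min_genes: int = 3) -> Dict[str, float]:
    """Correlate log2 mRNA with log2 protein levels separately for each pathway."""
    by_pathway: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for _gene, pathway, mrna, protein in records:
        xs, ys = by_pathway[pathway]
        xs.append(math.log2(mrna))
        ys.append(math.log2(protein))
    return {
        pathway: pearson(xs, ys)
        for pathway, (xs, ys) in by_pathway.items()
        if len(xs) >= min_genes
    }


# Invented values: P1 tracks transcript levels, P2 is inverted, P3 has too few genes.
records = [
    ("g1", "P1", 1, 2), ("g2", "P1", 2, 4), ("g3", "P1", 4, 8), ("g4", "P1", 8, 16),
    ("g5", "P2", 1, 16), ("g6", "P2", 2, 8), ("g7", "P2", 4, 4), ("g8", "P2", 8, 2),
    ("g9", "P3", 1, 1), ("g10", "P3", 2, 2),
]
print(per_pathway_correlation(records))  # {'P1': 1.0, 'P2': -1.0}
```

Computing the coefficient within each pathway, rather than once over all genes, is what lets differences between pathways show up; a single genome-wide value would average them away.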

References (abridged):

1 Martin, W. and Herrmann, R.G. (1998). Gene transfer from organelles to the nucleus: how much, what happens, and why? Plant Physiol. 118, 9-17

2 Emanuelsson, O., Nielsen, H., Brunak, S., and von Heijne, G. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300, 1005-1016

3 Emanuelsson, O., Nielsen, H., and von Heijne, G. (1999). ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci. 8, 978-984

4 Abdallah, F., Salamini, F., and Leister, D. (2000). A prediction of the size and evolutionary origin of the proteome of chloroplasts of Arabidopsis. Trends Plant Sci. 5, 141-142

5 Martin, W., Rujan, T., Richly, E., Hansen, A., Cornelsen, S., Lins, T., Leister, D., Stoebe, B., Hasegawa, M., and Penny, D. (2002). Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl. Acad. Sci. USA 99, 12246-12251

Current Biology http://www.current-biology.com

ScienceWeek http://scienceweek.com

Copyright © 2005 ScienceWeek
All Rights Reserved
US Library of Congress ISSN 1529-1472