Research on Methods for Identifying Essential Proteins Based on Protein Networks
Abstract
Essential proteins are proteins that are indispensable to the survival and reproduction of an organism, and they play an important role in cellular activity. Identifying them matters not only for life science research but also for practical applications such as disease diagnosis and treatment and drug design. In the post-genomic era, high-throughput technologies have made protein-protein interaction data increasingly abundant, and identifying essential proteins from protein interaction networks has become an active research topic.
     Starting from network topology, this thesis analyzes the topological features of nodes, explores the characteristics of protein interaction networks in depth, and designs effective methods for identifying essential proteins. The main contributions are as follows:
     Current topology-based methods for identifying essential proteins, which rely mainly on centrality measures, capture only the features of nodes and cannot characterize the importance of edges. To address this, we introduce the edge clustering coefficient and construct a measure, SoECC, that combines the properties of both nodes and edges. Experiments on the yeast protein interaction network show that the prediction accuracy and efficiency of SoECC are generally higher than those of six centrality measures. In addition, the essential proteins predicted by SoECC show a clear clustering effect, which reflects what the edge clustering coefficient measures and agrees with the conclusions of earlier studies.
     Existing methods for identifying essential proteins do not mine the biological significance and function of proteins deeply enough. To address this, we introduce protein complex information and construct a new measure, SoID, for identifying essential proteins. The experimental results show that SoID generally predicts more essential proteins than the six centrality measures, has some advantage in sensitivity, specificity, and other metrics, and can effectively identify essential proteins of low degree.
     Because currently available protein interaction data contain many false positives, we propose a new method for weighting interactions and use six classic centrality measures to predict essential proteins in the weighted network. The experimental results show that each centrality measure generally achieves higher accuracy and efficiency on the weighted protein interaction network than on the corresponding unweighted network. The accuracy of topology-based identification methods depends heavily on the reliability of the network and the authenticity of the data, and weighting the network can improve the performance of essential protein prediction.
     By incorporating several kinds of information, the methods proposed in this thesis effectively improve identification accuracy and offer new ideas for research on essential protein identification.
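     To make the idea behind SoECC concrete, the sketch below scores each node by summing the edge clustering coefficients of its incident edges. The abstract does not give the formula, so everything here is an assumption: the edge clustering coefficient is taken to be the number of triangles through an edge divided by min(d_u - 1, d_v - 1), a common variant of the definition of Radicchi et al., and the function names `build_graph`, `edge_clustering_coefficient`, and `sum_of_ecc` are hypothetical. It illustrates the general technique and is not the thesis's implementation.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, skipping self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """Assumed form: triangles through (u, v) / min(d_u - 1, d_v - 1).

    Returns 0.0 when either endpoint has degree 1, since no triangle
    can contain that edge.
    """
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if possible <= 0:
        return 0.0
    triangles = len(adj[u] & adj[v])  # common neighbours close a triangle
    return triangles / possible


def sum_of_ecc(adj):
    """Score each node by the sum of the coefficients of its incident edges."""
    return {
        u: sum(edge_clustering_coefficient(adj, u, v) for v in adj[u])
        for u in adj
    }


# Toy network (hypothetical data): a triangle A-B-C with a pendant node D on C.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
scores = sum_of_ecc(build_graph(toy_edges))
ranking = sorted(scores, key=lambda n: (-scores[n], n))
print(scores)
print(ranking)
```

     In this toy network, node C has the highest degree, but its pendant edge to D lies in no triangle and contributes nothing, so C scores the same as A and B. That is the sense in which an edge-based measure can differ from plain degree centrality.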
