MSOAR 2.0: Incorporating tandem duplications into ortholog assignment based on genome rearrangement
  • Authors: Guanqun Shi (1)
    Liqing Zhang (2)
    Tao Jiang (1)
  • Journal: BMC Bioinformatics
  • Publication year: 2010
  • Publication date: December 2010
  • Volume: 11
  • Issue: 1
  • Full-text size: 977KB
  • Author affiliations: Guanqun Shi (1)
    Liqing Zhang (2)
    Tao Jiang (1)

    1. Department of Computer Science, University of California, Riverside, CA, 92521, USA
    2. Department of Computer Science, Virginia Tech, Blacksburg, VA, 24060, USA
  • ISSN:1471-2105
Abstract
Background: Ortholog assignment is a critical and fundamental problem in comparative genomics, since orthologs are considered to be functional counterparts in different species and can be used to infer molecular functions of one species from those of other species. MSOAR is a recently developed high-throughput system for assigning one-to-one orthologs between closely related species on a genome scale. It attempts to reconstruct the evolutionary history of the input genomes in terms of genome rearrangement and gene duplication events. It assumes that a gene duplication event inserts a duplicated gene into the genome of interest at a random location (i.e., the random duplication model). In practice, however, biologists believe that genes are often duplicated by tandem duplications, where a duplicated gene is located next to the original copy (i.e., the tandem duplication model).

Results: In this paper, we develop MSOAR 2.0, an improved system for one-to-one ortholog assignment. For a pair of input genomes, the system first focuses on the tandemly duplicated genes of each genome and tries to identify among them those that were duplicated after the speciation (i.e., the so-called inparalogs), using a simple phylogenetic tree reconciliation method. For each such set of tandemly duplicated inparalogs, all but one gene are deleted from the genome concerned (because they cannot possibly appear in any one-to-one ortholog pair), and MSOAR is then invoked. Using both simulated and real data experiments, we show that MSOAR 2.0 achieves better sensitivity and specificity than MSOAR. Compared with the well-known genome-scale ortholog assignment tool InParanoid, the Ensembl ortholog database, and the orthology information extracted from the well-known whole-genome multiple alignment program MultiZ, MSOAR 2.0 shows the highest sensitivity. Although the specificity of MSOAR 2.0 is slightly worse than that of InParanoid in the real data experiments, it is actually better than that of InParanoid in the simulation tests.

Conclusions: Our preliminary experimental results demonstrate that MSOAR 2.0 is a highly accurate tool for one-to-one ortholog assignment between closely related genomes. The software is freely available to the public and is included as online supplementary material.
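
The preprocessing step described in the abstract (find tandemly duplicated genes, keep one gene per inparalog set, then hand the reduced genome to MSOAR) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a genome is an ordered list of gene names, that gene families are already known (`family_of`), that "tandem" means strictly adjacent, and that the inparalog sets have already been produced by some tree reconciliation step, which is not shown. The function names, the toy data, and the choice to keep the first gene of each set are all hypothetical.

```python
from itertools import groupby


def tandem_arrays(genome, family_of):
    """Return maximal runs (length >= 2) of adjacent genes from one family.

    Assumption: "tandem" is taken to mean strictly adjacent in gene order.
    """
    runs = []
    for _, run in groupby(genome, key=lambda gene: family_of[gene]):
        run = list(run)
        if len(run) >= 2:
            runs.append(run)
    return runs


def collapse_inparalogs(genome, inparalog_sets):
    """Keep one gene per inparalog set and delete the rest.

    Assumption: the gene kept is the first one in genome order; the
    abstract only says that all but one gene are deleted.
    """
    drop = set()
    for group in inparalog_sets:
        members = [gene for gene in genome if gene in group]
        drop.update(members[1:])
    return [gene for gene in genome if gene not in drop]


# Toy genome (hypothetical data): letters denote gene families.
genome = ["a1", "a2", "a3", "b1", "c1", "c2", "a4"]
family_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B",
             "c1": "C", "c2": "C", "a4": "A"}

arrays = tandem_arrays(genome, family_of)

# Hypothetical reconciliation result: only the A-array arose after
# speciation, so only it is treated as a set of inparalogs.
inparalog_sets = [{"a1", "a2", "a3"}]

reduced = collapse_inparalogs(genome, inparalog_sets)
print(arrays)
print(reduced)
```

In this toy case the C-array is left untouched because it was not flagged as post-speciation, and `a4` survives because it is not part of the tandem array. The reduced gene order is what would then be passed on to the rearrangement-based assignment step.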
