A hybrid strategy for comprehensive annotation of the protein coding genes in prokaryotic genome
  • Authors: Jia-Feng Yu (1) (2) (3)
    Jing Guo (4)
    Qing-Bin Liu (1) (5)
    Yue Hou (2)
    Ke Xiao (2)
    Qing-Li Chen (1) (5)
    Ji-Hua Wang (1) (3)
    Xiao Sun (2)

    1. Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, People's Republic of China
    2. State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, People's Republic of China
    3. College of Physics and Electronic Information, Dezhou University, Dezhou 253023, People's Republic of China
    4. School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore
    5. College of Life Science, Shandong Normal University, Jinan 250014, People's Republic of China
  • Keywords: Protein coding genes; Re-annotation; Prokaryotic genome
  • Journal: Genes & Genomics
  • Publication year: 2015
  • Publication date: April 2015
  • Volume: 37
  • Issue: 4
  • Pages: 347-355
  • Full-text size: 516 KB
  • Journal subjects: Microbial Genetics and Genomics; Plant Genetics & Genomics; Animal Genetics and Genomics; Human Genetics
  • Publisher: Springer Netherlands
  • ISSN: 2092-9293
Abstract
Annotation errors in the protein-coding genes of prokaryotic genomes continue to accumulate in bioinformatics databases, and in most cases the rate at which genome annotations are updated cannot keep up with the explosive growth in genome sequences. Hence it is critical to manually rectify genome annotation errors. In this paper, a hybrid strategy that combines ab initio gene prediction programs with programs for re-annotating over-annotated genes is proposed for re-annotation of the protein-coding genes in prokaryotic genomes. Based on this strategy, the protein-coding genes in Geobacter sulfurreducens PCA are comprehensively re-annotated. As a result, 16 hypothetical genes are re-annotated as non-coding sequences and 104 missing genes are recovered as protein-coding genes. Subsequent functional and sequence analyses show that the predictions are reliable and robust. Further application to other genomes shows that this work can provide alternative tools for post-processing prokaryotic genome annotations.
