SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler
  • Authors: Ruibang Luo (1) (2)
    Binghang Liu (1) (2)
    Yinlong Xie (1) (2) (3)
    Zhenyu Li (1) (2)
    Weihua Huang (1)
    Jianying Yuan (1)
    Guangzhu He (1)
    Yanxiang Chen (1)
    Qi Pan (1)
    Yunjie Liu (1)
    Jingbo Tang (1)
    Gengxiong Wu (1)
    Hao Zhang (1)
    Yujian Shi (1)
    Yong Liu (1)
    Chang Yu (1)
    Bo Wang (1)
    Yao Lu (1)
    Changlei Han (1)
    David W Cheung (2)
    Siu-Ming Yiu (2)
    Shaoliang Peng (4)
    Zhu Xiaoqian (4)
    Guangming Liu (4)
    Xiangke Liao (4)
    Yingrui Li (1) (2)
    Huanming Yang (1)
    Jian Wang (1)
    Tak-Wah Lam (2)
    Jun Wang (1)
  • Keywords: Genome; Assembly; Contig; Scaffold; Error correction; Gap-filling
  • Journal: GigaScience
  • Publication year: 2012
  • Publication date: December 2012
  • Volume: 1
  • Issue: 1
  • Full-text size: 260 KB
  • Affiliations:

    1. BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Tai Po, Hong Kong
    2. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
    3. School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, 510006, China
    4. School of Computer Science, National University of Defense Technology, No.47, Yanwachi street, Kaifu District, Changsha, Hunan, 410073, China
  • ISSN:2047-217X
Abstract
Background: De novo genome assembly from next-generation sequencing (NGS) short reads is increasingly common; however, several major challenges must still be overcome for it to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.

Findings: To overcome these challenges, we have developed its successor, SOAPdenovo2, which has a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.

Conclusions: Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly of the 2008 Asian (YH) genome using SOAPdenovo2. In this assembly, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. Genome coverage increased from 81.16% to 93.91%, and peak memory consumption was ~2/3 lower.
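
The abstract reports contig and scaffold N50 values without defining the statistic. As a minimal sketch, assuming the common definition (the largest length L such that sequences of length at least L together cover at least half of the total assembled length), N50 can be computed as below. The function name `n50` and the toy lengths are illustrative only; they are not taken from the article or from SOAPdenovo2.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the running sum first reaches at
    least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running against total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical toy contig lengths in bp, not data from the article.
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(toy_contigs))
```

A larger N50 means that half of the assembled bases lie in longer sequences, which is why the abstract uses it as a measure of assembly continuity.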
