用户名: 密码: 验证码:
Linear-Time Sequence Comparison Using Minimal Absent Words & Applications
详细信息    查看全文
  • 关键词:Algorithms on strings ; Sequence comparison ; Alignment ; free comparison ; Absent words ; Forbidden words ; Circular words
  • 刊名:Lecture Notes in Computer Science
  • 出版年:2016
  • 出版时间:2016
  • 年:2016
  • 卷:9644
  • 期:1
  • 页码:334-346
  • 全文大小:236 KB
  • 参考文献:1.Acquisti, C., Poste, G., Curtiss, D., Kumar, S.: Nullomers: really a matter of natural selection? PLoS ONE 2(10), e1022 (2007)CrossRef
    2.Barton, C., Heliou, A., Mouchard, L., Pissis, S.P.: Linear-time computation of minimal absent words using suffix array. BMC Bioinform. 15, 388 (2014)CrossRef
    3.Barton, C., Heliou, A., Mouchard, L., Pissis, S.P.: Parallelising the computation of minimal absent words. In: PPAM, LNCS. Springer, Heidelberg (2015)
    4.Barton, C., Iliopoulos, C.S., Kundu, R., Pissis, S.P., Retha, A., Vayani, F.: Accurate and efficient methods to improve multiple circular sequence alignment. In: Bampis, E. (ed.) SEA 2015. LNCS, vol. 9125, pp. 247–258. Springer, Heidelberg (2015)CrossRef
    5.Béal, M., Mignosi, F., Restivo, A., Sciortino, M.: Forbidden words in symbolic dynamics. Adv. Appl. Math. 25(2), 163–193 (2000)MathSciNet CrossRef MATH
    6.Belazzougui, D., Cunial, F., Kärkkäinen, J., Mäkinen, V.: Versatile succinct representations of the bidirectional burrows-wheeler transform. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 133–144. Springer, Heidelberg (2013)CrossRef
    7.Chairungsee, S., Crochemore, M.: Using minimal absent words to build phylogeny. Theor. Comput. Sci. 450, 109–116 (2012)MathSciNet CrossRef MATH
    8.Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, New York, NY, USA (2007)CrossRef MATH
    9.Crochemore, M., Mignosi, F., Restivo, A.: Automata and forbidden words. Inf. Process. Lett. 67, 111–117 (1998)MathSciNet CrossRef MATH
    10.Domazet-Lošo, M., Haubold, B.: Efficient estimation of pairwise distances between genomes. Bioinformatics 25(24), 3221–3227 (2009)CrossRef
    11.Fici, G.: Minimal Forbidden Words and Applications. Ph.D. thesis, Université de Marne-la-Vallée (2006)
    12.Fischer, J.: Inducing the LCP-array. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844, pp. 374–385. Springer, Heidelberg (2011)CrossRef
    13.Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM J. Comput. 40(2), 465–492 (2011)MathSciNet CrossRef MATH
    14.Fletcher, W., Yang, Z.: INDELible: a flexible simulator of biological sequence evolution. Mol. Biol. Evol. 26(8), 1879–1888 (2009)CrossRef
    15.Fukae, H., Ota, T., Morita, H.: On fast and memory-efficient construction of an antidictionary array. In: ISIT, pp. 1092–1096. IEEE (2012)
    16.Garcia, S.P., Pinho, A.J., Rodrigues, J.M.O.S., Bastos, C.A.C., Ferreira, P.J.S.G.: Minimal absent words in prokaryotic and eukaryotic genomes. PLoS ONE 6(1), e16065 (2011)CrossRef
    17.Goios, A., Pereira, L., Bogue, M., Macaulay, V., Amorim, A.: mtDNA phylogeny and evolution of laboratory mouse strains. Genome Res. 17(3), 293–298 (2007)CrossRef
    18.Grossi, R., Iliopoulos, C.S., Mercaş, R., Pisanti, N., Pissis, S.P., Retha, A., Vayani, F.: Circular sequence comparison with q-grams. In: Pop, M., Touzet, H. (eds.) WABI 2015. LNCS, vol. 9289, pp. 203–216. Springer, Heidelberg (2015)CrossRef
    19.Ilie, L., Navarro, G., Tinta, L.: The longest common extension problem revisited and applications to approximate string searching. J. Discrete Algorithms 8(4), 418–428 (2010)MathSciNet CrossRef MATH
    20.Landau, G.M., Myers, E.W., Schmidt, J.P.: Incremental string comparison. SIAM J. Comput. 27(2), 557–582 (1998)MathSciNet CrossRef MATH
    21.Maes, M.: On a cyclic string-to-string correction problem. Inf. Process. Lett. 35(2), 73–78 (1990)MathSciNet CrossRef MATH
    22.Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)MathSciNet CrossRef MATH
    23.Mignosi, F., Restivo, A., Sciortino, M.: Words and forbidden factors. Theor. Comput. Sci. 273(1–2), 99–117 (2002)MathSciNet CrossRef MATH
    24.Mosig, A., Hofacker, I.L., Stadler, P.F.: Comparative analysis of cyclic sequences: viroids and other small circular RNAs. GCB, LNI 83, 93–102 (2006)
    25.Nong, G., Zhang, S., Chan, W.H.: Linear suffix array construction by almost pure induced-sorting. In: DCC, pp. 193–202. IEEE (2009)
    26.Ota, T., Morita, H.: On a universal antidictionary coding for stationary ergodic sources with finite alphabet. In: ISITA, pp. 294–298. IEEE (2014)
    27.Ota, T., Morita, H.: On antidictionary coding based on compacted substring automaton. In: ISIT, pp. 1754–1758. IEEE (2013)
    28.Pinho, A.J., Ferreira, P.J.S.G., Garcia, S.P., Rodrigues, J.M.O.S.: On finding minimal absent words. BMC Bioinform. 10(1), 1 (2009)CrossRef
    29.Robinson, D., Fould, L.: Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131–147 (1981)MathSciNet CrossRef MATH
    30.Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4(4), 406–425 (1987)
    31.Silva, R.M., Pratas, D., Castro, L., Pinho, A.J., Ferreira, P.J.S.G.: Three minimal sequences found in Ebola virus genomes and absent from human DNA. Bioinformatics 31(15), 2421–2425 (2015)CrossRef
    32.Ukkonen, E.: Approximate string-matching with \(q\) -grams and maximal matches. Theor. Comput. Sci. 92(1), 191–211 (1992)MathSciNet CrossRef MATH
    33.Wheeler, T.J.: Large-scale neighbor-joining with NINJA. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 375–389. Springer, Heidelberg (2009)CrossRef
  • 作者单位:Maxime Crochemore (15)
    Gabriele Fici (16)
    Robert Mercaş (15) (17)
    Solon P. Pissis (15)

    15. Department of Informatics, King’s College London, London, UK
    16. Dipartimento di Matematica e Informatica, Università di Palermo, Palermo, Italy
    17. Department of Computer Science, Kiel University, Kiel, Germany
  • 丛书名:LATIN 2016: Theoretical Informatics
  • ISBN:978-3-662-49529-2
  • 刊物类别:Computer Science
  • 刊物主题:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • 出版者:Springer Berlin / Heidelberg
  • ISSN:1611-3349
文摘
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in the sequence. An absent word is minimal if all its proper factors occur in the sequence. Here we present the first linear-time and linear-space algorithm to compare two sequences by considering all their minimal absent words. In the process, we present results of combinatorial interest, and also extend the proposed techniques to compare circular sequences.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700