用户名: 密码: 验证码:
Geometric BWT: Compressed Text Indexing via Sparse Suffixes and Range Searching
详细信息    查看全文
  • 作者:Yu-Feng Chien (1)
    Wing-Kai Hon (1)
    Rahul Shah (2)
    Sharma V. Thankachan (2)
    Jeffrey Scott Vitter (3)

    1. Department of CS
    ; National Tsing Hua University ; Hsinchu ; Taiwan
    2. Department of CS
    ; Louisiana State University ; Baton Rouge ; LA ; USA
    3. Department of EECS
    ; The University of Kansas ; Lawrence ; KS ; USA
  • 关键词:Text indexing ; Entropy compression ; Geometric range searching
  • 刊名:Algorithmica
  • 出版年:2015
  • 出版时间:February 2015
  • 年:2015
  • 卷:71
  • 期:2
  • 页码:258-278
  • 全文大小:682 KB
  • 参考文献:1. Agarwal, P.K., Erickson, J. (1999) Geometric range searching and its relatives. Adv. Discret. Comput. Geom. 23: pp. 1-56 CrossRef
    2. Aggarwal, A., Vitter, J.S. (1998) The input/output complexity of sorting and related problems. Commun. ACM 31: pp. 1116-1127 CrossRef
    3. Aref, W.G., Ilyas, I.F. (2001) SP-GiST: an extensible database index for supporting space partitioning trees. J. Intell. Inf. Syst. 17: pp. 215-240 CrossRef
    4. Arge, L., Brodal, G.S., Fagerberg, R., Laustsen, M. (2005) Cache-oblivious planar orthogonal range searching and counting. Proceedings of Symposium on Computational Geometry. pp. 160-169
    5. Arge, L., Samoladas, V., Vitter, J.S. (1999) Two-dimensional indexability and optimal range search indexing. Proceedings of Symposium on Principles of Database Systems. pp. 346-357
    6. Arroyuelo, D., Navarro, G. (2007) A Lempel-Ziv text index on secondary storage. Proceedings of Symposium on Combinatorial Pattern Matching. pp. 83-94 CrossRef
    7. Baeza-Yates, R., Barbosa, E.F., Ziviani, N. (1996) Hierarchies of indices for text searching. Inf. Syst. 21: pp. 497-514 CrossRef
    8. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation, Paolo Alto CA, USA (1994)
    9. Chazelle, B. (1990) Lower bounds for orthogonal range searching. I: The reporting case. J. ACM 37: pp. 200-212 CrossRef
    10. Clark, D., Munro, I. (1996) Efficient suffix trees on secondary storage. Proceedings of Symposium on Discrete Algorithms. pp. 383-391
    11. Chien, Y.F., Hon, W.K., Shah, R., Vitter, J.S. (2008) Geometric Burrows-Wheeler transform: linking range searching and text indexing. Proceedings of Data Compression Conference. pp. 252-261
    12. Chiu, S.Y., Hon, W.K., Shah, R., Vitter, J.S. (2010) I/O-efficient compressed text indexes: from theory to practice. Proceedings of Data Compression Conference. pp. 426-434
    13. Ferragina, P., Grossi, R. (1999) The string B-tree: a new data structure for string searching in external memory and its application. J. ACM 46: pp. 236-280 CrossRef
    14. Ferragina, P., Manzini, G. (2005) Indexing compressed text. J. ACM 52: pp. 552-581 CrossRef
    15. Ferragina, P., Venturini, R. (2007) A simple storage scheme for strings achieving entropy bounds. Proceedings of Symposium on Discrete Algorithms. pp. 690-696
    16. Fischer, J., Gagie, T., Kopelowitz, T., Lewenstein, M., M盲kinen, V., Salmela, L., V盲lim盲ki, N.N. (2012) Forbidden patterns. Proceedings of Latin American Theoretical Informatics. pp. 327-337
    17. Gagie, T., Gawrychowski, P.: Linear-space substring range counting over polylogarithmic alphabets. (2012). CoRR. arXiv:1202.3208 [cs.DS]
    18. Gonz谩lez, R., Navarro, G. (2007) A compressed text index on secondary memory. Proceedings of International Workshop on Combinatorial Algorithms. pp. 80-91
    19. Grossi, R., Gupta, A., Vitter, J.S. (2003) High-order entropy-compressed text indexes. Proceedings of Symposium on Discrete Algorithms. pp. 841-850
    20. Grossi, R., Vitter, J.S. (2005) Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35: pp. 378-407 CrossRef
    21. Guttman, A. (1984) R-trees: a dynamic index structure for spatial searching. Proceedings of International Conference on Management of Data. pp. 47-57
    22. Hellerstein, J.M., Naughton, J.F., Pfeffer, A. (1995) Generalized search trees for database systems. Proceedings of International Conference on Very Large Data Bases. pp. 562-573
    23. Hon, W.K., Lam, T.W., Shah, R., Lung, S.L., Vitter, J.S. (2009) Succinct index for dynamic dictionary matching. Proceedings of Symposium on Algorithms and Computation. pp. 1034-1043 CrossRef
    24. Hon, W.K., Lam, T.W., Shah, R., Lung, S.L., Vitter, J.S. (2008) Compressed index for dictionary matching. Proceedings of Data Compression Conference. pp. 23-32
    25. Hon, W.K., Shah, R., Vitter, J.S.: Ordered pattern matching: towards full-text retrieval. Technical report TR-06-008, Purdue University (2006)
    26. Hon, W.K., Shah, R., Thankachan, S.V., Vitter, J.S. (2009) On entropy-compressed text indexing in external memory. Proceedings of International Symposium on String Processing and Information Retrieval. pp. 75-89 CrossRef
    27. Hon, W.K., Ku, T.H., Shah, R., Thankachan, S.V., Vitter, J.S. (2011) Compressed text indexing with wildcards. Proceedings of International Symposium on String Processing and Information Retrieval. pp. 267-277 CrossRef
    28. Hon, W.K., Ku, T.H., Shah, R., Thankachan, S.V., Vitter, J.S. (2011) Compressed dictionary matching with one errors. Proceedings of Data Compression Conference. pp. 113-122
    29. Hon, W.K., Shah, R., Vitter, J.S. (2010) Compression, indexing, and retrieval for massive string data. Proceedings of Symposium on Combinatorial Pattern Matching. pp. 260-274 CrossRef
    30. Jacobson, G. (1989) Space-efficient static trees and graphs. Proceedings of Symposium on Foundations of Computer Science. pp. 549-554 CrossRef
    31. Kanth, K.V.R., Singh, A.K. (1999) Optimal dynamic range searching in non-replicating index structures. Proceedings of International Conference on Database Theory. pp. 257-276
    32. K盲rkk盲inen, J., Ukkonen, E. (1996) Sparse suffix trees. Proceedings of International Conference on Computing and Combinatorics. pp. 219-230 CrossRef
    33. Kolpakov, R., Kucherov, G., Starikovskaya, T.A. (2011) Pattern matching on sparse suffix trees. International Conference on Data Compression, Communications and Processing.
    34. M盲kinen, V., Navarro, G.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)
    35. M盲kinen, V., Navarro, G.: Dynamic entropy-compressed sequences and full-text indexes. Technical report TR/DCC-2006-10, University of Chile (2006)
    36. M盲kinen, V., Navarro, G. (2006) Position-restricted substring searching. Proceedings of Latin American Theoretical Informatics Symposium. pp. 703-714
    37. M盲kinen, V., Navarro, G., Sadakane, K. (2004) Advantages of backward searching-efficient secondary memory and distributed implementation of compressed suffix arrays. Proceedings of Symposium on Algorithms and Computation. pp. 681-692 CrossRef
    38. Manber, U., Myers, G. (1993) Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22: pp. 935-948 CrossRef
    39. McCreight, E.M. (1976) A space-economical suffix tree construction algorithm. J. ACM 23: pp. 262-272 CrossRef
    40. Munro, J.I. (1996) Tables. Proceedings of Conference on Foundations of Software Technology and Theoretical Computer Science. pp. 37-42 CrossRef
    41. Russo, L.M.S., Navarro, G., Oliveira, A.L. (2011) Fully compressed suffix trees. ACM Trans. Algorithms 7: pp. 53 CrossRef
    42. Sadakane, K.: Compressed suffix trees with full functionality. Theory Comput. Syst. 589鈥?07(2007)
    43. Samet, H. (1984) The quadtree and related hierarchical data structures. ACM Comput. Surv. 16: pp. 187-260 CrossRef
    44. Subramanian, S., Ramaswamy, S. (1995) The P-range tree: a new data structure for range searching in secondary memory. Proceedings of Symposium on Discrete Algorithms. pp. 378-387
    45. Thankachan, S.V. (2011) Compressed indexes for aligned pattern matching. Proceedings of International Symposium on String Processing and Information Retrieval. pp. 410-419 CrossRef
    46. Weiner, P. (1973) Linear pattern matching algorithms. Proceedings of Symposium on Switching and Automata Theory. pp. 1-11 CrossRef
    47. Willard, D.E. (1983) Log-logarithmic worst-case range queries are possible in space 胃(N). Inf. Process. Lett. 17: pp. 81-84 CrossRef
    48. Yu, C.C., Hon, W.K., Wang, B.F. (2009) Efficient data structures for orthogonal range successor problem. Proceedings of International Computing and Combinatorics Conference. pp. 96-105 CrossRef
  • 刊物类别:Computer Science
  • 刊物主题:Algorithm Analysis and Problem Complexity
    Theory of Computation
    Mathematics of Computing
    Algorithms
    Computer Systems Organization and Communication Networks
    Data Structures, Cryptology and Information Theory
  • 出版者:Springer New York
  • ISSN:1432-0541
文摘
We introduce a new variant of the popular Burrows-Wheeler transform (BWT), called Geometric Burrows-Wheeler Transform (GBWT), which converts a text into a set of points in 2-dimensional geometry. We also introduce a reverse transform, called Points2Text, which converts a set of points into text. Using these two transforms, we show strong equivalence between data structural problems in geometric range searching and text pattern matching. This allows us to apply the lower bounds known in the field of orthogonal range searching to the problems in compressed text indexing. In addition, we give the first succinct (compact) index for I/O-efficient pattern matching in external memory, and show how this index can be further improved to achieve higher-order entropy compressed space.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700