LuSH: A Generic High-Dimensional Index Framework
  • Authors: Zhou Yu (1) yuz@zju.edu.cn
    Jian Shao (1) jshao@zju.edu.cn
    Fei Wu (1) wufei@zju.edu.cn
  • Keywords: High-dimensional Index; Spectral Hashing; Lucene
  • Publication: Lecture Notes in Computer Science
  • Year: 2012
  • Volume: 7419
  • Issue: 1
  • Pages: 181-191
  • Full-text size: 434.7 KB
  • Affiliation: 1. College of Computer Science, Zhejiang University, Hangzhou, China 310027
  • ISSN:1611-3349
Abstract
Fast similarity retrieval over high-dimensional unstructured data is becoming increasingly important. In high-dimensional spaces, traditional tree-based indexes are less effective than hashing methods. As a state-of-the-art hashing approach, Spectral Hashing (SH) aims to design compact binary codes for high-dimensional vectors so that the similarity structure of the original vector space is preserved in the code space. In this paper we propose a generic high-dimensional index framework named LuSH, short for Lucene-based SH. It uses SH as the high-dimensional index and Lucene, the well-known open-source inverted index, as the underlying index file. To improve retrieval efficiency, two improvement strategies are proposed. Experiments on large-scale datasets containing up to 10 million items demonstrate the strong performance of the LuSH framework.
