Abstract
Fast similarity retrieval for high-dimensional unstructured data is becoming increasingly important. In high-dimensional spaces, traditional tree-based indexes perform poorly compared with hashing methods. As a state-of-the-art hashing approach, Spectral Hashing (SH) designs compact binary codes for high-dimensional vectors so that the similarity structure of the original vector space is preserved in the code space. In this paper we propose a generic high-dimensional index framework named LuSH, short for Lucene-based SH. It uses SH as the high-dimensional index and Lucene, the well-known open-source inverted index, as the underlying index file. To speed up retrieval, we propose two improvement strategies. Experiments on large-scale datasets containing up to 10 million items demonstrate the effectiveness of the LuSH framework.