用户名: 密码: 验证码:
A space-efficient alphabet-independent Four-Russians' lookup table and a multithreaded Four-Russians' edit distance algorithm
详细信息    查看全文
文摘
Given two strings X   (|X|=m|X|=m) and Y   (|Y|=n|Y|=n) over an alphabet Σ, the edit distance between X and Y   can be computed in O(mn/t)O(mn/t) time with the help of the Four-Russians' lookup table whose block size is t  . The Four-Russians' lookup table can be constructed in O((3|Σ|)2tt2)O((3|Σ|)2tt2) time using O((3|Σ|)2tt)O((3|Σ|)2tt) space. However, the construction time and space requirement of the lookup table grow very fast as the alphabet size increases and thus it has been used only when |Σ||Σ| is very small. For example, when a string is a protein sequence, |Σ|=20|Σ|=20 and thus it is almost impossible to use the Four-Russians' lookup table on typical workstations. In this paper, we present an efficient alphabet-independent Four-Russians' lookup table. It requires O(32t(2t)!t)O(32t(2t)!t) space and can be constructed in O(32t(2t)!t2)O(32t(2t)!t2) time. Thus, the Four-Russians' lookup table can be constructed and used irrespective of the alphabet size. The time and space complexity were achieved by compacting the lookup table using a clever encoding of the preprocessed strings. Experimental results show that the space requirement of the lookup table is reduced to about 1/5,172,030 of its original size when |Σ|=26|Σ|=26 and t=4t=4. Furthermore, we present efficient multithreaded parallel algorithms for edit distance computation using the Four-Russians' lookup table. The parallel algorithm for lookup table construction runs in O(t)O(t) time and the parallel algorithm for edit distance computation between X and Y   runs in O(m+n)O(m+n) time. Experiments performed on CUDA-supported GPU show that our algorithm runs about 942 times faster than the sequential version of the original Four-Russians' algorithm for 100 pairs of random strings of length approximately 1,000 when |Σ|=4|Σ|=4 and t=4t=4.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700