摘要
目前互联网中以文本存在的数据非常庞大,针对在如此庞大的文本中如何准确、快速地找到多个不同的目标字符串的问题,在介绍常见的模式匹配算法的优点和缺点基础上,结合Trie速多模式匹配算法。根据对比性实验的结果分析得出,改进AC且匹配速度大约是AC算法的5倍,说明改进AC
There exists a large amount of text data on the Internet currently. In allusion to the problem that how to search out multiple different target character strings accurately and quickly in such large text,an improved fast multi-pattern matching algorithm is proposed on the basis of introducing the advantages and disadvantages of common pattern matching algorithms,and combining with the idea of converting the Trie tree into the double array form. A comparison experiment was carried out. The analysis results show that the improved AC algorithm can successfully match all the to-be queried pattern strings in the text,and its matching speed is about 5 times of that of the AC algorithm,which shows that the improved AC algorithm has good effects in aspects of matching speed,recall ratio and space utilization rate.
引文
[1]蒋莉莉.字符串模式匹配算法的改进研究[J].电脑知识与技术,2008(3):526-528.JIANG Lili.The research of improved matching algorithm of string pattern[J].Computer knowledge and technology,2008(3):526-528.
[2]张利香.基于后缀数组的字符串模式查找的算法[D].兰州:西北师范大学,2010.ZHANG Lixiang.The string pattern searching algorithms based on suffix arrays[D].Lanzhou:Northwest Normal University,2010.
[3]杨波.基于有限状态自动机的中文多模式匹配算法研究[D].合肥:合肥工业大学,2013.YANG Bo.Research on multi-pattern matching algorithm for Chinese based on finite state automata[D].Hefei:Hefei University of Technology,2013.
[4]蔡恒,张帅.基于BF算法改进的字符串模式匹配算法[J].电脑编程技巧与维护,2014(22):14-15.CAI Heng,ZHANG Shuai.An improved string pattern matching algorithm based on BF algorithm[J].Computer programming skills&maintenance,2014(22):14-15.
[5]李明月,张善卿,陆剑锋,等.一种改进的Sunday匹配算法[J].杭州电子科技大学学报(自然科学版),2015,35(1):93-96.LI Mingyue,ZHANG Shanqing,LU Jianfeng,et al.A modified Sunday matching algorithm[J].Journal of Hangzhou Dianzi University(Natural Sciences),2015,35(1):93-96.
[6]刘丽霞,张志强.基于Trie树的相似字符串查找算法[J].计算机应用,2013,33(8):2375-2378.LIU Lixia,ZHANG Zhiqiang.Similar string search algorithm based on Trie tree[J].Journal of computer applications,2013,33(8):2375-2378.
[7]YANG Wenchuan,FANG Zeyang,LI Pengfei.Study for the double-array Trie tree based algorithm in word segmentation[C]//Proceedings of 2015 International Conference on Computer Science and Environmental Engineering.Beijing:DEStech Publications Inc.,2015:7.
[8]徐聪,张丰,杜震洪,等.基于哈希和双数组Trie树的多层次地址匹配算法[J].浙江大学学报(理学版),2014,41(2):217-222.XU Cong,ZHANG Feng,DU Zhenhong,et al.A multi-level address-matching algorithm based on Hash function and doublearray Trie-tree[J].Journal of Zhejiang University(Science edition),2014,41(2):217-222.
[9]KAROONBOONYANAN T.An implementation of double-array Trie[EB/OL].[2018-11-21].https://linux.thai.net/~thep/datrie/datrie.html.