用户名: 密码: 验证码:
一种基于Aho-Corasick算法改进的多模式匹配算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:An improved multi-pattern matching algorithm based on Aho-Corasick algorithm
  • 作者:陈永杰 ; 吾守尔·斯拉木 ; 于清
  • 英文作者:CHEN Yongjie;Wushour Silamu;YU Qing;School of Information Science and Engineering,Xinjiang University;
  • 关键词:字符串匹配 ; 多模式匹配 ; Trie树 ; 双数组 ; AC算法 ; 匹配速度
  • 英文关键词:character string matching;;multi-pattern matching;;Trie tree;;double array;;AC algorithm;;matching speed
  • 中文刊名:XDDJ
  • 英文刊名:Modern Electronics Technique
  • 机构:新疆大学信息科学与工程学院;
  • 出版日期:2019-02-15
  • 出版单位:现代电子技术
  • 年:2019
  • 期:v.42;No.531
  • 基金:国家“973”重点基础研究计划(2014CB340506)~~
  • 语种:中文;
  • 页:XDDJ201904022
  • 页数:5
  • CN:04
  • ISSN:61-1224/TN
  • 分类号:97-101
摘要
目前互联网中以文本存在的数据非常庞大,针对在如此庞大的文本中如何准确、快速地找到多个不同的目标字符串的问题,在介绍常见的模式匹配算法的优点和缺点基础上,结合Trie速多模式匹配算法。根据对比性实验的结果分析得出,改进AC且匹配速度大约是AC算法的5倍,说明改进AC
        There exists a large amount of text data on the Internet currently. In allusion to the problem that how to search out multiple different target character strings accurately and quickly in such large text,an improved fast multi-pattern matching algorithm is proposed on the basis of introducing the advantages and disadvantages of common pattern matching algorithms,and combining with the idea of converting the Trie tree into the double array form. A comparison experiment was carried out. The analysis results show that the improved AC algorithm can successfully match all the to-be queried pattern strings in the text,and its matching speed is about 5 times of that of the AC algorithm,which shows that the improved AC algorithm has good effects in aspects of matching speed,recall ratio and space utilization rate.
引文
[1]蒋莉莉.字符串模式匹配算法的改进研究[J].电脑知识与技术,2008(3):526-528.JIANG Lili.The research of improved matching algorithm of string pattern[J].Computer knowledge and technology,2008(3):526-528.
    [2]张利香.基于后缀数组的字符串模式查找的算法[D].兰州:西北师范大学,2010.ZHANG Lixiang.The string pattern searching algorithms based on suffix arrays[D].Lanzhou:Northwest Normal University,2010.
    [3]杨波.基于有限状态自动机的中文多模式匹配算法研究[D].合肥:合肥工业大学,2013.YANG Bo.Research on multi-pattern matching algorithm for Chinese based on finite state automata[D].Hefei:Hefei University of Technology,2013.
    [4]蔡恒,张帅.基于BF算法改进的字符串模式匹配算法[J].电脑编程技巧与维护,2014(22):14-15.CAI Heng,ZHANG Shuai.An improved string pattern matching algorithm based on BF algorithm[J].Computer programming skills&maintenance,2014(22):14-15.
    [5]李明月,张善卿,陆剑锋,等.一种改进的Sunday匹配算法[J].杭州电子科技大学学报(自然科学版),2015,35(1):93-96.LI Mingyue,ZHANG Shanqing,LU Jianfeng,et al.A modified Sunday matching algorithm[J].Journal of Hangzhou Dianzi University(Natural Sciences),2015,35(1):93-96.
    [6]刘丽霞,张志强.基于Trie树的相似字符串查找算法[J].计算机应用,2013,33(8):2375-2378.LIU Lixia,ZHANG Zhiqiang.Similar string search algorithm based on Trie tree[J].Journal of computer applications,2013,33(8):2375-2378.
    [7]YANG Wenchuan,FANG Zeyang,LI Pengfei.Study for the double-array Trie tree based algorithm in word segmentation[C]//Proceedings of 2015 International Conference on Computer Science and Environmental Engineering.Beijing:DEStech Publications Inc.,2015:7.
    [8]徐聪,张丰,杜震洪,等.基于哈希和双数组Trie树的多层次地址匹配算法[J].浙江大学学报(理学版),2014,41(2):217-222.XU Cong,ZHANG Feng,DU Zhenhong,et al.A multi-level address-matching algorithm based on Hash function and doublearray Trie-tree[J].Journal of Zhejiang University(Science edition),2014,41(2):217-222.
    [9]KAROONBOONYANAN T.An implementation of double-array Trie[EB/OL].[2018-11-21].https://linux.thai.net/~thep/datrie/datrie.html.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700