Research and Implementation of Fuzzy Matching for K-tree Address (K叉树地址的模糊匹配研究与实现)

English title: Research and Implementation of Fuzzy Matching for K-tree Address
Authors: LI Xinfang (李新放); SONG Zhuanling (宋转玲); CHEN Xueye (陈学业); HE Biao (贺彪); LIU Haixing (刘海行)
Institutions: The First Institute of Oceanography, State Oceanic Administration; Laboratory for Regional Oceanography and Numerical Modeling, Qingdao National Laboratory for Marine Science and Technology; Shenzhen Research Center of Digital City Engineering
Keywords: address matching; address segmentation; fuzzy matching; K-tree
Chinese journal title: CHTB
English journal title: Bulletin of Surveying and Mapping
Publication date: 2018-09-25
Publisher: 测绘通报 (Bulletin of Surveying and Mapping)
Year: 2018
Issue: No.498
Funding: National Key Research and Development Program of China (2016YFA0602200); Special Fund for Basic Scientific Research of Central-Level Public Welfare Research Institutes (2015G18; 2015P12)
Language: Chinese
Page: CHTB201809027
Page count: 5
CN: 09
ISSN: 11-2246/P
Classification number: 134-137+163

Abstract

Address matching is a key foundational technology for integrating and fusing digital city information resources. Because of the complexity of Chinese semantics and of place-name and address descriptions, matching Chinese addresses is much harder than matching English ones. Segmenting massive volumes of Chinese address data accurately, and matching addresses quickly and efficiently, are key problems in urban data integration and fusion. Building on a study of existing address coding and word segmentation techniques, this paper implements Chinese address segmentation with a combined rule-based and statistical method and stores Chinese addresses in a K-ary tree (K-tree) structure, which improves the accuracy and efficiency of Chinese address matching queries. A prototype system was developed and comparative tests were run on 10,000 preprocessed Shenzhen address records, verifying the effectiveness of the method.
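This record gives only the abstract, so the authors' actual tree layout, segmentation rules, and matching criterion are not shown here. The following is a minimal sketch of the general idea of storing segmented addresses in a K-ary tree and matching a query level by level with a similarity threshold. It assumes that addresses arrive already segmented and that similarity is measured with `difflib.SequenceMatcher` from the Python standard library; the names `AddressNode`, `similarity`, and `fuzzy_match`, the threshold of 0.6, and the sample addresses are all hypothetical.

```python
from difflib import SequenceMatcher


class AddressNode:
    """One address element; a node may have any number of children (K-ary)."""

    def __init__(self, name):
        self.name = name
        self.children = {}  # element name -> AddressNode

    def insert(self, elements):
        """Store a pre-segmented address as a path starting at this node."""
        node = self
        for element in elements:
            if element not in node.children:
                node.children[element] = AddressNode(element)
            node = node.children[element]


def similarity(a, b):
    """String similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_match(root, elements, threshold=0.6):
    """Descend one level per query element, following the most similar child.

    Returns the matched path and stops early when no child of the current
    node reaches the (assumed) similarity threshold.
    """
    node, path = root, []
    for element in elements:
        if not node.children:
            break
        best = max(node.children.values(),
                   key=lambda child: similarity(element, child.name))
        if similarity(element, best.name) < threshold:
            break
        path.append(best.name)
        node = best
    return path


# Hypothetical sample data, already split into address elements.
root = AddressNode("")
root.insert(["深圳市", "福田区", "深南大道", "1001号"])
root.insert(["深圳市", "南山区", "科技园路", "8号"])

# The query mistypes the road name (深南大到 instead of 深南大道).
print(fuzzy_match(root, ["深圳市", "福田区", "深南大到", "1001号"]))
```

Because each level only compares the query element against the children of the current node, a lookup touches a small part of the stored data rather than every address string. A query element that resembles no child closely enough ends the descent, so the function returns the longest prefix it could resolve.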
