Data Scaling Method for Multi-scale Data Mining (面向多尺度数据挖掘的数据尺度划分方法)
  • English title: Data Scaling Method for Multi-scale Data Mining
  • Authors: ZHANG Fang (张昉); ZHAO Shu-liang (赵书良); WU Yong-liang (武永亮)
  • Affiliations: College of Mathematics & Information Science, Hebei Normal University; Hebei Key Laboratory of Computational Mathematics & Applications, Hebei Normal University
  • Keywords: multi-scale data mining; multi-scale partitioning; discretization; construction of multi-scale datasets; benchmark scale selection; multi-scale entropy; information entropy
  • Journal: Computer Science (计算机科学; journal code JSJA)
  • Publication date: 2019-04-15
  • Publisher: Computer Science (计算机科学)
  • Year: 2019
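  • Volume: 46
  • Issue: 04
  • Funding: National Natural Science Foundation of China (71271067); Major Program of the National Social Science Fund of China (13&ZD091, 18ZDA200); Hebei Normal University Master's Fund (CXZZSS2017048)
  • Language: Chinese
  • Pages: 63-71
  • Page count: 9
  • CN: 50-1075/TP
  • Record ID: JSJA201904009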
Abstract
Multi-scale mining has been applied in graphics and image processing, geographic information, signal analysis, and data mining, and multi-scale data mining has been studied and applied in association rule, clustering, and classification mining. However, how to partition a dataset into scales in a generally applicable way, and how to construct multi-scale datasets, have not been studied in depth. Starting from the multi-scale data mining task, this paper defines the concept of scale and gives a multi-scale dataset model and a benchmark scale scoring model. Based on discretization by probability density estimation, it proposes a multi-scale partition algorithm that extends the data types that can be partitioned into scales, produces partitions closer to the multi-scale characteristics of the data, and has low time complexity. It also proposes a method for making a dataset multi-scale, an algorithm for constructing multi-scale datasets, and a benchmark scale selection algorithm, with multi-scale entropy and information entropy as the evaluation measures. Besides extending the ways a dataset can be made multi-scale, these contributions effectively weaken the scale effect caused by scale derivation in multi-scale data mining, and their time complexity remains controllable.
The proposed algorithms and models were validated and analyzed on a real population dataset from H province, UCI public datasets, and the T10I4D100K dataset. The results show that the multi-scale partition algorithm and the method for making a dataset multi-scale are feasible, and that this method and the benchmark scale scoring model are effective. On average, applying the multi-scale partition method, the multi-scale dataset construction method, and the benchmark scale selection method improves coverage by 1.6%, F1-measure by 2.1%, and accuracy by 3.7% during scale derivation, with a low average support error.
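The abstract names discretization by probability density estimation and information entropy without giving details. The following is a minimal, generic sketch of that idea, not the paper's algorithm: it estimates a Gaussian kernel density on a grid, places cut points at the local minima of the density, and computes the Shannon entropy of the resulting partition. All function names, the fixed bandwidth, the grid size, and the toy data are assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_on_grid(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    diffs = (grid[:, None] - values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (len(values) * bandwidth)


def density_cut_points(values, bandwidth, grid_size=512):
    """Return grid positions where the estimated density has a strict local minimum.

    Assumption: cut points at density minima stand in for the partition step;
    the paper's actual partition rule is not described in the abstract.
    """
    values = np.asarray(values, dtype=float)
    grid = np.linspace(values.min(), values.max(), grid_size)
    density = gaussian_kde_on_grid(values, grid, bandwidth)
    interior = density[1:-1]
    is_min = (interior < density[:-2]) & (interior < density[2:])
    return grid[1:-1][is_min]


def shannon_entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Toy attribute (hypothetical data): 60 values near 20 and 40 values near 60.
values = np.concatenate([np.linspace(18.0, 22.0, 60), np.linspace(58.0, 62.0, 40)])

# A small bandwidth keeps the two groups apart, so one cut point separates them.
fine_cuts = density_cut_points(values, bandwidth=2.0)
fine_labels = np.digitize(values, fine_cuts)

# A large bandwidth smooths the density into a single mode, so no cut is found.
coarse_cuts = density_cut_points(values, bandwidth=30.0)

print("fine cut points:", fine_cuts)
print("entropy of fine partition (bits):", shannon_entropy(fine_labels))
print("coarse cut points:", coarse_cuts)
```

In this sketch the bandwidth acts as a crude scale knob: smaller values give finer partitions and larger values give coarser ones. Whether the paper controls scale this way is not stated in the abstract.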
