Research on Rough Sets for Uncertainty Problems in Data Mining

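English title: A Study on Rough Set Based Data Mining Uncertainty
Author: Wei Yueliang (魏悦亮)
Degree level: Master's
Field of study: Computer Technology
Keywords: data mining; rough set theory; incomplete information system
Degree year: 2010
Supervisor: Zhang Wendong (张文东)
Discipline code: 081202
Degree-granting institution: China University of Petroleum (中国石油大学)
Submission date: 2009-12-01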

Abstract

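Data mining frequently has to deal with incomplete information systems, that is, systems in which some attribute values of some objects are unknown. Incomplete data can throw the mining process into disorder and lead to unreliable output. The uncertainty they introduce is also more pronounced, which greatly increases the difficulty of data mining.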
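To make the notion of an incomplete information system concrete, the sketch below builds a small hypothetical table in which `None` stands for an unknown value and computes tolerance classes. It assumes the tolerance relation of Kryszkiewicz, a widely used extension of rough sets to incomplete data in which two objects count as possibly indiscernible when, on every attribute, their values agree or at least one of them is unknown. The relation actually adopted in the thesis is not stated here, and the names `tolerant` and `tolerance_class` are illustrative.

```python
from typing import Dict, Optional, Sequence, Set

# Hypothetical toy incomplete information system; None marks an unknown value.
Row = Dict[str, Optional[str]]

TABLE: Dict[str, Row] = {
    "x1": {"price": "high", "size": "big"},
    "x2": {"price": None, "size": "big"},
    "x3": {"price": "low", "size": None},
    "x4": {"price": "low", "size": "small"},
}
ATTRS = ["price", "size"]


def tolerant(x: Row, y: Row, attrs: Sequence[str]) -> bool:
    """Tolerance relation: on every attribute the values agree or one is unknown."""
    return all(x[a] is None or y[a] is None or x[a] == y[a] for a in attrs)


def tolerance_class(table: Dict[str, Row], obj: str, attrs: Sequence[str]) -> Set[str]:
    """T_B(obj): all objects tolerant with obj on attrs (obj itself included)."""
    return {o for o, row in table.items() if tolerant(table[obj], row, attrs)}


if __name__ == "__main__":
    for obj in TABLE:
        print(obj, sorted(tolerance_class(TABLE, obj, ATTRS)))
```

On this table the tolerance classes are x1: {x1, x2}, x2: {x1, x2, x3}, x3: {x2, x3, x4} and x4: {x3, x4}. Unlike equivalence classes they overlap, and the relation is not transitive (x1 is tolerant with x2 and x2 with x3, but x1 is not tolerant with x3). This is one concrete way in which missing values add uncertainty.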
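     Using rough set theory, a mathematical method for handling imprecise, uncertain and vague knowledge, as its main tool, this thesis investigates data mining in incomplete information systems step by step, with the aim of narrowing the gap between data mining research and practical applications.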
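The core rough set idea behind handling imprecise and uncertain knowledge is to approximate a concept X from two sides. The lower approximation contains the objects that certainly belong to X, the upper approximation contains the objects that possibly belong to X, and their difference is the boundary region. The following is a minimal sketch for a complete table. It assumes Pawlak's indiscernibility relation and uses hypothetical data and function names. For incomplete tables, the equivalence classes would typically be replaced by tolerance classes.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]

# Hypothetical complete information table (illustrative values only).
TABLE: Dict[str, Row] = {
    "x1": {"headache": "yes", "temp": "high"},
    "x2": {"headache": "yes", "temp": "high"},
    "x3": {"headache": "no", "temp": "high"},
    "x4": {"headache": "no", "temp": "normal"},
    "x5": {"headache": "yes", "temp": "normal"},
}
ATTRS = ["headache", "temp"]
TARGET = {"x1", "x3"}  # hypothetical concept X to be approximated


def partition(table: Dict[str, Row], attrs: Sequence[str]) -> List[Set[str]]:
    """U/IND(attrs): group objects that share identical values on attrs."""
    blocks: Dict[Tuple[Hashable, ...], Set[str]] = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def approximations(table: Dict[str, Row], attrs: Sequence[str],
                   target: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return the (lower, upper) approximations of target w.r.t. IND(attrs)."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in partition(table, attrs):
        if block <= target:  # every object in the block is in X: certainly X
            lower |= block
        if block & target:   # some object in the block is in X: possibly X
            upper |= block
    return lower, upper


def accuracy(lower: Set[str], upper: Set[str]) -> float:
    """Approximation accuracy |lower| / |upper|; 1.0 means X is exact (crisp)."""
    return len(lower) / len(upper) if upper else 1.0


if __name__ == "__main__":
    low, up = approximations(TABLE, ATTRS, TARGET)
    print("lower:", sorted(low), "upper:", sorted(up), "boundary:", sorted(up - low))
    print("accuracy:", accuracy(low, up))
```

For X = {x1, x3}, the lower approximation is {x3}, the upper approximation is {x1, x2, x3}, the boundary is {x1, x2}, and the accuracy is 1/3. The given attributes cannot distinguish x1 from x2, yet only x1 belongs to X.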
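     The thesis first reviews rough set theory and data mining. For rough sets, it covers their theoretical foundations, research directions and application areas. For data mining, it covers the concept, characteristics, tasks, classification and requirements of data mining, together with the uncertainty involved. It places particular emphasis on why rough set theory can serve as a basis for data mining methods and on several common ways of applying rough sets to data mining.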
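One reason rough set theory can serve as a basis for data mining is that it measures, from the data alone, how well the condition attributes C determine a decision attribute D. This is done through the dependency degree γ_C(D) = |POS_C(D)| / |U|, which is then used to find reducts, that is, minimal attribute subsets that preserve that dependency. The following brute-force sketch runs on a hypothetical decision table. The function names are illustrative, and because the exhaustive search is exponential in the number of attributes, it only suits small examples.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]

# Hypothetical decision table: condition attributes a, b, c; decision d.
TABLE: Dict[str, Row] = {
    "x1": {"a": 1, "b": 0, "c": 1, "d": "yes"},
    "x2": {"a": 1, "b": 1, "c": 0, "d": "no"},
    "x3": {"a": 0, "b": 0, "c": 1, "d": "yes"},
    "x4": {"a": 0, "b": 1, "c": 1, "d": "no"},
    "x5": {"a": 1, "b": 0, "c": 0, "d": "yes"},
}
COND = ["a", "b", "c"]


def partition(table: Dict[str, Row], attrs: Sequence[str]) -> List[Set[str]]:
    """U/IND(attrs): objects grouped by their values on attrs."""
    blocks: Dict[Tuple[Hashable, ...], Set[str]] = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def positive_region(table: Dict[str, Row], cond: Sequence[str], dec: str) -> Set[str]:
    """POS_C(D): objects whose C-class lies inside a single decision class."""
    dec_blocks = partition(table, [dec])
    pos: Set[str] = set()
    for block in partition(table, cond):
        if any(block <= d for d in dec_blocks):
            pos |= block
    return pos


def dependency(table: Dict[str, Row], cond: Sequence[str], dec: str) -> float:
    """gamma_C(D) = |POS_C(D)| / |U|."""
    return len(positive_region(table, cond, dec)) / len(table)


def minimal_reducts(table: Dict[str, Row], cond: Sequence[str], dec: str) -> List[List[str]]:
    """Smallest subsets of cond whose positive region equals that of cond.

    POS_B is a subset of POS_C whenever B is a subset of C, so comparing
    sizes is enough. The search is exhaustive and only suitable for small tables."""
    full = len(positive_region(table, cond, dec))
    for size in range(1, len(cond) + 1):
        found = [list(s) for s in combinations(cond, size)
                 if len(positive_region(table, list(s), dec)) == full]
        if found:
            return found
    return [list(cond)]


if __name__ == "__main__":
    print("gamma(C, d) =", dependency(TABLE, COND, "d"))
    print("minimal reducts:", minimal_reducts(TABLE, COND, "d"))
```

Here γ = 1.0 for {a, b, c} and the only minimal reduct is {b}. Attribute b alone determines d, so a and c are dispensable for this toy table.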
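     Then, building on the rough set theory of data mining in incomplete information systems, the thesis concentrates on combining rough sets with other methods. It presents the application of a combined rough set and decision tree approach to data mining, a data mining method based on rough sets and genetic algorithms, and the application of rough set neural network algorithms to data mining.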
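Exhaustive reduct search does not scale, which is one motivation for combining rough sets with genetic algorithms. The sketch below encodes an attribute subset as a bit string and evolves it using tournament selection, one-point crossover, bit-flip mutation and elitism. The fitness function is an assumption made for illustration and is not taken from the thesis. Subsets that preserve the positive region score 1 plus the fraction of attributes dropped, and all other subsets score their dependency degree, which is always below 1. All names, parameter values and data are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set

Row = Dict[str, Hashable]

# Hypothetical decision table used as toy input (not data from the thesis).
TABLE: Dict[str, Row] = {
    "x1": {"a": 1, "b": 0, "c": 1, "d": "yes"},
    "x2": {"a": 1, "b": 1, "c": 0, "d": "no"},
    "x3": {"a": 0, "b": 0, "c": 1, "d": "yes"},
    "x4": {"a": 0, "b": 1, "c": 1, "d": "no"},
    "x5": {"a": 1, "b": 0, "c": 0, "d": "yes"},
}
COND = ["a", "b", "c"]


def positive_region(table: Dict[str, Row], cond: Sequence[str], dec: str) -> Set[str]:
    """Objects whose condition class carries a single decision value."""
    members: Dict[tuple, Set[str]] = defaultdict(set)
    decisions: Dict[tuple, Set[Hashable]] = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in cond)
        members[key].add(obj)
        decisions[key].add(row[dec])
    return {o for k, objs in members.items() if len(decisions[k]) == 1 for o in objs}


def fitness(table: Dict[str, Row], cond: Sequence[str], dec: str, genes: Sequence[bool]) -> float:
    """Assumed fitness: POS-preserving subsets score 1 + share of attributes dropped;
    other subsets score their dependency degree, which is always below 1."""
    chosen = [a for a, g in zip(cond, genes) if g]
    pos = len(positive_region(table, chosen, dec))
    if pos == len(positive_region(table, cond, dec)):
        return 1.0 + (len(cond) - len(chosen)) / len(cond)
    return pos / len(table)


def ga_reduct(table: Dict[str, Row], cond: Sequence[str], dec: str, pop_size: int = 20,
              generations: int = 30, p_mut: float = 0.1, seed: int = 0) -> List[str]:
    """Evolve bit strings over cond; returns a POS-preserving, not necessarily minimal, subset."""
    rng = random.Random(seed)
    n = len(cond)

    def score(genes: Sequence[bool]) -> float:
        return fitness(table, cond, dec, genes)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([True] * n)  # the full attribute set always preserves POS

    def pick() -> List[bool]:
        x, y = rng.sample(pop, 2)  # binary tournament selection
        return x if score(x) >= score(y) else y

    best = max(pop, key=score)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in p1[:cut] + p2[cut:]]
            children.append(child)
        pop = children
        candidate = max(pop, key=score)
        if score(candidate) > score(best):
            best = candidate
    return [a for a, g in zip(cond, best) if g]


if __name__ == "__main__":
    print("GA reduct:", ga_reduct(TABLE, COND, "d"))
```

The full attribute set is seeded into the initial population, and the best individual is always carried over. The result is therefore guaranteed to preserve the positive region. It is not guaranteed to be a minimal reduct, which is the usual trade-off of a heuristic search.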
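     Finally, the theory is applied to a practical problem. Taking the analysis of university students' grades as an example, the thesis proposes a solution that applies rough-set-based data mining to grade analysis. The results indicate that data mining is effective in analyzing the factors that influence university students' academic performance. Applying data mining to other aspects of university teaching should yield many results of practical significance, from which corresponding measures can be formulated to improve the quality of education and teaching.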
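A grade analysis of this kind can be framed as extracting decision rules from a decision table. The sketch below uses made-up student records, and the attribute names such as `attendance` and `homework` are hypothetical rather than the thesis's data. It emits one rule per combination of condition class and decision. A rule with confidence 1.0 comes from a lower approximation and is certain. A rule with lower confidence comes only from an upper approximation and is merely possible.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]
Rule = Tuple[Dict[str, str], str, int, float]  # (conditions, decision, support, confidence)

# Hypothetical student records; attribute names and values are illustrative only.
STUDENTS: Dict[str, Row] = {
    "s1": {"attendance": "high", "homework": "good", "result": "pass"},
    "s2": {"attendance": "high", "homework": "good", "result": "pass"},
    "s3": {"attendance": "low", "homework": "good", "result": "pass"},
    "s4": {"attendance": "low", "homework": "poor", "result": "fail"},
    "s5": {"attendance": "low", "homework": "good", "result": "fail"},
    "s6": {"attendance": "high", "homework": "poor", "result": "pass"},
}
COND = ["attendance", "homework"]


def extract_rules(table: Dict[str, Row], cond: Sequence[str], dec: str) -> List[Rule]:
    """One rule per (condition class, decision) pair.

    confidence == 1.0: the class lies in the lower approximation of that
    decision (certain rule); below 1.0: only in the upper approximation (possible rule)."""
    classes: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for obj, row in table.items():
        classes[tuple(row[a] for a in cond)].append(obj)
    rules: List[Rule] = []
    for key, objs in classes.items():
        counts = Counter(table[o][dec] for o in objs)
        for decision, support in counts.items():
            rules.append((dict(zip(cond, key)), decision, support, support / len(objs)))
    return rules


if __name__ == "__main__":
    for conds, decision, support, conf in extract_rules(STUDENTS, COND, "result"):
        kind = "certain" if conf == 1.0 else "possible"
        body = " AND ".join(f"{a}={v}" for a, v in conds.items())
        print(f"IF {body} THEN result={decision} (support={support}, conf={conf:.2f}, {kind})")
```

On this toy table, the students with low attendance and good homework include both a pass and a fail, so the two rules for that class each have confidence 0.50. The remaining three rules are certain. No conclusion about real students should be drawn from these illustrative values.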
Missing or incomplete data are a major concern in data mining, both because a substantial proportion of the data may be missing in real-world applications and because poor methods for handling incomplete data can bias the results of data mining. In addition, data mining is considerably more difficult in an incomplete information system, which contains more uncertainty than a complete one.
     This thesis applies rough set theory, a mathematical tool for dealing with inexact, uncertain or vague knowledge, to the handling of incomplete data in data mining, so as to narrow the large gap between the available data and the machinery available to process them.
     Firstly, the thesis reviews rough set theory and data mining. It covers the theoretical foundations, research directions and application areas of rough sets, as well as the concept, characteristics, tasks, classification and requirements of data mining and the uncertainty involved in it. It emphasizes why rough set theory can serve as a basis for data mining methods and describes several methods in common use for applying rough sets to data mining.
     Secondly, building on rough set theory for data mining in incomplete information systems, the thesis focuses on combining rough sets with other methods. It sets forth the application of a combined rough set and decision tree approach to data mining, a data mining method based on rough sets and genetic algorithms, and the application of a rough set neural network algorithm to data mining.
     Finally, the theory is applied in practice to solve a real problem. Taking the analysis of university students' academic records as an example, the thesis puts forward a scheme for applying rough-set-based data mining to grade analysis. The results indicate that data mining works well in analyzing the factors that affect students' academic records. Applied to other aspects of university teaching, it ought to yield a large number of results of practical significance, from which corresponding measures can be drawn up to improve the quality of education and teaching.
