Knowledge Discovery from Databases: Cost-sensitive and imbalance learning.
Details
  • Author: Yang, Zhuo
  • Degree: Doctor
  • Year: 2010
  • Advisor: Sheng, Olivia R. Liu
  • Committee members: Aggarwal, Rohit; Hu, Paul; Moore, William; Pant, Gautam
  • Institution: The University of Utah
  • Department: Business
  • ISBN: 9781124392691
  • CBH: 3433553
  • Country: USA
  • Language: English
  • File size: 2546949
  • Pages: 108
Abstract
In the current business world, collecting data for business analysis is no longer difficult. The major concern business managers face is whether they can use data to build predictive models that provide accurate information for decision-making. Knowledge Discovery from Databases (KDD) provides a guideline for identifying the knowledge contained in collected data. As one of the KDD steps, data mining provides a systematic and intelligent approach to learning from large amounts of data and is critical to the success of KDD. Over the past several decades, many data mining algorithms have been developed; they can be categorized as classification, association rule, and clustering methods. These algorithms have proven very effective in answering a variety of business questions. Among these types, classification is the most popular and is widely used across all kinds of business areas. However, existing classification algorithms are designed to maximize prediction accuracy under the assumption of equal class distributions and equal error costs. This assumption seldom holds in the real world, so current classification methods need to be extended to handle data with imbalanced class distributions and unequal misclassification costs.
In this dissertation, I propose an Iterative Cost-sensitive Naive Bayes (ICSNB) method aimed at reducing overall misclassification cost regardless of class distribution. During each iteration, the k nearest neighbors are identified and form a new training set, which is used to learn the instances that remain unsolved. In the second study, building on the characteristics of the nearest neighbor method, I develop a new under-sampling method for the imbalance problem: a general method that identifies noisy instances in the data set in order to create a balanced data set for learning.
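For illustration only, the ICSNB idea above (train on the k nearest neighbors, then decide by cost rather than accuracy) can be approximated by a local cost-sensitive classifier: for each query instance, fit a Gaussian naive Bayes model on its k nearest training neighbors and predict the class i with the minimum expected cost, sum over j of P(j|x) * C(i, j). This is a minimal Python sketch under stated assumptions, not the dissertation's algorithm: the Gaussian likelihoods, the Laplace-smoothed priors, the fallback to global class statistics when a class is missing from a neighborhood, the helper names, and the `cost[(predicted, actual)]` layout are all hypothetical, and the iteration over unsolved instances is omitted.

```python
import math


def fit_gaussian_nb(X, y, classes, fallback=None, var_floor=1e-3):
    """Fit a Gaussian naive Bayes model: {class: (log_prior, means, variances)}.

    Priors are Laplace-smoothed. A class absent from (X, y) borrows its means
    and variances from `fallback` (a hypothetical choice for sparse neighbourhoods).
    """
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log((len(rows) + 1) / (len(X) + len(classes)))
        if rows:
            d = len(rows[0])
            means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
            variances = [
                max(sum((r[j] - means[j]) ** 2 for r in rows) / len(rows), var_floor)
                for j in range(d)
            ]
        else:
            _, means, variances = fallback[c]
        model[c] = (log_prior, means, variances)
    return model


def posterior(model, x):
    """Class posteriors P(c | x), computed in log space for numerical stability."""
    logs = {}
    for c, (log_prior, means, variances) in model.items():
        total = log_prior
        for xj, m, v in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        logs[c] = total
    top = max(logs.values())
    unnorm = {c: math.exp(lp - top) for c, lp in logs.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


def min_cost_class(post, cost):
    """Pick the class i minimising the expected cost sum_j P(j|x) * cost[(i, j)]."""
    return min(post, key=lambda i: sum(post[j] * cost[(i, j)] for j in post))


def k_nearest(X, x, k):
    """Indices of the k training points closest to x (Euclidean distance)."""
    return sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]


def local_cs_nb_predict(X, y, queries, cost, k=15):
    """Hypothetical single-pass sketch: a local naive Bayes per query, cost-based decision."""
    classes = sorted(set(y))
    global_model = fit_gaussian_nb(X, y, classes)
    predictions = []
    for q in queries:
        idx = k_nearest(X, q, k)
        local = fit_gaussian_nb(
            [X[i] for i in idx], [y[i] for i in idx], classes, fallback=global_model
        )
        predictions.append(min_cost_class(posterior(local, q), cost))
    return predictions
```

With `cost = {(0, 0): 0, (0, 1): 10, (1, 0): 1, (1, 1): 0}`, missing an actual class-1 instance costs ten times as much as a false alarm, so `min_cost_class({0: 0.8, 1: 0.2}, cost)` picks class 1 even though class 0 is more probable; this is the sense in which the decision ignores the class distribution and targets total cost.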
Both methods are validated on multiple real-world data sets. The empirical results show that they outperform several existing, widely used methods.
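The under-sampling idea in the second study (use nearest neighbors to flag noisy instances, then balance the classes) can be illustrated with a stand-in technique: Wilson's Edited Nearest Neighbours (ENN) rule applied to the majority class, followed by random under-sampling down to the minority-class size. This Python sketch assumes a two-class problem; the function names and the `k` and `seed` parameters are hypothetical, and it is not the method proposed in the dissertation.

```python
import math
import random
from collections import Counter


def knn_labels(X, y, i, k):
    """Labels of the k nearest neighbours of X[i], excluding X[i] itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return [y[j] for j in others[:k]]


def enn_undersample(X, y, k=3, seed=0):
    """Hypothetical two-class sketch: ENN noise removal on the majority, then random trimming."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)

    # Step 1: drop majority instances whose label disagrees with a strict
    # majority of their k nearest neighbours (treated here as noise).
    keep = []
    for i in range(len(X)):
        if y[i] == minority:
            keep.append(i)
            continue
        votes = Counter(knn_labels(X, y, i, k))
        if votes[y[i]] * 2 > k:
            keep.append(i)

    # Step 2: randomly trim the surviving majority instances to the minority count.
    minority_idx = [i for i in keep if y[i] == minority]
    majority_idx = [i for i in keep if y[i] != minority]
    rng = random.Random(seed)
    if len(majority_idx) > len(minority_idx):
        majority_idx = rng.sample(majority_idx, len(minority_idx))

    chosen = sorted(minority_idx + majority_idx)
    return [X[i] for i in chosen], [y[i] for i in chosen]
```

Restricting the noise filter to the majority class keeps every minority instance, which matters when the minority class is already scarce; the random trimming in step 2 is the simplest balancing rule and could be replaced by a distance-based selection.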
