Feature selection with missing data using mutual information estimators

Authors: Gauthier Doquire (gauthier.doquire@uclouvain.be); Michel Verleysen (michel.verleysen@uclouvain.be)
Keywords: Feature selection; Missing data; Mutual information
Journal: Neurocomputing
Year: 2012
Publication date: 1 August 2012
Volume: 90
Issue: Complete
Pages: 3-11
Full-text size: 327 K

Abstract

Feature selection is an important preprocessing task for many machine learning and pattern recognition applications, including regression and classification. Missing data are encountered in many real-world problems and have to be considered in practice. This paper addresses feature selection in prediction problems where some feature values are missing. To this end, the well-known mutual information criterion is used. More precisely, it is shown how a recently introduced nearest-neighbor-based mutual information estimator can be extended to handle missing data. Unlike traditional estimators, this one does not directly estimate any probability density function, so the mutual information can be estimated reliably even as the dimension of the space increases. Results on artificial as well as real-world datasets indicate that the method is able to select important features without any imputation algorithm, under the assumption that data are missing completely at random. Moreover, experiments show that selecting the features before imputing the data generally increases the precision of the prediction models, in particular when the proportion of missing data is high.
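
The abstract does not say how the nearest-neighbor estimator is extended, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes a Kraskov-style k-nearest-neighbor estimator and assumes that distances between samples are computed over the features both samples have observed, rescaled by the fraction of shared features. The names partial_distances and knn_mutual_information are hypothetical, the output y is assumed to be continuous and fully observed, and missing values are assumed to be encoded as NaN.

```python
import numpy as np
from scipy.special import digamma


def partial_distances(X):
    """Pairwise Euclidean distances using only jointly observed features.

    Assumption: squared distances are rescaled by d / (number of shared
    features); pairs with no shared feature get an infinite distance.
    """
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)  # placeholder zeros, masked out below
    both = observed[:, None, :] & observed[None, :, :]
    sq = np.where(both, (X0[:, None, :] - X0[None, :, :]) ** 2, 0.0).sum(axis=2)
    shared = both.sum(axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        dist = np.sqrt(sq * d / shared)
    dist[shared == 0] = np.inf
    return dist


def knn_mutual_information(X, y, k=3):
    """Kraskov-style k-NN estimate (in nats) of I(X; y) with NaNs in X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)

    # Assumption: rows with no observed feature carry no information on X.
    keep = ~np.isnan(X).all(axis=1)
    X, y = X[keep], y[keep]
    n = len(y)

    dx = partial_distances(X)
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)  # max-norm in the joint (X, y) space
    for dist in (dx, dy, dz):
        np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor

    eps = np.sort(dz, axis=1)[:, k - 1]  # distance to the k-th joint neighbor
    nx = (dx < eps[:, None]).sum(axis=1)  # neighbors strictly closer in X
    ny = (dy < eps[:, None]).sum(axis=1)  # neighbors strictly closer in y
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))


# Usage: rank two features when 20% of the entries are missing at random.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(300, 2))
y_demo = X_full[:, 0] + 0.1 * rng.normal(size=300)  # only feature 0 matters
X_demo = X_full.copy()
X_demo[rng.random(X_demo.shape) < 0.2] = np.nan
scores = [knn_mutual_information(X_demo[:, [j]], y_demo) for j in range(2)]
print(scores)
```

In this sketch no value is ever imputed: a missing entry simply drops out of every distance that would involve it. A forward selection procedure could call knn_mutual_information on growing column subsets and keep the feature that raises the estimate most at each step.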
