A MapReduce solution for associative classification of big data


Authors: Alessio Bechini; Francesco Marcelloni (corresponding author; francesco.marcelloni@unipi.it, f.marcelloni@iet.unipi.it); Armando Segatori
Keywords: Associative classifiers; Big data; MapReduce; Cluster computing frameworks
Journal: Information Sciences
Publication year: 2016
Publication date: 1 March 2016
Volume: 332
Issue: Complete
Pages: 33-55
Full-text size: 1677 K

Abstract

Associative classifiers have proven very effective in classification problems. Unfortunately, the algorithms used to learn these classifiers cannot adequately manage big data because of time complexity and memory constraints. To overcome these drawbacks, we propose a distributed association rule-based classification scheme shaped according to the MapReduce programming model. The scheme mines classification association rules (CARs) using a properly enhanced, distributed version of the well-known FP-Growth algorithm. Once the CARs have been mined, the proposed scheme performs distributed rule pruning. The set of surviving CARs is used to classify unlabeled patterns. The memory usage and time complexity of each phase of the learning process are discussed, and the scheme is evaluated on seven real-world big datasets on the Hadoop framework, characterizing its scalability and achievable speedup on small computer clusters. The proposed solution for associative classifiers turns out to be suitable for practically addressing big datasets even with modest hardware support. Comparisons with two state-of-the-art distributed learning algorithms are also discussed in terms of accuracy, model complexity, and computation time.
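
The three stages named in the abstract (mining CARs, pruning them, classifying unlabeled patterns) can be illustrated with a small single-process Python sketch. It is not the authors' method: it swaps the distributed FP-Growth algorithm for brute-force enumeration of short itemsets, stands in for the paper's pruning with plain minimum-support and minimum-confidence thresholds, and simulates the map and reduce steps with ordinary function calls instead of Hadoop. All function names, parameters, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_phase(partition, max_len=2):
    """Count itemsets (alone and with their class label) in one data partition.

    Each record is (items, label); labels are assumed to be non-None strings.
    Brute-force enumeration is used here, NOT FP-Growth (illustrative assumption).
    """
    counts = Counter()
    for items, label in partition:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(items), k):
                counts[(itemset, label)] += 1  # antecedent seen with this class
                counts[(itemset, None)] += 1   # antecedent seen at all
    return counts


def reduce_phase(partial_counts):
    """Sum the per-partition counters into global counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def mine_cars(total, n_records, min_support, min_confidence):
    """Keep rules 'itemset -> label' that pass both thresholds.

    Threshold filtering is a simple stand-in for the paper's rule pruning.
    """
    rules = []
    for (itemset, label), count in total.items():
        if label is None:
            continue
        support = count / n_records
        confidence = count / total[(itemset, None)]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, label, support, confidence))
    # Rank by confidence, then support, then shorter antecedent first.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), r[0], r[1]))
    return rules


def classify(rules, items, default=None):
    """Return the label of the best-ranked rule whose antecedent matches."""
    items = set(items)
    for itemset, label, _, _ in rules:
        if items.issuperset(itemset):
            return label
    return default


if __name__ == "__main__":
    # Two partitions play the role of input splits on a cluster.
    partitions = [
        [({"a", "b"}, "yes"), ({"a", "c"}, "yes")],
        [({"b", "c"}, "no"), ({"a", "b"}, "yes"), ({"c"}, "no")],
    ]
    total = reduce_phase(map_phase(p) for p in partitions)
    n_records = sum(len(p) for p in partitions)
    rules = mine_cars(total, n_records, min_support=0.4, min_confidence=0.6)
    print(classify(rules, {"a", "c"}))
```

Because each partition is counted independently and the counters are merged by addition, the map step could in principle run on separate nodes, which is the property that makes this kind of rule mining fit the MapReduce model.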
