Supervised Adaptive Incremental Clustering for data stream of chunks
Abstract
Many supervised clustering algorithms have been developed to find the optimal clusters for static datasets by presetting some parameters, but they are seldom suitable for dynamic datasets such as data streams of chunks. To find the optimal clusters in a data stream of chunks, a novel Supervised Adaptive Incremental Clustering (SAIC) algorithm is proposed. SAIC can automatically cluster dynamic datasets of arbitrary shapes and sizes. It consists of a learning phase and a post-processing phase. In the learning phase, each cluster is updated adaptively according to its learning rate, which is calculated from its counter value. All data points are shuffled at each iteration to make SAIC insensitive to the input order of the data points. In the post-processing phase, outliers or boundary points are eliminated according to the counter value of each cluster and the number of iterations. Four synthetic datasets and fourteen UCI datasets are used to evaluate the performance of SAIC. The experiments on the UCI datasets show that SAIC matches or outperforms some other supervised clustering algorithms and several unsupervised incremental clustering algorithms. In addition, three data streams of chunks are used to evaluate SAIC from different aspects, showing that SAIC is scalable and capable of incremental learning when clustering data streams of chunks.
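
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the learning rate of 1/counter, the fixed distance threshold `radius`, the rule that a point may only join a cluster with the same class label, and the pruning rule `count > min_hits * n_iter` are all assumptions made for illustration. The function name `saic_sketch` and its parameters are likewise hypothetical.

```python
import random


def saic_sketch(points, labels, radius=1.0, n_iter=5, min_hits=1, seed=0):
    """Illustrative supervised incremental clustering loop (assumptions only)."""
    rng = random.Random(seed)
    clusters = []  # each cluster: {"center": [...], "label": ..., "count": int}
    order = list(range(len(points)))

    # Learning phase
    for _ in range(n_iter):
        rng.shuffle(order)  # reshuffle so the result depends less on input order
        for i in order:
            x, y = points[i], labels[i]
            # Find the nearest cluster carrying the same class label (assumed rule).
            best, best_dist = None, float("inf")
            for c in clusters:
                if c["label"] != y:
                    continue
                dist = sum((a - m) ** 2 for a, m in zip(x, c["center"])) ** 0.5
                if dist < best_dist:
                    best, best_dist = c, dist
            if best is None or best_dist > radius:
                # No suitable cluster nearby: start a new one at this point.
                clusters.append({"center": list(x), "label": y, "count": 1})
            else:
                # Adaptive update: the learning rate shrinks as the counter grows.
                best["count"] += 1
                rate = 1.0 / best["count"]  # assumed form of the learning rate
                best["center"] = [m + rate * (a - m)
                                  for a, m in zip(x, best["center"])]

    # Post-processing phase: drop clusters that were hit at most `min_hits`
    # times per iteration on average (assumed outlier criterion).
    return [c for c in clusters if c["count"] > min_hits * n_iter]
```

With these assumed rules, a cluster that only ever absorbs a single isolated point ends with a counter equal to the number of iterations and is pruned, while clusters that attract several points per pass survive. Processing a stream chunk by chunk would amount to keeping `clusters` between calls instead of rebuilding it, which this sketch leaves out.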
