Research on data stream clustering algorithms
  • Authors: Shifei Ding (1) (2)
    Fulin Wu (1)
    Jun Qian (1)
    Hongjie Jia (1)
    Fengxiang Jin (3)

    1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
    2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    3. College of Geomatics, Shandong University of Science and Technology, Qingdao 266590, China
  • Keywords: Data mining; Data stream; Clustering; Data model
  • Journal: Artificial Intelligence Review
  • Publication year: 2015
  • Publication date: April 2015
  • Volume: 43
  • Issue: 4
  • Pages: 593-600
  • Full-text size: 136 KB
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Computer Science, general
    Complexity
  • Publisher: Springer Netherlands
  • ISSN: 1573-7462
Abstract
A data stream is a potentially massive, continuous, and rapid sequence of data. It has attracted great attention and a surge of research in the field of data mining. Clustering is an effective data mining tool, so data stream clustering will undoubtedly become a focus of study in data stream mining. Given the high-dimensional, dynamic, and real-time characteristics of data streams, many effective data stream clustering algorithms have been proposed. In addition, data stream information is not deterministic and always contains outliers and noise, so developing effective data stream clustering algorithms is crucial. This paper reviews the development and trends of data stream clustering and analyzes typical data stream clustering algorithms proposed in recent years, such as the Birch, Local Search, Stream, and CluStream algorithms. We also summarize the latest research achievements in this field and introduce some new strategies for dealing with outliers and noisy data. Finally, we put forward the focal points and difficulties of future research on data stream clustering.
