Stream-Dashboard: A big data stream clustering framework with applications to social media streams.
Details
  • Author: Hawwash, Basheer
  • Degree: Doctor
  • Year: 2013
  • Institution: University of Louisville
  • Department: Computer Engineering and Computer Science
  • ISBN: 9781303854439
  • CBH: 3586954
  • Country: USA
  • Language: English
  • FileSize: 18028061
  • Pages: 228
Abstract
Data mining is concerned with detecting patterns of data in raw datasets, which are then used to unearth knowledge that might not have been discovered using conventional querying or statistical methods. This discovered knowledge has been used to empower decision makers in countless applications spanning across many multi-disciplinary areas including business, education, astronomy, security and Information Retrieval, to name a few. Many applications generate massive amounts of data continuously and at an increasing rate. This is the case for user activity over social networks such as Facebook and Twitter. This flow of data has been termed, appropriately, a Data Stream, and it introduced a set of new challenges to discover its evolving patterns using data mining techniques. Data stream clustering is concerned with detecting evolving patterns in a data stream using only the similarities between the data points as they arrive, without the use of any external information (i.e., unsupervised learning). In this dissertation, we propose a complete and generic framework to simultaneously mine, track and validate clusters in a big data stream (Stream-Dashboard). The proposed framework consists of three main components: an online data stream clustering algorithm, a component for tracking and validation of pattern behavior using regression analysis, and a component that uses the behavioral information about the detected patterns to improve the quality of the clustering algorithm. As a first component, we propose RINO-Streams, an online clustering algorithm that incrementally updates the clustering model using robust statistics and incremental optimization. The second component is a methodology that we call TRACER, which continuously performs a set of statistical tests using regression analysis to track the evolution of the detected clusters, their characteristics and quality metrics.
For the last component, we propose a method to build behavioral profiles for the clustering model over time, which can be used to improve the performance of the online clustering algorithm, for example by adapting the initial values of the input parameters. The performance and effectiveness of the proposed framework were validated using extensive experiments, and its use was demonstrated on a challenging real-world application, specifically unsupervised mining of evolving cluster stories in one pass from Twitter social media streams.
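
The idea of tracking a cluster's behavior with regression can be illustrated by a minimal sketch. It is not the dissertation's TRACER method: the names `slope`, `MetricTracker`, `window` and `tol` are hypothetical, and a simple threshold on a least-squares slope stands in for the statistical tests the abstract mentions. The sketch fits a line to the most recent values of one cluster metric (for example, cluster size) and labels the trend.

```python
from collections import deque


def slope(values):
    """Least-squares slope of values regressed on their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


class MetricTracker:
    """Hypothetical tracker for one metric of one cluster.

    Assumption: a fixed slope threshold `tol` replaces a proper
    statistical significance test.
    """

    def __init__(self, window=5, tol=0.1):
        self.history = deque(maxlen=window)  # keeps only the latest values
        self.tol = tol

    def update(self, value):
        """Record a new metric value and return the current trend label."""
        self.history.append(value)
        s = slope(list(self.history))
        if s > self.tol:
            return "growing"
        if s < -self.tol:
            return "shrinking"
        return "stable"


tracker = MetricTracker(window=4, tol=0.5)
labels = [tracker.update(v) for v in [10, 10, 10, 10, 12, 14, 16, 18]]
```

Because the window is bounded, memory use stays constant per tracked metric, which matches the one-pass setting described in the abstract.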
