用户名: 密码: 验证码:
Real-time processing of streaming big data
详细信息    查看全文
文摘
In the era of data explosion, high volume of various data is generated rapidly at each moment of time; and if not processed, the profits of their latent information would be missed. This is the main current challenge of most enterprises and Internet mega-companies (also known as the big data problem). Big data is composed of three dimensions: Volume, Variety, and Velocity. The velocity refers to the high speed, both in data arrival rate (e.g., streaming data) and in data processing (i.e., real-time processing). In this paper, the velocity dimension of big data is concerned; so, real-time processing of streaming big data is addressed in detail. For each real-time system, to be fast is inevitable and a necessary condition (although it is not sufficient and some other concerns e.g., real-time scheduling must be issued, too). Fast processing is achieved by parallelism via the proposed deadline-aware dispatching method. For the other prerequisite of real-time processing (i.e., real-time scheduling of the tasks), a hybrid clustering multiprocessor real-time scheduling algorithm is proposed in which both the partitioning and global real-time scheduling approaches are employed to have better schedulablity and resource utilization, with a tolerable overhead. The other components required for real-time processing of streaming big data are also designed and proposed as real time streaming big data (RT-SBD) processing engine. Its prototype is implemented and experimentally evaluated and compared with the Storm, a well-known real-time streaming big data processing engine. Experimental results show that the proposed RT-SBD significantly outperforms the Storm engine in terms of proportional deadline miss ratio, tuple latency and system throughput.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700