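Research on Processing Methods of Data Stream Based on Parallel Computing

Chinese title: 基于并行计算的数据流处理方法研究
Author: Zhou Yong (周勇)
Thesis level: Doctoral
Discipline: Computer Application Technology
Chinese keywords (translated): data stream; parallel computing; general-purpose computing on GPUs (GPGPU); trend prediction; frequent itemsets; data stream correlation
English keywords: Data Stream; Parallel Computing; GPU; Trend Prediction; Frequent Item Sets; Data Stream Correlation
Degree year: 2013
Advisor: Cheng Chuntian (程春田)
Subject code: 081203
Degree-granting institution: Dalian University of Technology
Thesis submission date: 2013-06-01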

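Abstract

Mining high-volume, high-velocity data streams has become a research focus of big data processing in the international academic community. Unlike statically stored data, stream data arrive continuously in real time and can be scanned only once. For rapidly time-varying data streams, the whole stream cannot be stored in limited memory, so incrementally and accurately mining continuously changing trends and discovering hidden correlations poses a major challenge for real-time stream analysis and processing. At the same time, processing latency has become a severe bottleneck for data stream mining. To address these problems, this thesis studies parallel computing models and algorithms that integrate trend analysis and correlation analysis of data streams, and combines data stream mining with high-performance computing on the CPU (Central Processing Unit) and GPU (Graphics Processing Unit) to obtain efficient methods for processing dynamic, continuous data streams. The main research contents are summarized as follows:
     1. To address the insufficient predictive capability for nonlinear, non-stationary time series data streams, an Online-HHT analysis method based on the HHT (Hilbert-Huang Transform) is studied and combined with RBF (Radial Basis Function) neural network theory to obtain a time series data stream model suited to online prediction. The method introduces CPU multithreaded parallel processing, designs a chained rewritable sliding window technique for reading and writing the time series data stream, and implements algorithms that predict the components of fine-grained data segments in parallel and then synthesize the segment results. Online-HHT retains the adaptive time-frequency analysis capability for time series data streams while computing faster, and the intrinsic mode components it produces reduce the input complexity of the RBF neural network predictor, which greatly improves trend prediction. Experimental comparisons with other methods show that the proposed method can handle short-term trend prediction of data streams, runs faster, and is applicable to online prediction.
     2. To address the excessive space complexity caused by pattern trees in frequent item mining over data streams, a mining method based on NSWGA (Nested Sliding Window Genetic Algorithm) is proposed. The algorithm partitions nested windows from the data in the sliding window, processes the data in the nested windows in parallel with an MPI-based genetic algorithm, and improves the way the initial population is obtained, achieving fast mining of frequent patterns in the nested windows. As the stream flows, expired data are deleted periodically and the latest frequent itemsets in the sliding window are updated, which provides incremental maintenance, improves execution efficiency, and allows frequent items in the stream to be found quickly.
     3. To address processing latency and efficiency problems caused by resource constraints, the GPU parallel computing architecture, a recent supercomputing technology, is studied, and a general GPU-based data stream processing model is proposed according to the characteristics of stream data attributes and the need for high-performance processing. Following the SIMT mode of the GPU architecture and using a sliding window model built from basic windows, a stream processing structure with two levels of parallelism, coarse-grained and fine-grained, is given: the stream is partitioned into data blocks of suitable granularity, on which synopsis data structures and various mining algorithms are then processed in parallel. Coarse-grained parallelism handles the parallel division of tasks, while fine-grained parallelism handles the parallel extraction of synopsis data structures and also the thread grids that perform stream mining and compute-intensive work on the GPU, achieving efficient data exchange and high-performance parallel algorithms. On this general model, GSQ (GPU Stream Quantiles), a GPU-based parallel method for computing data stream quantiles, is proposed: it calls GPU kernels, uses hashing to compute synopsis histograms over the data blocks of the stream in parallel, and finally answers quantile queries. Experiments confirm substantial improvements in processing bandwidth, response time, and speedup.
     4. To address the real-time constraints of resources and execution order that limit correlation analysis of multiple data streams on the CPU, a cross-bus, four-layer sliding window framework with cooperative CPU and GPU processing is proposed for parallel computation over multiple streams. The streams are mapped entirely into GPU memory and a stream SID index is built, so that basic sub-window offsets can be used to perform parallel operations at different levels. A multi-level parallel scheme suited to multiple streams is constructed using fine-grained s→Thread and medium-grained s→Block parallelism, and the parallel correlation analysis algorithm GSSCCA (GPU Single-Dimensional Stream Canonical Correlation Analysis) for single-dimensional multiple data streams is given. Experiments confirm that the algorithm is accurate and greatly increases computing speed.
     5. For high-dimensional multiple data streams, in which multiple data attributes record complex real-time information, the resource and execution-order constraints affecting computational accuracy and performance are more complex than for single-dimensional multiple data streams. To address this, the mathematical model of correlation analysis for high-dimensional multiple data streams is studied further, and a model, an implementation architecture, and a parallel computing method for high-dimensional multi-stream correlation processing on the GPU, GMSCCA (GPU Multi-Dimensional Stream Canonical Correlation Analysis), are proposed. Using data cube and dimensionality reduction techniques, the method completes the computation quickly and accurately in environments with limited computing resources and high efficiency requirements, and balances high performance against approximation accuracy well.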
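
Item 3 describes GSQ as building synopsis histograms over the data blocks of a basic-window sliding window and answering quantile queries from them. The following is a minimal CPU-only sketch of that idea in Python, not the thesis's GPU implementation. The class name `HistogramWindow`, the equal-width bucketing (standing in for the hash step), and the known value range `[lo, hi)` are all assumptions made for illustration. On a GPU, the per-block histogram update would be the part executed in parallel.

```python
from collections import deque


class HistogramWindow:
    """Sliding window made of basic windows, each summarised by a histogram.

    Hypothetical illustration: equal-width buckets over a known range [lo, hi).
    """

    def __init__(self, lo, hi, buckets, basic_size, num_basic):
        self.lo = lo
        self.hi = hi
        self.buckets = buckets
        self.basic_size = basic_size
        self.width = (hi - lo) / buckets
        # Histograms of completed basic windows; the oldest one is dropped
        # automatically once num_basic histograms are stored.
        self.full = deque(maxlen=num_basic)
        self.current = [0] * buckets
        self.filled = 0

    def _bucket(self, x):
        # Map a value to its bucket index, clamping out-of-range values.
        i = int((x - self.lo) / self.width)
        return min(max(i, 0), self.buckets - 1)

    def add(self, x):
        self.current[self._bucket(x)] += 1
        self.filled += 1
        if self.filled == self.basic_size:
            self.full.append(self.current)
            self.current = [0] * self.buckets
            self.filled = 0

    def quantile(self, q):
        """Approximate q-quantile (0 <= q <= 1) as a bucket midpoint."""
        merged = [sum(col) for col in zip(self.current, *self.full)]
        total = sum(merged)
        if total == 0:
            raise ValueError("window is empty")
        target = q * total
        running = 0
        for i, count in enumerate(merged):
            running += count
            if count > 0 and running >= target:
                return self.lo + (i + 0.5) * self.width
        return self.hi
```

For example, feeding the values 0 to 99 into `HistogramWindow(0.0, 100.0, 100, 10, 5)` leaves only the five most recent basic windows (values 50 to 99) in the window, and `quantile(0.5)` returns 74.5. The answer is approximate by construction: its error is bounded by the bucket width, which is the trade-off a histogram synopsis makes in exchange for constant memory.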

English Abstract

Mining large volumes of high-speed data is attracting significant attention worldwide, and high-performance methods are in great demand for continuous data stream mining. Compared with its static counterpart, this type of dynamic data is acquired sequentially and must be accessed continuously in real time. Limited computational and storage resources pose great challenges for the accuracy and online capability of trend and correlation analysis on data streams, and processing delay has become a severe bottleneck for data stream mining. This thesis focuses on parallel computing models and algorithms for trend and correlation analysis on data streams. These models and algorithms run efficiently on both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit). The main research contents are summarized as follows:
     Firstly, we present a new online analysis method derived from the classical Hilbert-Huang Transform (HHT) to process nonlinear and non-stationary time series data streams. The method is combined with radial basis function (RBF) neural networks for online prediction on the streams. We design a chain-style rewritable sliding window for reading and writing the time series data stream. The method divides the data into several segments, predicts the segments in parallel with CPU multithreading, and then merges the segment results into a final stream. The online HHT method not only provides adaptive time-frequency analysis but also computes faster. The components it produces also reduce the input complexity of the RBF neural network. Compared with existing methods, the proposed method handles online short-term trend prediction of time series data streams at a higher processing speed.
     Secondly, we propose a genetic algorithm with nested sliding windows (NSWGA) to replace the complicated pattern trees widely used in frequent item mining of data streams. This improved genetic algorithm uses nested sliding windows to segment data streams and leverages MPI parallel processing to efficiently discover the frequent patterns in the nested windows. It maintains frequent item sets incrementally by adding new data and removing expired data, which also allows efficient processing within limited buffer space.
     Thirdly, we build a generic GPU-based processing framework for data streams to tackle processing delay and efficiency issues. The framework adapts to the characteristics of data streams and meets high-performance requirements. Using the SIMT mode of the GPU and a sliding window model built from basic windows, we construct a parallel architecture over stream blocks with two levels of granularity, coarse and fine. Coarse-grained parallelism is responsible for the parallel control of divided tasks, while fine-grained parallelism is organized by computing thread grid and is responsible for extracting the synopsis data for the various parallel mining algorithms. Both aim at efficient data exchange and high-performance parallel algorithms. Furthermore, we give a new parallel quantile computing method named GSQ (GPU Stream Quantiles) within this framework. It calls GPU kernels to generate synopsis histograms using hash functions and then answers data stream quantile queries. Experimental results show significant improvements in processing bandwidth, response time, and speedup.
     Fourthly, we address the constraints on memory resources and execution order in correlation analysis of multiple data streams on the CPU. We propose a four-layer sliding window framework for multiple data streams that spans the bus and coordinates the CPU and GPU. Once the streams are completely mapped into GPU memory and an SID index is created for each, basic window offsets can be used for parallel computation. We then construct the parallel correlation algorithm GSSCCA (GPU Single-Dimensional Stream Canonical Correlation Analysis) using s→Thread and s→Block multi-level parallel computing. Experimental results show that the algorithm is accurate and computes faster.
     Fifthly, high-dimensional data streams face more complex constraints on resources and execution order than single-dimensional data streams with respect to computational accuracy and performance. To address this issue, we present the high-dimensional data stream correlation analysis algorithm GMSCCA (GPU Multi-Dimensional Stream Canonical Correlation Analysis), based on a study of the related mathematical model. Using a data cube and dimensionality reduction, the method completes the computation quickly and accurately under limited computing resources and high efficiency requirements, and offers a balanced compromise between high performance and approximation accuracy.
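
The fourth contribution relies on basic windows so that correlation over a sliding window can be updated without rescanning old data. The bookkeeping behind that can be sketched as follows, under a loud assumption: plain Pearson correlation is used here as a simplified stand-in for the canonical correlation analysis in GSSCCA (for two one-dimensional streams, the canonical correlation equals the absolute value of the Pearson coefficient). The names `digest` and `PairWindow` and the tuple layout are hypothetical. In a GPU setting each stream pair could be assigned to its own thread or block, which this sequential code does not attempt.

```python
import math
from collections import deque


def digest(xs, ys):
    """Sufficient statistics of one basic window for a pair of streams."""
    if len(xs) != len(ys):
        raise ValueError("basic windows must have equal length")
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


def correlation(digests):
    """Pearson correlation computed from merged basic-window digests."""
    if not digests:
        raise ValueError("no basic windows available")
    # Digests are additive, so merging windows is a column-wise sum.
    n, sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*digests))
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0 or var_y <= 0:
        return 0.0  # a constant stream has no defined correlation
    return cov / math.sqrt(var_x * var_y)


class PairWindow:
    """Sliding window over a stream pair, stored as basic-window digests."""

    def __init__(self, num_basic):
        # The oldest digest expires automatically when the window is full.
        self.digests = deque(maxlen=num_basic)

    def push_basic_window(self, xs, ys):
        self.digests.append(digest(xs, ys))

    def correlation(self):
        return correlation(self.digests)
```

Because each digest is a fixed-size tuple, sliding the window costs one append and one implicit eviction regardless of how many points a basic window holds. That is the property that makes the basic-window layout attractive when many stream pairs must be refreshed at once.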
