Research on Techniques for Time Series and Cluster Mining (时间序列与聚类挖掘相关技术研究)

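English title: The Research on Time Series and Cluster Analysis
Author: Liu Bing (刘兵)
Degree level: Doctoral
Discipline: Computer Software and Theory
Chinese keywords: 数据挖掘; 时间序列; 相似性查找; 小波变换; 度量空间; 聚类
English keywords: Data Mining; Time Series; Similarity Search; Wavelet Transform; Metric Space; Clustering
Degree year: 2006
Advisor: Shi Bole (施伯乐)
Discipline code: 081202
Degree-granting institution: Fudan University (复旦大学)
Submission date: 2006-04-10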

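Abstract

Data mining and its applications have spread into many disciplines and have produced substantial results in artificial intelligence and machine learning, databases, pattern recognition, bioinformatics, neural computing, and other fields. Data mining is no longer of interest only to scientists; it increasingly receives close attention from governments and industry. Introducing data mining can greatly raise productivity and drive broader social progress. Governments and industries in many countries and regions hope to master data mining techniques, raise the technological level of their nations and enterprises, and ultimately gain a leading position.
    Data mining covers a broad range of research topics. This dissertation focuses on techniques for sequence data and clustering. Its main results are as follows:
    (1) A rigorous estimate is given of the upper and lower bounds on distance when the wavelet transform is used for time series similarity search, and the traditional algorithm is shown to be only one part of the lower bound given here. Compared with the traditional algorithm, the lower bound excludes more dissimilar sequences. The upper bound makes it possible to decide directly whether two sequences are similar, further reducing the number of original sequences that must be verified.
    (2) When the wavelet transform is used to reduce dimensionality for queries over high-dimensional time series, traditional algorithms all use the first k coefficients of the transformed sequence as an approximation of the original time series. However, the first k coefficients do not necessarily approximate the original set of sequences well; selecting some coefficients from the middle may represent the set better. A theorem is therefore given showing that choosing the k columns of the wavelet coefficient set with the largest column sums of squares approximates the original set of sequences better.
    (3) Similarity search between sequences that allows time shifting is increasingly widely used, because it can handle outliers and match time series of different lengths. Most studies, however, address whole-sequence matching between two time series. A dynamic-programming algorithm for subsequence similarity search is given: for a given query sequence, it finds the segment of a long sequence that is most similar to the query. Two optimized algorithms are also given to reduce the number of distance-matrix entries that must be computed during subsequence similarity search.
    (4) Time series similarity search can be viewed as a special case of search in metric spaces. A new index structure for metric spaces, called the bu-tree, is proposed. It builds the index by bottom-up hierarchical clustering, whereas most traditional metric-space data structures are built top-down. Compared with traditional construction methods, a bu-tree can enclose more objects within a smaller index radius, which helps prune candidates during queries. The construction algorithm for the bu-tree and the corresponding range query algorithm are given.
    (5) Data summaries are used to compress large databases for subsequent hierarchical cluster analysis. Each node of a bu-tree can also be regarded as a data summary. Another commonly used data summary, the data bubble, is discussed. The quality measure for incremental data bubbles (the data summarization index) is studied in detail. We identify which factors affect the mean and standard deviation of the data summarization index when the database is updated. Based on these factors, a dynamic algorithm for incrementally maintaining data bubbles is given.
    (6) Clustering algorithms for test sequences in system-level fault diagnosis are discussed. Building on clustering-based node-grouping theory (集团理论), four probabilistic diagnosis algorithms for system-level faults are proposed, each using a different greedy criterion. With a small number of tests, every algorithm achieves high diagnostic accuracy, and the time complexity is low.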
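
Items (1) and (2) can be illustrated with a small sketch. It is not the dissertation's algorithm or its exact bounds. It assumes an orthonormal Haar transform, so that Euclidean distance is preserved, and applies the triangle inequality to the discarded coefficients. That is one standard way to obtain a lower bound containing the classic kept-coefficient estimate as one term, together with an upper bound. The function names `haar`, `top_energy_columns`, and `distance_bounds` are hypothetical.

```python
import math


def haar(x):
    """Orthonormal Haar transform; len(x) must be a power of two."""
    approx = [float(v) for v in x]
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        # Prepend so that coarser detail coefficients end up in front.
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details


def top_energy_columns(rows, k):
    """Indices of the k coefficient columns with the largest sum of squares."""
    energy = [sum(row[j] ** 2 for row in rows) for j in range(len(rows[0]))]
    ranked = sorted(range(len(energy)), key=lambda j: energy[j], reverse=True)
    return sorted(ranked[:k])


def distance_bounds(wx, wy, keep):
    """Return (classic, lower, upper) bounds on the Euclidean distance."""
    kept = set(keep)
    head = sum((wx[j] - wy[j]) ** 2 for j in kept)
    # Norm of the discarded coefficients: one extra stored number per series.
    rx = math.sqrt(sum(v * v for j, v in enumerate(wx) if j not in kept))
    ry = math.sqrt(sum(v * v for j, v in enumerate(wy) if j not in kept))
    classic = math.sqrt(head)                 # kept coefficients only
    lower = math.sqrt(head + (rx - ry) ** 2)  # reverse triangle inequality
    upper = math.sqrt(head + (rx + ry) ** 2)  # triangle inequality
    return classic, lower, upper


x = [1, 3, 5, 7, 2, 4, 6, 8]
y = [2, 2, 6, 6, 1, 5, 5, 9]
wx, wy = haar(x), haar(y)
keep = top_energy_columns([wx, wy], 2)  # same columns for every series
classic, lower, upper = distance_bounds(wx, wy, keep)
true = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
assert classic <= lower <= true + 1e-9 <= upper + 2e-9
```

In a range query with radius r, a pair whose `upper` is at most r can be accepted without reading the raw series, and a pair whose `lower` exceeds r can be discarded. Only the remaining pairs need verification in the time domain.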

English abstract

Data mining and its applications have entered many disciplines and produced substantial results in diverse fields, including artificial intelligence and machine learning, databases, pattern recognition, bioinformatics, and neural computing. Data mining appeals not only to scientists but also draws the attention of governments and industry. Governments, industry, and academia are so keen to master data mining techniques that they have invested a great deal of money and effort in the corresponding research. Progress in data mining will therefore promote the development of science and society.
    Data mining spans many research areas. This dissertation focuses on time series and clustering techniques. Its main work and contributions are summarized as follows:
    (1) Wavelet transforms are used as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time series data. This dissertation proposes tight upper and lower bounds on the distance estimated using the wavelet transform, and shows that the traditional distance estimate is only one part of our lower bound. With the lower bound, we can exclude more dissimilar time series than the traditional method. With the upper bound, we can directly judge whether two time series are similar, which further reduces the number of time series that must be processed in the original time domain. Experiments show that using the tight upper and lower bounds significantly improves filtering efficiency and reduces running time compared with the traditional method.
    (2) When the wavelet transform is used for dimensionality reduction, traditional algorithms use the first k wavelet coefficients as an approximation of the original time series data set. However, choosing the first k coefficients is not necessarily optimal; a different set of k coefficients may be better. A new method is proposed to better approximate the original time series data set. Its main idea is to choose, for all series, the same k columns of the wavelet-transformed data set, namely those with the largest sums of squares. Experiments show that our method reduces the relative error compared with traditional algorithms.
    (3) Similarity search for time series under dynamic time shifting is widely used, but most recent research has focused on full-length similarity matching between two time series. A new basic subsequence similarity search algorithm based on dynamic programming is proposed. For a given query time series, the algorithm finds the most similar subsequence in a long time series. Two improved algorithms are also given, which reduce the number of distance-matrix entries that must be computed for subsequence similarity search. Experiments on real and synthetic data sets show that the improved algorithms significantly reduce the amount of computation and the running time compared with the basic algorithm.
    (4) Similarity search in many new database applications can generally be regarded as similarity search in a metric space, and time series are one such application. A new index construction algorithm is proposed for similarity search in metric spaces. The new data structure, called the bu-tree (bottom-up tree), constructs the index tree from the bottom up rather than with the traditional top-down approaches. The construction algorithm for the bu-tree and the range search algorithm based on it are given, and updates to the bu-tree are also discussed. Experiments show that the bu-tree outperforms the sa-tree in search efficiency, especially when the objects are not uniformly distributed or the query has low selectivity.
    (5) In many real-world applications, databases undergo frequent insertions and deletions, so it is highly desirable for a data mining technique to detect and react quickly to dynamic changes in the data distribution and clustering over time. Data summarizations (e.g., data bubbles) have been proposed to compress large databases into representative points suitable for subsequent hierarchical cluster analysis. We thoroughly investigate the quality measure (the data summarization index) of incremental data bubbles. We show which factors do and do not affect the mean and standard deviation of the data summarization index when the database is updated. Based on these results, a fully dynamic scheme for maintaining data bubbles incrementally is proposed. An extensive experimental evaluation confirms our analysis and shows that fully dynamic incremental data bubbles effectively preserve the quality of the data summarization for hierarchical clustering.
    (6) Probabilistic diagnosis algorithms are very important in system-level fault diagnosis research. Based on cluster theory, we propose four probabilistic algorithms for system-level fault diagnosis that use greedy algorithms. Computer simulation shows that these diagnosis algorithms achieve a high probability of correct diagnosis with low time complexity. Among the four probabilistic algorithms, greedy algorithm one performs best. The results also indicate that our algorithms perform better than the Compete algorithm and the Majority algorithm, which are classic probabilistic algorithms in system-level fault diagnosis.
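
The subsequence search in item (3) can be sketched with the textbook subsequence variant of dynamic time warping. This is an assumption about the general shape of such an algorithm, not the dissertation's basic or improved algorithms. The first row of the cost matrix is left free so that a match may start anywhere in the long series, and the best end point is the minimum of the last row. The name `subsequence_dtw` and the absolute-difference point cost are illustrative choices.

```python
def subsequence_dtw(query, series):
    """Return (cost, start, end) of the best match series[start:end + 1]."""
    m, n = len(query), len(series)
    # cost[i][j]: cheapest warping of query[:i + 1] onto a subsequence
    #             of `series` that ends at index j.
    # start[i][j]: index in `series` where that subsequence begins.
    cost = [[0.0] * n for _ in range(m)]
    start = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = float(abs(query[i] - series[j]))
            if i == 0:
                # Free start: the first query point may align anywhere.
                cost[i][j], start[i][j] = d, j
                continue
            candidates = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((cost[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(candidates)
            cost[i][j], start[i][j] = d + best_cost, best_start
    # Free end: pick the cheapest cell in the last row.
    end = min(range(n), key=lambda j: cost[m - 1][j])
    return cost[m - 1][end], start[m - 1][end], end


print(subsequence_dtw([1, 2, 3], [9, 9, 1, 2, 3, 9, 9]))  # (0.0, 2, 4)
```

This version fills every one of the len(query) × len(series) matrix entries. The improved algorithms described in item (3) aim to avoid computing part of those entries.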
