用户名: 密码: 验证码:
Parallel outlier detection on uncertain data for GPUs
详细信息    查看全文
  • 作者:Takazumi Matsumoto ; Edward Hung ; Man Lung Yiu
  • 关键词:GPU ; Outlier detection ; Parallel processing ; Uncertain data
  • 刊名:Distributed and Parallel Databases
  • 出版年:2015
  • 出版时间:September 2015
  • 年:2015
  • 卷:33
  • 期:3
  • 页码:417-447
  • 全文大小:2,959 KB
  • 参考文献:1.Acklam, P.J.: An algorithm for computing the inverse normal cumulative distribution function. Tech. Rep. (2003)
    2.Advanced Micro Devices Inc: AMD accelerated parallel processing opencl programming guide
    3.Aggarwal, C.C. (ed.): Managing and Mining Uncertain Data. Springer, New York, NY (2009)
    4.Aggarwal, C.C., Yu, P.S.: Outlier detection with uncertain data. In: Proceedings of the SIAM International Conference on Data Mining 2008 (2008)
    5.Aggarwal, C.C., Yu, P.S.: A survey of uncertain data algorithms and applications. IEEE TKDE 21(5), 609鈥?23 (2009)
    6.Alshawabkeh, M., Jang, B., Kaeli, D.: Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems. In: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units (2010)
    7.Azmandian, F., Yilmazer, A., Dy, J.G., Aslam, J.A., Kaeli, D.R.: GPU-accelerated feature selection for outlier detection using the local kernel density ratio. In: Proceedings of the 12th IEEE ICDM (2012)
    8.Bastke, S., Deml, M., Schmidt, S.: Combining statistical network data, probabilistic neural networks and the computational power of GPUs for anomaly detection in computer networks. In: 1st Workshop on Intelligent Security (Security and Artificial Intelligence) (2009)
    9.Bolton, R.J., Hand, D.J.: Statistical fraud detection: a review. Stat. Sci. 17(3), 235鈥?55 (2002)MathSciNet View Article MATH
    10.Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of SIGMOD 2000 (2000)
    11.Chau, M., Cheng, R., Kao, B., Ng, J.: Uncertain data mining: an example in clustering location data. In: Proceedings of the 10th PAKDD (2006)
    12.Denoeux, T.: Maximum likelihood estimation from uncertain data in the belief function framework. IEEE TKDE 25(1), 119鈥?30 (2013)
    13.Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (1996)
    14.Angiulli, F., Basta, S., Pizzuti, C.: Distance-based detection of outliers. IEEE TKDE 18(2), 145鈥?60 (2006)
    15.Fang, W., Lau, K.K., Lu, M., Xiao, X., Lam, C.K., Yang, P.Y., He, B., Luo, Q., Sander, P.V., Yang, K.: Parallel data mining on graphics processors. Tech. Rep., Hong Kong University of Science and Technology (2008)
    16.He, B., Govindaraju, N.K., Luo, Q., Smith, B.: Efficient gather and scatter operations on graphics processors. In: Proceedings of the ACM/IEEE Conference on Supercomputing (2007)
    17.Hung, E., Cheung, D.W.: Parallel mining of outliers in large database. Distrib. Parallel Databases 12(1), 5鈥?6 (2002)View Article
    18.Kao, B., Lee, S.D., Cheung, D.W., Ho, W.S., Chan, K.F.: Clustering uncertain data using voronoi diagrams. In: Proceedings of the 8th IEEE ICDM (2008)
    19.Khronos Group: OpenCL. http://鈥媤ww.鈥媖hronos.鈥媜rg/鈥媜pencl/鈥?/span> (2011). Accessed 9 October 2012
    20.Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large datasets. In: Proceedings of VLDB 1998 (1998)
    21.Knorr, E.M., Ng, R.T.: Finding intensional knowledge of distance-based outliers. In: Proceedings of VLDB 1999, pp. 211鈥?22 (1999)
    22.Kriegel, H.P., Pfeifle, M.: Density-based clustering of uncertain data. In: Proceedings of the 11th ACM SIGKDD (2005)
    23.Kriegel, H.P., Pfeifle, M.: Heirarchical density-based clustering of uncertain data. In: Proceedings of the 5th IEEE ICDM (2005)
    24.Krulis, M., Skopal, T., Lokoc, J., Beecks, C.: Combining CPU and GPU architectures for fast similarity search. Distrib. Parallel Databases 30, 179鈥?07 (2012)View Article MATH
    25.Lan, Z., Zheng, Z., Li, Y.: Toward automated anomaly identification in large-scale systems. IEEE TPDS 21(2), 174鈥?87 (2010)
    26.Lozano, E., Acuna, E.: Parallel algorithms for distance-based and density-based outliers. In: Proceedings of the 5th IEEE ICDM (2005)
    27.Lu, M., Tan, Y., Bai, G., Luo, Q.: High-performance short sequence alignment with GPU acceleration. Distrib. Parallel Databases 30, 385鈥?99 (2012)View Article
    28.Marsaglia, G.: Xorshift RNGs. J. Stat. Softw. 8(14), 1鈥? (2003)
    29.Matsumoto, T., Hung, E.: Accelerating outlier detection with uncertain data using graphics processors. In: Advances in Knowledge Discovery and Data Mining, vol. LNCS 7302, pp. 169鈥?80 (2012)
    30.Micikevicius, P.: Analysis-driven optimization. In: GPU Technology Conference 2010 (2010)
    31.Murakami, T., Kasahara, R., Saito, T.: An implementation and its evaluation of password cracking tool parallelized on GPGPU. In: Proceedings of the 2010 International Symposium on Communications and Information Technologies (2010)
    32.Ngai, W.K., Kao, B., Chui, C.K., Cheng, R., Chau, M., Yip, K.Y.: Efficient clustering of uncertain data. In: Proceedings of the 6th IEEE ICDM (2006)
    33.NVIDIA Corporation: CUDA. http://鈥媤ww.鈥媙vidia.鈥媍om/鈥媜bject/鈥媍uda_鈥媓ome_鈥媙ew.鈥媓tml (2011). Accessed 9 October 2012
    34.Reif, M., Goldstein, M., Stahl, A.: Anomaly detection by combining decision trees and parametric densities. In: 19th International Conference on Pattern Recognition 2008 (2008)
    35.Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD (2000)
    36.Sequeria, K., Zaki, M.: ADMIT: Anomaly-based data mining for intrusions. In: Proceedings of the 8th ACM SIGKDD (2002)
    37.Tang, J., Chen, Z., Fu, A.W., Cheung, D.W.: Capabilities of outlier detection schemes in large datasets, framework and methodologies. Knowl. Inf. Syst. 11(1), 45鈥?4 (2006)View Article
    38.Tarabalka, Y., Haavardsholm, T.V., Kaasen, I., Skauli, T.: Real-time anomaly detection in hyperspectral images using multivariate normal mixture models and GPU processing. J. Real-Time Image Process. 4(3), 287鈥?00 (2009)View Article
    39.Wang, L., Cheung, D.W.L., Cheng, R., Lee, S.D., Yang, X.S.: Efficient mining of frequent item sets on large uncertain databases. IEEE TKDE 24(12), 2170鈥?183 (2012)
    40.Zhanchun, G., Yuying, L.: Improving the collaborative filtering recommender system by using GPU. In: Proceedings of 2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (2012)
    41.Zhang, Y., Lin, X., Tao, Y., Zhang, W., Wang, H.: Efficient computation of range aggregates against uncertain location-based queries. IEEE TKDE 24(7), 1244鈥?258 (2012)
  • 作者单位:Takazumi Matsumoto (1)
    Edward Hung (1)
    Man Lung Yiu (1)

    1. The Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
  • 刊物类别:Computer Science
  • 刊物主题:Database Management
    Data Structures
    Information Systems Applications and The Internet
    Operating Systems
    Memory Structures
  • 出版者:Springer Netherlands
  • ISSN:1573-7578
文摘
Outlier detection, also known as anomaly detection, is a common data mining task in identifying data points that are outside expected patterns in a given dataset. It has useful applications such as network intrusion, system faults, and fraudulent activity. In addition, real world data are uncertain in nature and they may be represented as uncertain data. In this paper, we propose an improved parallel algorithm for outlier detection on uncertain data using density sampling and develop an implementation running on both GPUs and multi-core CPUs, using the OpenCL framework. Our main focus is on GPUs, as they are a cost effective massively parallel floating point processor that is suitable for many data mining applications. Our implementation exploits some key features in GPUs, and is significantly different from a traditional CPU implementation. We first present an improved uncertain outlier detection algorithm. Then, we demonstrate two parallel micro-clustering implementations. The performance and detection quality comparisons demonstrate the benefits of the improved algorithm and parallel implementation on GPUs.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700