Efficient 2D viewpoint combination for human action recognition


Authors: Behrouz Saghafi; Deepu Rajan; Wanqing Li
Keywords: Histogram; Combination; Multi-view; Action recognition
Journal: Pattern Analysis & Applications
Year: 2016
Publication date: May 2016
Volume: 19
Issue: 2
Pages: 563-577
Full-text size: 1,879 KB
Author affiliations: Behrouz Saghafi (1)
Deepu Rajan (2)
Wanqing Li (3)

1. Centre for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, Singapore, 639798, Singapore
2. School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
3. Information and Communication Technology (ICT) Research Institute, University of Wollongong, Wollongong, NSW, 2522, Australia
Journal category: Computer Science
Journal subject: Pattern Recognition
Publisher: Springer London
ISSN: 1433-755X

Abstract

The ability to recognize human actions from a single viewpoint is affected by phenomena such as self-occlusion and occlusion by other objects. Incorporating multiple cameras can help overcome these issues. However, the question remains how to efficiently use information from all viewpoints to increase performance. Researchers have reconstructed a 3D model from multiple views to reduce dependency on viewpoint, but this 3D approach is often computationally expensive. Moreover, the quality of each view influences the overall model, and the reconstruction is limited to volumes where the views overlap. In this paper, we propose a novel method to efficiently combine 2D data from different viewpoints. Spatio-temporal features are extracted from each viewpoint and then used in a bag-of-words framework to form histograms. Two different codebook sizes are used. The similarity between the obtained histograms is represented via the Histogram Intersection kernel as well as the RBF kernel with \(\chi^2\) distance. Lastly, we combine all the basic kernels generated by selecting different viewpoints, feature types, codebook sizes and kernel types. The final kernel is a linear combination of basic kernels, weighted by an optimization process. For higher accuracy, the sets of kernel weights are computed separately for each binary SVM classifier. Our method not only combines the information from multiple viewpoints efficiently, but also improves performance by mapping features into various kernel spaces. The efficiency of the proposed method is demonstrated by testing on two commonly used multi-view human action datasets. Moreover, several experiments indicate the contribution of each part of the method to the overall performance.
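The kernel construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the `gamma` parameter, the random toy histograms and the uniform weights are all assumptions made for illustration. In particular, the abstract says the weights come from an optimization process, which is not reproduced here, and the \(\chi^2\) distance below follows one common convention, \(\sum_k (a_k - b_k)^2 / (a_k + b_k)\).

```python
import numpy as np


def histogram_intersection_kernel(A, B):
    """K[i, j] = sum_k min(A[i, k], B[j, k]) for row-wise histograms."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)


def chi2_rbf_kernel(A, B, gamma=1.0, eps=1e-12):
    """K[i, j] = exp(-gamma * chi2(A[i], B[j])).

    chi2(a, b) = sum_k (a_k - b_k)^2 / (a_k + b_k); eps avoids 0/0 on empty bins.
    gamma is an assumed bandwidth parameter, not a value from the paper.
    """
    diff = A[:, None, :] - B[None, :, :]
    total = A[:, None, :] + B[None, :, :]
    chi2 = (diff ** 2 / (total + eps)).sum(axis=2)
    return np.exp(-gamma * chi2)


def combine_kernels(kernels, weights):
    """Weighted linear combination of base kernel matrices."""
    weights = np.asarray(weights, dtype=float)
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative to keep the kernel valid")
    return sum(w * K for w, K in zip(weights, kernels))


# Toy data (assumption): 6 clips, 2 camera views, 8-bin bag-of-words histograms.
rng = np.random.default_rng(0)
views = []
for _ in range(2):
    H = rng.random((6, 8))
    views.append(H / H.sum(axis=1, keepdims=True))  # L1-normalize each histogram

# One base kernel per (view, kernel type) pair.
base_kernels = []
for H in views:
    base_kernels.append(histogram_intersection_kernel(H, H))
    base_kernels.append(chi2_rbf_kernel(H, H))

# Uniform weights are a placeholder for the learned, per-classifier weights.
weights = np.full(len(base_kernels), 1.0 / len(base_kernels))
K = combine_kernels(base_kernels, weights)
```

A combined matrix such as `K` could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Following the abstract, a separate weight vector would be used for each binary classifier.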
