Video Primal Sketch: A Unified Middle-Level Representation for Video


Authors: Zhi Han; Zongben Xu; Song-Chun Zhu
Keywords: Middle-level vision; Video representation; Textured motion; Dynamic texture synthesis; Primal sketch
Journal: Journal of Mathematical Imaging and Vision
Publication date: October 2015
Volume: 53
Issue: 2
Pages: 151-170
Full-text size: 6,468 KB
Author affiliations: Zhi Han (1) (2) (3)
Zongben Xu (1)
Song-Chun Zhu (2)

1. Institute for Information and System Sciences, Xi'an Jiaotong University, Xi'an, China
2. Department of Stat and CS, University of California, Los Angeles, USA
3. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
Journal category: Computer Science
Journal subjects: Computer Imaging, Vision, Pattern Recognition and Graphics
Image Processing and Computer Vision
Artificial Intelligence and Robotics
Automation and Robotics
Publisher: Springer Netherlands
ISSN: 1573-7683

Abstract

This paper presents a middle-level video representation named video primal sketch (VPS), which integrates two regimes of models: (i) a sparse coding model that uses static or moving primitives to explicitly represent moving corners, lines, feature points, etc., and (ii) a FRAME/MRF model that reproduces feature statistics extracted from the input video to implicitly represent textured motion, such as water and fire. The feature statistics include histograms of spatio-temporal filter responses and velocity distributions. This paper makes three contributions to the literature: (i) learning a dictionary of video primitives using parametric generative models; (ii) proposing the spatio-temporal FRAME and motion-appearance FRAME models for modeling and synthesizing textured motion; and (iii) developing a parsimonious hybrid model for generic video representation. Given an input video, VPS automatically selects the proper model for each motion pattern and is compatible with high-level action representations. In the experiments, we synthesize a number of textured motions; reconstruct real videos using the VPS; report a series of human perception experiments to verify the quality of the reconstructed videos; demonstrate how the VPS changes over scale transitions in videos; and present the close connection between VPS and high-level action models.
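
To make the "explicit" regime in the abstract more concrete, the following is a minimal sketch of greedy sparse coding by generic matching pursuit on a toy 1-D signal. It is an illustration only and not the paper's method: the function names, the four-atom dictionary, and the signal are all hypothetical, and the paper's learned video primitives and its FRAME/MRF regime are not modeled here.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy sparse coding (generic matching pursuit).

    At each step, pick the unit-norm atom most correlated with the
    current residual, record its coefficient, and subtract its
    contribution. Returns the sparse code and the final residual.
    """
    residual = list(signal)
    code = []  # list of (atom index, coefficient)
    for _ in range(n_atoms):
        scores = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(dictionary)), key=lambda i: abs(scores[i]))
        coeff = scores[k]
        if abs(coeff) < 1e-12:  # nothing left to explain
            break
        code.append((k, coeff))
        residual = [r - coeff * a for r, a in zip(residual, dictionary[k])]
    return code, residual


# Hypothetical toy dictionary of unit-norm atoms (assumption, for illustration).
dictionary = [
    unit([1, 1, 0, 0]),
    unit([0, 0, 1, 1]),
    unit([1, -1, 0, 0]),
    unit([0, 0, 1, -1]),
]
signal = [3.0, 3.0, 0.5, -0.5]

code, residual = matching_pursuit(signal, dictionary, n_atoms=2)
selected = [k for k, _ in code]
residual_energy = dot(residual, residual)
```

In this toy case two atoms explain the signal almost exactly, which is the sense in which a sparse code is "explicit": a few selected primitives and their coefficients stand in for the raw data. Textured regions such as water or fire do not admit such a compact code, which is the motivation the abstract gives for pairing this regime with a statistical one.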
