Video Primal Sketch: A Unified Middle-Level Representation for Video
  • Authors: Zhi Han; Zongben Xu; Song-Chun Zhu
  • Keywords: Middle-level vision; Video representation; Textured motion; Dynamic texture synthesis; Primal sketch
  • Journal: Journal of Mathematical Imaging and Vision
  • Publication year: 2015
  • Publication date: October 2015
  • Volume: 53
  • Issue: 2
  • Pages: 151-170
  • Full-text size: 6,468 KB
  • Author affiliations: Zhi Han (1) (2) (3)
    Zongben Xu (1)
    Song-Chun Zhu (2)

    1. Institute for Information and System Sciences, Xi’an Jiaotong University, Xi’an, China
    2. Department of Stat and CS, University of California, Los Angeles, USA
    3. State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
  • Journal category: Computer Science
  • Journal subjects: Computer Imaging, Vision, Pattern Recognition and Graphics
    Image Processing and Computer Vision
    Artificial Intelligence and Robotics
    Automation and Robotics
  • Publisher: Springer Netherlands
  • ISSN:1573-7683
Abstract
This paper presents a middle-level video representation named video primal sketch (VPS), which integrates two regimes of models: (i) a sparse coding model that uses static or moving primitives to explicitly represent moving corners, lines, feature points, etc.; and (ii) a FRAME/MRF model that reproduces feature statistics extracted from the input video to implicitly represent textured motion, such as water and fire. The feature statistics include histograms of spatio-temporal filter responses and velocity distributions. The paper makes three contributions: (i) learning a dictionary of video primitives using parametric generative models; (ii) proposing the spatio-temporal FRAME and motion-appearance FRAME models for modeling and synthesizing textured motion; and (iii) developing a parsimonious hybrid model for generic video representation. Given an input video, VPS automatically selects the proper model for each motion pattern and is compatible with high-level action representations. In the experiments, we synthesize a number of textured motions; reconstruct real videos using the VPS; report a series of human perception experiments to verify the quality of the reconstructed videos; demonstrate how the VPS changes over scale transitions in videos; and present the close connection between VPS and high-level action models.
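
As a rough illustration of the kind of feature statistic the abstract mentions (a histogram of spatio-temporal filter responses), the sketch below computes a normalized histogram of frame-to-frame differences over a small video volume. The function name, the choice of a plain temporal-difference filter, the bin count, and the random test clip are all assumptions made for this example; the paper's actual filter bank and FRAME model are not reproduced here.

```python
import numpy as np

def temporal_difference_histogram(video, bins=8, value_range=(-1.0, 1.0)):
    """Normalized histogram of a simple spatio-temporal filter response.

    `video` is a (frames, height, width) array with intensities in [0, 1].
    The frame-to-frame difference is a stand-in filter chosen for
    illustration (an assumption, not the filter bank used in the paper).
    """
    video = np.asarray(video, dtype=np.float64)
    # Temporal derivative: response[t] = video[t + 1] - video[t].
    response = np.diff(video, axis=0)
    counts, edges = np.histogram(response, bins=bins, range=value_range)
    # Normalize the counts so the histogram sums to 1.
    return counts / counts.sum(), edges

# Hypothetical input: a random 5-frame, 16x16 clip standing in for real video.
rng = np.random.default_rng(0)
clip = rng.random((5, 16, 16))
hist, edges = temporal_difference_histogram(clip)
```

In a FRAME-style model, histograms like `hist` would serve as the statistics that a synthesized texture is constrained to match; this sketch covers only the measurement step.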
