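Research on Objective Quality Evaluation and Coding Algorithm of Stereo Image

Chinese title: 立体图像客观质量评价与压缩技术研究
Author: 杨嘉琛 (Yang Jiachen)
Degree level: Doctoral
Discipline: Communication and Information Systems
Chinese keywords: 立体图像 ; 视觉模型 ; 立体感 ; 质量评价 ; 压缩编码
English keywords: stereo image ; visual model ; stereo perceive ; quality evaluation ; coding
Degree year: 2009
Advisor: 侯春萍 (Hou Chunping)
Discipline code: 081001
Degree-granting institution: Tianjin University (天津大学)
Thesis submission date: 2008-12-01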

Abstract

Stereo images record three-dimensional information about the real world and give viewers an immersive, more natural visual experience than traditional planar images, so they have broad application prospects. Stereo image technology has already been applied successfully in scientific research, the military, education, industry, medicine, and many other fields, and it has become an active research topic. This thesis presents theoretical analysis and experimental research on stereo imaging technology from the following aspects:
     First, to support the study of stereo technology, this thesis analyzes the characteristics of human vision, drawing on the results of modern medical research and experiments on stereo vision, and proposes for the first time a binocular stereo vision model based on those characteristics. Studying the model yields several parameters that influence human stereo perception; these parameters serve as the basis for the research on objective quality evaluation and compression coding of stereo images.
     Second, to provide an effective and fast means of evaluation for research on stereo image compression, this thesis first demonstrates experimentally that the absolute disparity image affects stereo perception. Combining human physiological and psychological stereo vision, objective evaluation of planar images, and the absolute disparity image, it then proposes an objective quality evaluation method for stereo images. The method evaluates a stereo image in terms of both image quality and stereo perception. Experiments show that the results of this objective method agree well with those of the commonly used subjective evaluation method.
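The two-part evaluation described above (image quality plus stereo perception) can be illustrated with a minimal sketch. The abstract does not define the absolute disparity image or name the planar metric used, so the code below makes two labeled assumptions for illustration only: the absolute disparity image is approximated by the per-pixel absolute difference between the left and right views, and PSNR stands in for the planar image metric. The function names are hypothetical and this is not the thesis's actual method.

```python
import math

def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    sq_err = 0.0
    count = 0
    for row_ref, row_test in zip(ref, test):
        for a, b in zip(row_ref, row_test):
            sq_err += (a - b) ** 2
            count += 1
    mse = sq_err / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def abs_difference(left, right):
    """Assumed stand-in for the absolute disparity image: |left - right| per pixel."""
    return [[abs(a - b) for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]

def stereo_scores(ref_left, ref_right, dist_left, dist_right):
    """Return (image_quality, stereo_sense) for a distorted stereo pair.

    image_quality: mean PSNR of the two views against their references.
    stereo_sense:  PSNR between the reference and distorted difference images.
    """
    image_quality = 0.5 * (psnr(ref_left, dist_left) + psnr(ref_right, dist_right))
    stereo_sense = psnr(abs_difference(ref_left, ref_right),
                        abs_difference(dist_left, dist_right))
    return image_quality, stereo_sense
```

Keeping the two scores separate reflects the abstract's point that a distortion can lower image quality without changing the relationship between the two views, and vice versa.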
     Third, this thesis proposes a two-view stereo image compression coding algorithm based on feature-point disparity estimation and triangular mesh mapping. To keep the disparity between the two views unchanged after compression, the algorithm selects feature points using the absolute disparity image, and experiments show that the decoded stereo image pair retains good stereo perception. For residual image coding, the algorithm takes into account the influence of psychological stereo vision, the luminance and chrominance characteristics of the human eye, the chrominance characteristics of stereo image pairs, and the properties of triangular mesh mapping, and proposes coding the stereo residual for the Y component only. Experiments show that the algorithm achieves a high compression ratio while preserving good image quality and stereo perception. The thesis also extends the two-view coding algorithm to multi-view coding, gives a coding scheme, and proposes a method for generating intermediate views of a stereo image. Experiments show that this method can produce a four-view stereo image set from a two-view stereo image pair with almost no loss of stereo perception.
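The idea of coding the residual for the Y component only can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's codec: pixels are assumed to be already in a (Y, Cb, Cr) representation, the predicted view is assumed to come from some disparity-compensated prediction step that is not shown, and the residual is kept lossless, with no transform or quantization. All function names are hypothetical.

```python
def encode_y_residual(actual, predicted):
    """Return the luminance-only residual between the actual and predicted views.

    Both inputs are flat lists of (Y, Cb, Cr) pixel tuples of equal length.
    Chrominance differences are deliberately discarded.
    """
    return [a[0] - p[0] for a, p in zip(actual, predicted)]

def decode_with_y_residual(predicted, y_residual):
    """Rebuild the view: corrected Y, chrominance copied from the prediction."""
    return [(p[0] + r, p[1], p[2]) for p, r in zip(predicted, y_residual)]

# Illustrative pixels only; the values are made up for the example.
actual_view = [(120, 128, 130), (64, 110, 140)]
predicted_view = [(118, 127, 131), (70, 112, 139)]

residual = encode_y_residual(actual_view, predicted_view)
decoded_view = decode_with_y_residual(predicted_view, residual)
```

In this sketch the decoded luminance matches the actual view exactly, while the chrominance is whatever the prediction supplied, so only one residual channel has to be transmitted instead of three.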
     Finally, this thesis develops a stereo TV hardware experimental platform based on the DM6446 processor to verify the proposed methods and algorithms. Besides supporting the experimental verification of stereo image algorithms, the platform addresses the miniaturization of stereo TV. It includes the hardware interfaces that a stereo TV requires and supports output of up to 60 images per second at a resolution of 1600×1200 (or 1920×1080). The output uses an all-digital DVI interface at a rate of up to 1.65 Gbps. The platform runs the Linux operating system, implements all the hardware drivers, and includes a port of the X Window graphical desktop.
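The output figures quoted above can be checked with some simple arithmetic. The abstract does not say how the 1.65 Gbps rate is measured; the sketch below assumes it refers to the per-channel TMDS bit rate of a single-link DVI connection at its maximum pixel clock of 165 MHz, where each 8-bit value is sent as a 10-bit symbol. The function and constant names are illustrative.

```python
TMDS_BITS_PER_SYMBOL = 10                     # TMDS sends each 8-bit value as 10 bits
SINGLE_LINK_MAX_PIXEL_CLOCK_HZ = 165_000_000  # single-link DVI pixel clock limit

def active_pixel_rate(width, height, frames_per_second):
    """Visible pixels per second, ignoring blanking intervals."""
    return width * height * frames_per_second

def tmds_channel_bit_rate(pixel_clock_hz):
    """Bit rate on one TMDS data channel for a given pixel clock."""
    return pixel_clock_hz * TMDS_BITS_PER_SYMBOL

rate_uxga = active_pixel_rate(1600, 1200, 60)    # 115,200,000 pixels/s
rate_1080p = active_pixel_rate(1920, 1080, 60)   # 124,416,000 pixels/s
max_channel_rate = tmds_channel_bit_rate(SINGLE_LINK_MAX_PIXEL_CLOCK_HZ)  # 1.65e9 bit/s
```

Both active pixel rates fall below 165 million pixels per second. Blanking intervals raise the actual pixel clock, but the commonly used timings for these modes (about 162 MHz and 148.5 MHz respectively) still fit within the single-link limit, which is consistent with the stated 1.65 Gbps figure under the assumption above.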
