Unsupervised Feature Learning for Dense Correspondences Across Scenes
  • Authors: Chao Zhang; Chunhua Shen; Tingzhi Shen
  • Keywords: Unsupervised feature learning; Scene alignment; Dense scene correspondence; Loopy belief propagation
  • Journal: International Journal of Computer Vision
  • Publication year: 2016
  • Publication date: January 2016
  • Volume: 116
  • Issue: 1
  • Pages: 90-107
  • Full-text size: 12,683 KB
  • References: Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.
    Barnes, C., Shechtman, E., Goldman, D., & Finkelstein, A. (2010). The generalized PatchMatch correspondence algorithm. In: Proceedings of the European Conference on Computer Vision. New York: Springer
    Bay, H., Ess, A., Tuytelaars, T., & Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3), 346–359.
    Berg, A. C., Berg, T., & Malik, J. (2005). Shape matching and object recognition using low distortion correspondences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (vol. 1, pp. 26–33). IEEE
    Bo, L., Ren, X., & Fox, D. (2013). Multipath sparse coding using hierarchical matching pursuit. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 660–667). IEEE
    Boureau, Y. -L., Ponce, J., & LeCun, Y. (2010). A theoretical analysis of feature pooling in visual recognition. In: Proceedings of the International Conference on Machine Learning (pp. 111–118).
    Coates, A., & Ng, A. Y. (2011a). Selecting receptive fields in deep networks. In: Proceedings of the Advances in Neural Information Processing Systems (pp. 2528–2536).
    Coates, A., & Ng, A. (2011b). The importance of encoding versus training with sparse coding and vector quantization. In: Proceedings of the International Conference on Machine Learning (pp. 921–928).
    Coates, A., Ng, A. Y., & Lee, H. (2011). An analysis of single-layer networks in unsupervised feature learning. In: International Conference on Artificial Intelligence and Statistics (pp. 215–223).
    Duchenne, O., Joulin, A., & Ponce, J. (2011). A graph-matching kernel for object categorization. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (pp. 1792–1799). IEEE
    Everingham, M., Eslami, S., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2014). The Pascal visual object classes challenge: A retrospective. International Journal of Computer Vision. doi:10.1007/s11263-014-0733-5.
    Felzenszwalb, P., & Huttenlocher, D. (2006). Efficient belief propagation for early vision. International Journal of Computer Vision, 70(1), 41–54. doi:10.1007/s11263-006-7899-4.
    Heikkilä, M., Pietikäinen, M., & Schmid, C. (2009). Description of interest regions with local binary patterns. Pattern Recognition, 42(3), 425–436.
    Huang, G. B., Mattar, M., Lee, H., & Learned-Miller, E. G. (2012). Learning to align from scratch. In: Proceedings of the Advances in Neural Information Processing Systems.
    Ihler, A. T., Fisher III, J. W., & Willsky, A. S. (2005). Loopy belief propagation: Convergence and effects of message errors. Journal of Machine Learning Research, 6, 905–936.
    Kim, J., Liu, C., Sha, F., & Grauman, K. (2013). Deformable spatial pyramid matching for fast dense correspondences. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2307–2314). IEEE
    Korman, S., & Avidan, S. (2011). Coherency sensitive hashing. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (pp. 1607–1614). IEEE
    Le, Q. V., Karpenko, A., Ngiam, J., & Ng, A. Y. (2011). ICA with reconstruction cost for efficient overcomplete feature learning. In: Proceedings of the Advances in Neural Information Processing Systems (pp. 1017–1025).
    Le, Q. V., Ranzato, M., Monga, R., Devin, M., Corrado, G., Chen, K., Dean, J., & Ng, A. Y. (2012). Building high-level features using large scale unsupervised learning. In: Proceedings of the International Conference on Machine Learning
    Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
    Leordeanu, M., & Hebert, M. (2005). A spectral technique for correspondence problems using pairwise constraints. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (vol. 2). IEEE
    Leordeanu, M., Zanfir, A., & Sminchisescu, C. (2013). Locally affine sparse-to-dense matching for motion and occlusion estimation. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). IEEE
    Liu, C., Yuen, J., & Torralba, A. (2009). Nonparametric scene parsing: Label transfer via dense scene alignment. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1972–1979). IEEE
    Liu, C., Yuen, J., & Torralba, A. (2011). SIFT flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978–994.
    Liu, L., Wang, L., & Liu, X. (2011). In defense of soft-assignment coding. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (pp. 2486–2493). IEEE
    Lowe, D. G. (1999). Object recognition from local scale-invariant features. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV). IEEE
    Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11, 19–60.
    Rubinstein, M., Joulin, A., Kopf, J., & Liu, C. (2013). Unsupervised joint object discovery and segmentation in internet images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1939–1946).
    Tola, E., Lepetit, V., & Fua, P. (2008). A fast local descriptor for dense matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
    Tola, E., Lepetit, V., & Fua, P. (2010). DAISY: An efficient dense descriptor applied to wide baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5), 815–830.
    Tuytelaars, T., & Van Gool, L. (2000). Wide baseline stereo matching based on local, affinely invariant regions. In: Proceedings of the British Machine Vision Conference (pp. 412–425).
    Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T. S., & Yan, S. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), 1031–1044.
    Zelnik-Manor, L. (2012). On SIFTs and their scales. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1522–1528). IEEE
    Zhou, G., Sohn, K., & Lee, H. (2012). Online incremental feature learning with denoising autoencoders. In: International Conference on Artificial Intelligence and Statistics (vol. 22, pp. 1453–1461).
  • Author affiliations: Chao Zhang (1) (2)
    Chunhua Shen (2) (3)
    Tingzhi Shen (1)

    1. Beijing Institute of Technology, Beijing, 100081, China
    2. The University of Adelaide, Adelaide, SA, 5005, Australia
    3. Australian Centre for Robotic Vision, Brisbane, Australia
  • Journal category: Computer Science
  • Journal subjects: Computer Imaging, Vision, Pattern Recognition and Graphics
    Artificial Intelligence and Robotics
    Image Processing and Computer Vision
    Pattern Recognition
  • Publisher: Springer Netherlands
  • ISSN: 1573-1405
Abstract
We propose a fast, accurate matching method for estimating dense pixel correspondences across scenes. Estimating dense pixel correspondences between images that depict different scenes, or different instances of the same object category, is a challenging problem. While most such matching methods rely on hand-crafted features such as SIFT, we learn features from a large number of unlabeled image patches using unsupervised learning. Pixel-layer features are obtained by encoding over a learned dictionary, and spatial pooling then yields patch-layer features. The learned features are then embedded into a multi-layer matching framework. We experimentally demonstrate that the learned features, together with our matching model, outperform state-of-the-art methods such as SIFT flow (Liu et al. in IEEE Trans Pattern Anal Mach Intell 33(5):978–994, 2011), coherency sensitive hashing (Korman and Avidan in: Proceedings of the IEEE international conference on computer vision (ICCV), 2011) and the recent deformable spatial pyramid matching (Kim et al. in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2013), in both accuracy and computational efficiency. Furthermore, we evaluate several dictionary learning and feature encoding methods within the proposed pixel correspondence estimation framework, and analyze how each affects the final matching performance.
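
The feature pipeline outlined in the abstract (dictionary learning on unlabeled patches, encoding into pixel-layer features, spatial pooling into patch-layer features) might look roughly like the sketch below. The abstract does not say which dictionary learner, encoder, or pooling operator is used, so everything here is an assumption made for illustration: K-means for the dictionary, the "triangle" soft-threshold encoder, average pooling, and all function names and parameter values are hypothetical. The matching stage (loopy belief propagation over the multi-layer model) is not shown.

```python
import numpy as np

def extract_patches(image, size):
    """Return one flattened size x size patch per valid top-left pixel."""
    h, w = image.shape
    rows, cols = h - size + 1, w - size + 1
    patches = np.empty((rows * cols, size * size))
    for r in range(rows):
        for c in range(cols):
            patches[r * cols + c] = image[r:r + size, c:c + size].ravel()
    return patches, (rows, cols)

def normalize(patches, eps=1e-8):
    """Per-patch brightness and contrast normalization."""
    patches = patches - patches.mean(axis=1, keepdims=True)
    return patches / (patches.std(axis=1, keepdims=True) + eps)

def learn_dictionary(patches, k, iters=10, seed=0):
    """Assumed dictionary learner: plain K-means with k atoms."""
    rng = np.random.default_rng(seed)
    atoms = patches[rng.choice(len(patches), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every patch to every atom.
        d = ((patches[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = patches[labels == j]
            if len(members):  # leave an atom unchanged if its cluster is empty
                atoms[j] = members.mean(axis=0)
    return atoms

def encode(patches, atoms):
    """Assumed encoder: triangle coding, max(0, mean distance - distance)."""
    d = np.sqrt(((patches[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2))
    return np.maximum(0.0, d.mean(axis=1, keepdims=True) - d)

def pool(features, grid, cell):
    """Average-pool pixel-layer features over non-overlapping cell x cell blocks."""
    rows, cols = grid
    k = features.shape[1]
    fmap = features.reshape(rows, cols, k)
    pr, pc = rows // cell, cols // cell
    fmap = fmap[:pr * cell, :pc * cell]  # drop any ragged border
    return fmap.reshape(pr, cell, pc, cell, k).mean(axis=(1, 3))

# Toy usage on a random image (stand-in data, not from the paper).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
patches, grid = extract_patches(image, size=5)
patches = normalize(patches)
atoms = learn_dictionary(patches, k=8)
pixel_features = encode(patches, atoms)              # one code per pixel position
patch_features = pool(pixel_features, grid, cell=4)  # one pooled code per block
```

In this sketch each pixel position gets a non-negative code of length k, and each pooled block summarizes a 4 x 4 neighborhood of those codes. Swapping `learn_dictionary` or `encode` for another method is the kind of comparison the abstract says the authors carry out.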
