Abstract
We present a multi-scale deep convolutional neural network (CNN) for automatic 2D-to-3D conversion. Traditional methods, which synthesize a virtual view from a reference view, consist of separate stages, i.e., depth (or disparity) estimation for the reference image followed by depth-image-based rendering (DIBR) with the estimated depth. In contrast, we reformulate the view synthesis task as an image reconstruction problem with a spatial transformer module and directly generate stereo image pairs within a unified CNN framework, without ground-truth depth as supervision. We further propose a multi-scale deep architecture that captures large displacements between images at the coarse level and enhances detail at the fine level. Experimental results demonstrate the effectiveness of the proposed method over state-of-the-art approaches, both qualitatively and quantitatively, on the KITTI driving dataset.
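The core idea of the reformulation, reconstructing one stereo view by warping the other with a predicted per-pixel disparity and training on photometric error instead of ground-truth depth, can be sketched in plain NumPy. This is a minimal illustrative sketch: the function names, the sign convention (right-view pixel x sampled from left-view pixel x + d), and the pure-NumPy bilinear sampler are assumptions for exposition, not the paper's actual spatial transformer module or network.

```python
import numpy as np

def warp_with_disparity(left, disparity):
    """Synthesize a right view by sampling the left image at x + d(x),
    with bilinear interpolation along the horizontal axis.

    left:      (H, W) grayscale image.
    disparity: (H, W) non-negative horizontal shifts (a hypothetical
               network prediction; sign convention is an assumption).
    """
    H, W = left.shape
    xs = np.tile(np.arange(W, dtype=np.float64), (H, 1))
    src = xs + disparity                       # source x-coordinates in the left image
    x0 = np.clip(np.floor(src).astype(int), 0, W - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)             # clamp at the right border
    w = src - np.floor(src)                    # fractional part = bilinear weight
    rows = np.tile(np.arange(H)[:, None], (1, W))
    return (1.0 - w) * left[rows, x0] + w * left[rows, x1]

def reconstruction_loss(pred, target):
    """L1 photometric error: the self-supervised training signal that
    replaces ground-truth depth."""
    return float(np.abs(pred - target).mean())
```

With a constant disparity of 1 pixel, the output is simply the left image shifted by one column (clamped at the border), and the loss against the true right view would drive the disparity prediction during training.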
Original language | English |
---|---|
Title of host publication | 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings |
Publisher | IEEE Computer Society |
Pages | 730-734 |
Number of pages | 5 |
ISBN (Electronic) | 9781509021758 |
DOIs | |
Publication status | Published - 2018 Feb 20 |
Event | 24th IEEE International Conference on Image Processing, ICIP 2017 - Beijing, China (Duration: 2017 Sept 17 → 2017 Sept 20) |
Publication series
Name | Proceedings - International Conference on Image Processing, ICIP |
---|---|
Volume | 2017-September |
ISSN (Print) | 1522-4880 |
Bibliographical note
Funding Information: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2016R1A2A2A05921659).
Publisher Copyright:
© 2017 IEEE.
All Science Journal Classification (ASJC) codes
- Software
- Computer Vision and Pattern Recognition
- Signal Processing