Cylindrical convolutional networks for joint object detection and viewpoint estimation

Sunghun Joung, Seungryong Kim, Hanjae Kim, Minsu Kim, Ig Jae Kim, Junghyun Cho, Kwanghoon Sohn

Research output: Contribution to journalConference articlepeer-review

12 Citations (Scopus)

Abstract

Existing techniques to encode spatial invariance within deep convolutional neural networks only model 2D transformation fields. This does not account for the fact that objects in a 2D space are a projection of 3D ones, and thus they have limited ability to severe object viewpoint changes. To overcome this limitation, we introduce a learnable module, cylindrical convolutional networks (CCNs), that exploit cylindrical representation of a convolutional kernel defined in the 3D space. CCNs extract a view-specific feature through a view-specific convolutional kernel to predict object category scores at each viewpoint. With the view-specific feature, we simultaneously determine objective category and viewpoints using the proposed sinusoidal soft-argmax module. Our experiments demonstrate the effectiveness of the cylindrical convolutional networks on joint object detection and viewpoint estimation.

Original languageEnglish
Article number9156810
Pages (from-to)14151-14160
Number of pages10
JournalProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs
Publication statusPublished - 2020
Event2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States
Duration: 2020 Jun 142020 Jun 19

Bibliographical note

Funding Information:
This research was supported by R&D program for Advanced Integrated-intelligence for Identification (AIID) through the National Research Foundation of KOREA (NRF) funded by Ministry of Science and ICT (NRF-2018M3E3A1057289). ∗Corresponding author

Publisher Copyright:
© 2020 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'Cylindrical convolutional networks for joint object detection and viewpoint estimation'. Together they form a unique fingerprint.

Cite this