OCEAN: Object-centric arranging network for self-supervised visual representations learning

Changjae Oh, Bumsub Ham, Hansung Kim, Adrian Hilton, Kwanghoon Sohn

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


Learning visual representations plays an important role in computer vision and machine learning applications, since it helps a model understand and perform high-level tasks intelligently. A common approach is supervised learning, which requires a huge amount of human annotation to train the model. This paper presents a self-supervised approach that learns visual representations from input images without human annotations. We train a convolutional neural network (CNN) to learn the correct arrangement of the object proposals that represent an image. We hypothesize that a network trained to solve this problem must embed semantic visual representations. Unlike existing approaches that use uniformly sampled patches, we relate object proposals that contain prominent objects and object parts. More specifically, we learn a representation that considers the overlap, inclusion, and exclusion relationships of proposals as well as their relative positions. This allows the network to focus on potential objects and parts rather than on clutter. We demonstrate that our model outperforms existing self-supervised learning methods and can be used as a generic feature extractor by applying it to object detection, classification, action recognition, image retrieval, and semantic matching tasks.
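The proposal relationships named in the abstract (overlap, inclusion, exclusion, and relative position) can be illustrated with a small geometric sketch. This is not the authors' implementation: the `Box` type, the function names, and the rule that inclusion means the intersection equals the smaller box's area are all assumptions made here for illustration, and the abstract does not say how the paper defines or encodes these relationships.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned proposal box (hypothetical helper type, pixel coordinates)."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(b: Box) -> float:
    """Area of a box; degenerate boxes have zero area."""
    return max(0.0, b.x2 - b.x1) * max(0.0, b.y2 - b.y1)


def intersection(a: Box, b: Box) -> float:
    """Area shared by two boxes, or 0.0 if they do not intersect."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def relation(a: Box, b: Box, eps: float = 1e-9) -> str:
    """Label a pair of boxes as 'exclusion', 'inclusion', or 'overlap'.

    Assumed rule: no shared area is exclusion; shared area equal to the
    smaller box's area means one box lies inside the other (inclusion);
    anything else is a partial overlap.
    """
    inter = intersection(a, b)
    if inter <= eps:
        return "exclusion"
    if abs(inter - min(area(a), area(b))) <= eps:
        return "inclusion"
    return "overlap"


def relative_position(a: Box, b: Box) -> Tuple[float, float]:
    """Offset (dx, dy) from the centre of box a to the centre of box b."""
    ax, ay = (a.x1 + a.x2) / 2, (a.y1 + a.y2) / 2
    bx, by = (b.x1 + b.x2) / 2, (b.y1 + b.y2) / 2
    return (bx - ax, by - ay)
```

For example, under these assumptions `relation(Box(0, 0, 10, 10), Box(2, 2, 5, 5))` would be labelled as inclusion, while two boxes that share no area would be labelled as exclusion.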

Original language: English
Pages (from-to): 281-292
Number of pages: 12
Journal: Expert Systems with Applications
Publication status: Published - 2019 Jul 1

Bibliographical note

Publisher Copyright:
© 2019 Elsevier Ltd

All Science Journal Classification (ASJC) codes

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

