OoMMix: Out-of-manifold regularization in contextual embedding space for text classification

Seonghyeon Lee, Dongha Lee, Hwanjo Yu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

3 Citations (Scopus)

Abstract

Recent studies on neural networks with pre-trained weights (i.e., BERT) have mainly focused on a low-dimensional subspace, where the embedding vectors computed from input words (or their contexts) are located. In this work, we propose a new approach, called OoMMix, to finding and regularizing the remainder of the space, referred to as out-of-manifold, which cannot be accessed through the words. Specifically, we synthesize the out-of-manifold embeddings based on two embeddings obtained from actually-observed words, to utilize them for fine-tuning the network. A discriminator is trained to detect whether an input embedding is located inside the manifold or not, and simultaneously, a generator is optimized to produce new embeddings that can be easily identified as out-of-manifold by the discriminator. These two modules successfully collaborate in a unified and end-to-end manner for regularizing the out-of-manifold. Our extensive evaluation on various text classification benchmarks demonstrates the effectiveness of our approach, as well as its good compatibility with existing data augmentation techniques which aim to enhance the manifold.

Original languageEnglish
Title of host publicationACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
PublisherAssociation for Computational Linguistics (ACL)
Pages590-599
Number of pages10
ISBN (Electronic)9781954085527
Publication statusPublished - 2021
EventJoint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021 - Virtual, Online
Duration: 2021 Aug 12021 Aug 6

Publication series

NameACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference

Conference

ConferenceJoint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021
CityVirtual, Online
Period21/8/121/8/6

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics

All Science Journal Classification (ASJC) codes

  • Software
  • Computational Theory and Mathematics
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'OoMMix: Out-of-manifold regularization in contextual embedding space for text classification'. Together they form a unique fingerprint.

Cite this