Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer

Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim, David R. Mortensen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Approaches to improving multilingual language understanding often struggle with significant performance gaps between high-resource and low-resource languages. While there are efforts to align languages in a single latent space to mitigate such gaps, how different input-level representations influence these gaps has not been investigated, particularly for phonemic inputs. We hypothesize that the performance gaps are affected by representation discrepancies between these languages, and we revisit the use of phonemic representations as a means to mitigate these discrepancies. To demonstrate the effectiveness of phonemic representations, we present experiments on three representative cross-lingual tasks covering 12 languages in total. The results show that phonemic representations exhibit higher similarity between languages than orthographic representations, and that models using phonemic representations consistently outperform a grapheme-based baseline model on relatively low-resource languages. This quantitative evidence is further supported by a theoretical analysis of the cross-lingual performance gap.
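As a toy illustration of the input-level discrepancy the abstract describes (this is not the similarity measure used in the paper), the sketch below compares character-bigram Jaccard overlap for the word "water" in Hindi and Urdu. The two languages write the word in different scripts, so the orthographic strings share no characters, while a broad IPA transcription is identical for both. The helper names `char_ngrams` and `jaccard` and the transcription /paːniː/ are assumptions chosen for illustration.

```python
def char_ngrams(text, n=2):
    """Return the set of character n-grams in text (hypothetical helper)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b, n=2):
    """Jaccard overlap of character n-gram sets; 0.0 when both sets are empty."""
    sa, sb = char_ngrams(a, n), char_ngrams(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# 'water' written in Hindi (Devanagari) and Urdu (Perso-Arabic script)
hindi_orth = "\u092a\u093e\u0928\u0940"  # पानी
urdu_orth = "\u067e\u0627\u0646\u06cc"   # پانی

# Broad IPA transcription assumed to be shared by both words: /paːniː/
hindi_ipa = "pa\u02d0ni\u02d0"
urdu_ipa = "pa\u02d0ni\u02d0"

print(f"orthographic overlap: {jaccard(hindi_orth, urdu_orth):.2f}")
print(f"phonemic overlap:     {jaccard(hindi_ipa, urdu_ipa):.2f}")
```

Running it prints an orthographic overlap of 0.00 and a phonemic overlap of 1.00. Real language pairs are rarely this clean, but the example shows why surface forms in different scripts can look unrelated to a model even when their pronunciations coincide.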

Original language: English
Title of host publication: MRL 2024 - 4th Workshop on Multilingual Representation Learning, Proceedings of the Workshop
Editors: Jonne Saleva, Abraham Owodunni
Publisher: Association for Computational Linguistics (ACL)
Pages: 200-211
Number of pages: 12
ISBN (Electronic): 9798891761841
Publication status: Published - 2024
Event: 4th Workshop on Multilingual Representation Learning, MRL 2024 - Miami, United States
Duration: 2024 Nov 16 → …

Publication series

Name: MRL 2024 - 4th Workshop on Multilingual Representation Learning, Proceedings of the Workshop

Conference

Conference: 4th Workshop on Multilingual Representation Learning, MRL 2024
Country/Territory: United States
City: Miami
Period: 24/11/16 → …

Bibliographical note

Publisher Copyright:
©2024 Association for Computational Linguistics.

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
