Abstract
Implicit hate speech detection is a challenging task in text classification since no explicit cues (e.g., swear words) exist in the text. While some pre-trained language models have been developed for hate speech detection, they are not specialized in implicit hate speech. Recently, an implicit hate speech dataset with a massive number of samples has been proposed, built through controlled machine generation. We propose a pre-training approach, CONPROMPT, to fully leverage such machine-generated data. Specifically, given a machine-generated statement, we use the example statements of its origin prompt as positive samples for contrastive learning. Through pre-training with CONPROMPT, we present TOXIGEN-CONPROMPT, a pre-trained language model for implicit hate speech detection. We conduct extensive experiments on several implicit hate speech datasets and show the superior generalization ability of TOXIGEN-CONPROMPT compared to other pre-trained models. Additionally, we empirically show that CONPROMPT is effective in mitigating identity term bias, demonstrating that it not only makes a model more generalizable but also reduces unintended bias. We analyze the representation quality of TOXIGEN-CONPROMPT and show its ability to consider target group and toxicity, which are desirable features for implicit hate speech.
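The contrastive idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an InfoNCE-style objective, cosine similarity over toy embedding vectors, and negatives drawn from unrelated statements, none of which is specified in the abstract. The names `cosine`, `contrastive_loss`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumption), averaged over the positives.

    anchor:    embedding of a machine-generated statement
    positives: embeddings of the example statements in its origin prompt
    negatives: embeddings of unrelated statements (assumed here)
    """
    pos_logits = [cosine(anchor, p) / temperature for p in positives]
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    losses = []
    for pos in pos_logits:
        # Each positive competes against all negatives.
        logits = [pos] + neg_logits
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - pos)
    return sum(losses) / len(losses)


# Toy 3-dimensional embeddings, made up purely for illustration.
generated = [1.0, 0.2, 0.0]
prompt_examples = [[0.9, 0.3, 0.1], [1.0, 0.1, 0.0]]
other_statements = [[-0.5, 1.0, 0.2], [0.0, -0.3, 1.0]]

aligned = contrastive_loss(generated, prompt_examples, other_statements)
misaligned = contrastive_loss(generated, other_statements, prompt_examples)
print(aligned < misaligned)
```

In this toy setup the loss is lower when the prompt examples are treated as positives, because they already lie close to the generated statement. Minimizing such a loss during pre-training would pull a statement toward the examples of its origin prompt.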
| Original language | English |
|---|---|
| Title of host publication | Findings of the Association for Computational Linguistics |
| Subtitle of host publication | EMNLP 2023 |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 10964-10980 |
| Number of pages | 17 |
| ISBN (Electronic) | 9798891760615 |
| Publication status | Published - 2023 |
| Event | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Hybrid, Singapore. Duration: 2023 Dec 6 → 2023 Dec 10 |
Publication series
| Name | Findings of the Association for Computational Linguistics: EMNLP 2023 |
|---|---|
Conference
| Conference | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 |
|---|---|
| Country/Territory | Singapore |
| City | Hybrid |
| Period | 2023 Dec 6 → 2023 Dec 10 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
All Science Journal Classification (ASJC) codes
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems
- Language and Linguistics
- Linguistics and Language
Fingerprint
Dive into the research topics of 'CONPROMPT: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection'. Together they form a unique fingerprint.