Abstract
Bootstrapping enables us to use existing knowledge to find patterns and extract new knowledge from free text, from which more patterns can be found. Due to its minimally supervised, domain-independent, and language-independent nature, it has been widely adopted in real-world applications. However, as iterations proceed, semantic drift may occur: the extraction may shift from the target class to other classes and produce errors, which propagate through succeeding iterations and significantly hurt performance. Existing solutions simply discard bad patterns, sacrificing recall to ensure high precision. We argue instead that most of these patterns and instances can be kept as long as they are applied selectively, guided by prior knowledge. In this paper, we propose a pattern-based extraction framework with three distinguishing features: (1) it uses conceptual taxonomies to guide the extraction and reduce semantic drift; (2) it uses the knowledge of existing triples to improve precision; (3) it integrates all patterns to form a generalized pattern set with quantified confidence measurement. The proposed solution is applied to enriching two real-world knowledge bases and achieves higher precision and recall than existing solutions.
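The bootstrapping loop and the drift problem described in the abstract can be illustrated with a toy sketch. This is not the paper's framework: the function `bootstrap`, the `types` lookup standing in for a conceptual taxonomy, and the example sentences are all hypothetical, and a "pattern" is simplified to the literal text between two entity mentions. The sketch only shows how a type check lets a noisy pattern be kept and applied selectively instead of being discarded.

```python
def bootstrap(sentences, seeds, types, subj_type, obj_type,
              use_guard=True, max_iters=5):
    """Toy bootstrapping over (subject, middle_text, object) sentences.

    `types` is a hypothetical stand-in for a conceptual taxonomy:
    it maps an entity name to a single concept label.
    """
    known = set(seeds)      # extracted (subject, object) pairs
    patterns = set()        # middle texts learned so far
    for _ in range(max_iters):
        # Step 1: learn patterns from sentences that mention a known pair.
        new_patterns = {mid for s, mid, o in sentences if (s, o) in known}
        # Step 2: apply the patterns to extract candidate pairs.
        candidates = {(s, o) for s, mid, o in sentences if mid in new_patterns}
        # Step 3 (assumed guard): keep a candidate only if both arguments
        # have the concept expected by the target relation.
        if use_guard:
            candidates = {
                (s, o) for s, o in candidates
                if types.get(s) == subj_type and types.get(o) == obj_type
            }
        # Stop when neither patterns nor pairs grow any further.
        if new_patterns <= patterns and candidates <= known:
            break
        patterns |= new_patterns
        known |= candidates
    return known, patterns


# Hypothetical corpus for a "city located in country" relation.
sentences = [
    ("Paris", "is the capital of", "France"),
    ("Berlin", "is the capital of", "Germany"),
    ("Berlin", "is located in", "Germany"),
    ("Munich", "is located in", "Germany"),
    ("Munich", "is located in", "Bavaria"),
    ("Eiffel Tower", "is located in", "Paris"),
]
types = {
    "Paris": "city", "Berlin": "city", "Munich": "city",
    "France": "country", "Germany": "country",
    "Bavaria": "state", "Eiffel Tower": "landmark",
}
seeds = {("Paris", "France")}

guarded, guarded_patterns = bootstrap(sentences, seeds, types, "city", "country")
unguarded, _ = bootstrap(sentences, seeds, types, "city", "country",
                         use_guard=False)
```

In this toy run the broad pattern "is located in" is learned in both settings. Without the guard it pulls in pairs such as ("Munich", "Bavaria") and ("Eiffel Tower", "Paris"), which is the drift the abstract describes; with the guard the same pattern is retained but only yields city-country pairs.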
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021 |
Publisher | IEEE Computer Society |
Pages | 49-60 |
Number of pages | 12 |
ISBN (Electronic) | 9781728191843 |
Publication status | Published - 2021 Apr |
Event | 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece; Duration: 2021 Apr 19 → 2021 Apr 22 |
Publication series
Name | Proceedings - International Conference on Data Engineering |
---|---|
Volume | 2021-April |
ISSN (Print) | 1084-4627 |
Conference
Conference | 37th IEEE International Conference on Data Engineering, ICDE 2021 |
---|---|
Country/Territory | Greece |
City | Virtual, Chania |
Period | 2021 Apr 19 → 2021 Apr 22 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Information Systems