Bootstrapping information extraction via conceptualization

Jiaqing Liang, Suo Feng, Chenhao Xie, Yanghua Xiao, Jindong Chen, Seung Won Hwang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)


Bootstrapping enables us to use existing knowledge to find patterns and extract new knowledge from free text, from which more patterns can be found. Because it is minimally supervised, domain-independent, and language-independent, it has been widely adopted in real-world applications. However, as iterations proceed, semantic drift may occur. The extraction may shift from the target class to other classes and produce errors, which propagate through the succeeding iterations and significantly hurt performance. Existing solutions simply discard bad patterns, sacrificing recall to ensure high precision. However, we argue that most of these patterns and instances can be kept as long as they are applied selectively, guided by prior knowledge. In this paper, we propose a pattern-based extraction framework with three distinctive features: (1) it uses conceptual taxonomies to guide the extraction and reduce semantic drift; (2) it uses the knowledge of existing triples to improve precision; (3) it integrates all patterns into a generalized pattern set with a quantified confidence measurement. The proposed solution is applied to enriching two real-world knowledge bases and achieves higher precision and recall than existing solutions.
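The following sketch illustrates the general idea of concept-guided bootstrapping described in the abstract. It is a simplified, hypothetical illustration and not the authors' framework. The toy corpus, the seed pairs, the `CONCEPT` entity-to-concept map (standing in for a conceptual taxonomy), the `TARGET` concept signature, and the functions `learn_patterns`, `apply_pattern`, `concept_ok` and `bootstrap` are all assumptions made for illustration. In line with the abstract's argument, no pattern is thrown away. Instead, each extracted instance is accepted only if its arguments match the target concepts. Each pattern also gets a confidence score, which is simply assumed here to be the share of its extractions that pass that check.

```python
import re

# Hypothetical toy data (illustration only): a tiny corpus, seed triples for a
# "capital of" relation, and an entity -> concept map standing in for a taxonomy.
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Paris, the capital of France, is busy.",
    "Tokyo, the capital of Japan, is large.",
    "Rome, the capital of Italy, is old.",
    "Madrid is the capital of Spain.",
    "Finance is the capital of ambition.",  # would cause semantic drift
]
SEEDS = {("Paris", "France"), ("Berlin", "Germany")}
CONCEPT = {
    "Paris": "city", "Berlin": "city", "Tokyo": "city", "Rome": "city",
    "Madrid": "city", "France": "country", "Germany": "country",
    "Japan": "country", "Italy": "country", "Spain": "country",
    "Finance": "field", "ambition": "emotion",
}
TARGET = ("city", "country")  # assumed concept signature of (subject, object)


def learn_patterns(corpus, pairs):
    """Use the text between a known subject and object as a pattern."""
    patterns = set()
    for sentence in corpus:
        for subj, obj in pairs:
            i, j = sentence.find(subj), sentence.find(obj)
            if i != -1 and j > i + len(subj):
                patterns.add(sentence[i + len(subj):j])
    return patterns


def apply_pattern(corpus, middle):
    """Extract single-word (subject, object) candidates matched by a pattern."""
    regex = re.compile(r"(\w+)" + re.escape(middle) + r"(\w+)")
    return [m.groups() for s in corpus for m in regex.finditer(s)]


def concept_ok(pair):
    """Accept a candidate only if both arguments carry the target concepts."""
    return (CONCEPT.get(pair[0]), CONCEPT.get(pair[1])) == TARGET


def bootstrap(corpus, seeds, iterations=3):
    known = set(seeds)
    confidence = {}
    for _ in range(iterations):
        new = set()
        for middle in learn_patterns(corpus, known):
            candidates = apply_pattern(corpus, middle)
            if not candidates:
                continue
            # Keep every pattern, but apply it selectively via the concept check.
            accepted = [p for p in candidates if concept_ok(p)]
            confidence[middle] = len(accepted) / len(candidates)
            new.update(accepted)
        if new <= known:  # no new facts: stop iterating
            break
        known |= new
    return known, confidence


if __name__ == "__main__":
    facts, conf = bootstrap(CORPUS, SEEDS)
    print(sorted(facts))
    print(conf)
```

In this toy run, the pattern `" is the capital of "` is kept even though it also matches the drift-inducing sentence. The concept check rejects `("Finance", "ambition")`, and the pattern's lower confidence (0.75 versus 1.0 for `", the capital of "`) records that it is less reliable.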

Original language: English
Title of host publication: Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
Publisher: IEEE Computer Society
Number of pages: 12
ISBN (Electronic): 9781728191843
Publication status: Published - 2021 Apr
Event: 37th IEEE International Conference on Data Engineering, ICDE 2021 - Virtual, Chania, Greece
Duration: 2021 Apr 19 to 2021 Apr 22

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627


Conference: 37th IEEE International Conference on Data Engineering, ICDE 2021
City: Virtual, Chania

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems

