Scalable entity matching computation with materialization

Sanghoon Lee, Jongwuk Lee, Seung Won Hwang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Citations (Scopus)

Abstract

Entity matching (EM) is the task of identifying records that refer to the same real-world entity from different data sources. While EM is widely used in data integration and data cleaning applications, the naive method for EM incurs quadratic cost with respect to the size of the datasets. To address this problem, this paper proposes a scalable EM algorithm that employs a pre-materialized structure. Specifically, once the structure is built, our proposed algorithm can identify the EM results with sub-linear cost. In addition, as the rules evolve, our algorithm can efficiently adapt to new rules by selectively accessing records using the materialized structure. Our evaluation results show that our proposed EM algorithm is significantly faster than the state-of-the-art method for extensive real-life datasets.

Original languageEnglish
Title of host publicationCIKM'11 - Proceedings of the 2011 ACM International Conference on Information and Knowledge Management
Pages2353-2356
Number of pages4
DOIs
Publication statusPublished - 2011
Event20th ACM Conference on Information and Knowledge Management, CIKM'11 - Glasgow, United Kingdom
Duration: 2011 Oct 242011 Oct 28

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Other

Other20th ACM Conference on Information and Knowledge Management, CIKM'11
Country/TerritoryUnited Kingdom
CityGlasgow
Period11/10/2411/10/28

All Science Journal Classification (ASJC) codes

  • Decision Sciences(all)
  • Business, Management and Accounting(all)

Fingerprint

Dive into the research topics of 'Scalable entity matching computation with materialization'. Together they form a unique fingerprint.

Cite this