Automatic extraction for creating a lexical repository of abbreviations in the biomedical literature

Min Song, Il Yeol Song, Ki Jung Lee

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

The volume of biomedical text is growing exponentially. This growth creates challenges for both human readers and automatic text processing algorithms. One such challenge arises from the common and uncontrolled use of abbreviations in the biomedical literature, which in turn requires that biomedical lexical ontologies be continuously updated. In this paper, we propose a hybrid approach that combines lexical analysis techniques with the Support Vector Machine (SVM) to create an automatically generated and maintained lexicon of abbreviations. The proposed technique differs from others in three respects: 1) it incorporates lexical analysis techniques into supervised learning for extracting abbreviations; 2) it uses text chunking techniques to identify the long forms of abbreviations; 3) it significantly improves Recall compared to other techniques. The experimental results show that our approach outperforms the leading abbreviation algorithms, ExtractAbbrev and ALICE, by at least 6% and 13.9%, respectively, in both Precision and Recall on the Gold Standard Development corpus.
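
To make the lexical-analysis step more concrete, the sketch below pairs a parenthesized short form with a preceding long form by matching word initials. It is only an illustration under stated assumptions, not the method of the paper: the function names `candidate_pairs` and `match_initials`, the look-back window of `len(short) + 2` words, and the initials-only matching rule are all hypothetical, and the SVM classification and text chunking stages described in the abstract are not shown.

```python
import re


def match_initials(short, window):
    """Return the shortest trailing span of `window` whose word initials
    spell `short` (case-insensitive), or None if no span matches."""
    letters = [c.lower() for c in short if c.isalnum()]
    # Grow the span backwards from the word nearest the parenthesis.
    for start in range(len(window) - 1, -1, -1):
        span = window[start:]
        initials = [w[0].lower() for w in span if w[0].isalnum()]
        if initials == letters:
            return " ".join(span)
    return None


def candidate_pairs(sentence):
    """Collect (short_form, long_form) candidates from 'long form (SF)' patterns."""
    pairs = []
    # Assumed short-form shape: a letter followed by 1-9 letters, digits or hyphens.
    for match in re.finditer(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)", sentence):
        short = match.group(1)
        words = sentence[:match.start()].split()
        # Assumed heuristic: look back over at most len(short) + 2 words.
        window = words[-(len(short) + 2):]
        long_form = match_initials(short, window)
        if long_form:
            pairs.append((short, long_form))
    return pairs


text = "We combine lexical analysis and the Support Vector Machine (SVM) here."
print(candidate_pairs(text))  # [('SVM', 'Support Vector Machine')]
```

In a hybrid design of the kind the abstract describes, candidates like these would plausibly serve as input to a supervised classifier rather than being accepted directly; that step is left out here.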

Original language: English
Title of host publication: Data Warehousing and Knowledge Discovery - 8th International Conference, DaWaK 2006, Proceedings
Publisher: Springer Verlag
Pages: 384-393
Number of pages: 10
ISBN (Print): 3540377360, 9783540377368
Publication status: Published - 2006
Event: 8th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2006 - Krakow, Poland
Duration: 2006 Sept 4 → 2006 Sept 8

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4081 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 8th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2006
Country/Territory: Poland
City: Krakow
Period: 06/9/4 → 06/9/8

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
