TY - GEN
T1 - Automatic extraction for creating a lexical repository of abbreviations in the biomedical literature
AU - Song, Min
AU - Song, Il Yeol
AU - Lee, Ki Jung
PY - 2006
Y1 - 2006
N2 - The sheer volume of biomedical text is growing at an exponential rate. This growth creates challenges for both human readers and automatic text processing algorithms. One such challenge arises from the common and uncontrolled use of abbreviations in the biomedical literature. This, in turn, requires that biomedical lexical ontologies be continuously updated. In this paper, we propose a hybrid approach combining lexical analysis techniques and the Support Vector Machine (SVM) to create an automatically generated and maintained lexicon of abbreviations. The proposed technique differs from others in the following aspects: 1) It incorporates lexical analysis techniques into supervised learning for extracting abbreviations. 2) It makes use of text chunking techniques to identify long forms of abbreviations. 3) It significantly improves Recall compared to other techniques. The experimental results show that our approach outperforms the leading abbreviation algorithms, ExtractAbbrev and ALICE, by at least 6% and 13.9%, respectively, in both Precision and Recall on the Gold Standard Development corpus.
UR - http://www.scopus.com/inward/record.url?scp=33751383258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33751383258&partnerID=8YFLogxK
U2 - 10.1007/11823728_37
DO - 10.1007/11823728_37
M3 - Conference contribution
AN - SCOPUS:33751383258
SN - 3540377360
SN - 9783540377368
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 384
EP - 393
BT - Data Warehousing and Knowledge Discovery - 8th International Conference, DaWaK 2006, Proceedings
PB - Springer Verlag
T2 - 8th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2006
Y2 - 4 September 2006 through 8 September 2006
ER -