TY - GEN
T1 - KXtractor
T2 - International Workshop on Bioinformatics Research and Applications, IWBRA 2005
AU - Song, Min
AU - Song, Il Yeol
AU - Hu, Xiaohua
AU - Allen, Robert B.
PY - 2005
Y1 - 2005
N2 - We present a novel information extraction (IE) technique, KXtractor, which combines a text chunking technique and Mixture Hidden Markov Models (MiHMM). KXtractor overcomes the problem of the single Part-Of-Speech (POS) HMMs with modeling the rich representation of text where features overlap among state units such as word, line, sentence, and paragraph. KXtractor also resolves issues with the traditional HMMs for IE that operate only on the semi-structured data such as HTML documents and other text sources in which language grammar does not play a pivotal role. We compared KXtractor with three IE techniques: 1) RAPIER, an inductive learning-based machine learning system, 2) a Dictionary-based extraction system, and 3) single POS HMM. Our experiments showed that KXtractor outperforms these three IE systems in extracting protein-protein interactions. In our experiments, the F-measure for KXtractor was higher than for RAPIER, a dictionary-based system, and single POS HMM respectively by 16.89%, 16.28%, and 8.58%. In addition, both precision and recall of KXtractor are higher than those systems.
AB - We present a novel information extraction (IE) technique, KXtractor, which combines a text chunking technique and Mixture Hidden Markov Models (MiHMM). KXtractor overcomes the problem of the single Part-Of-Speech (POS) HMMs with modeling the rich representation of text where features overlap among state units such as word, line, sentence, and paragraph. KXtractor also resolves issues with the traditional HMMs for IE that operate only on the semi-structured data such as HTML documents and other text sources in which language grammar does not play a pivotal role. We compared KXtractor with three IE techniques: 1) RAPIER, an inductive learning-based machine learning system, 2) a Dictionary-based extraction system, and 3) single POS HMM. Our experiments showed that KXtractor outperforms these three IE systems in extracting protein-protein interactions. In our experiments, the F-measure for KXtractor was higher than for RAPIER, a dictionary-based system, and single POS HMM respectively by 16.89%, 16.28%, and 8.58%. In addition, both precision and recall of KXtractor are higher than those systems.
UR - http://www.scopus.com/inward/record.url?scp=33646187984&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646187984&partnerID=8YFLogxK
U2 - 10.1007/11567752_5
DO - 10.1007/11567752_5
M3 - Conference contribution
AN - SCOPUS:33646187984
SN - 3540294015
SN - 9783540294016
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 68
EP - 81
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Y2 - 22 May 2005 through 24 May 2005
ER -