TY - GEN
T1 - Detecting duplicate biological entities using Markov random field-based edit distance
AU - Song, Min
AU - Rudniy, Alex
PY - 2008
Y1 - 2008
N2 - Detecting duplicate entities in biological data has become an in-demand research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, yielding SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to token-based distance algorithms in the two matching tasks.
AB - Detecting duplicate entities in biological data has become an in-demand research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, yielding SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to token-based distance algorithms in the two matching tasks.
UR - http://www.scopus.com/inward/record.url?scp=58049138671&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58049138671&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2008.34
DO - 10.1109/BIBM.2008.34
M3 - Conference contribution
AN - SCOPUS:58049138671
SN - 9780769534527
T3 - Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
SP - 457
EP - 460
BT - Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
T2 - 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
Y2 - 3 November 2008 through 5 November 2008
ER -