TY - GEN
T1 - Biomedical text categorization with concept graph representations using a controlled vocabulary
AU - Mishra, Meenakshi
AU - Huan, Jun
AU - Bleik, Said
AU - Song, Min
PY - 2012
Y1 - 2012
AB - Recent work using graph representations for text categorization has shown promising performance compared with the conventional bag-of-words representation of text documents. In this paper we investigate a graph representation of texts for the task of text categorization. In our representation we identify high-level concepts extracted from a database of controlled biomedical terms and build a rich graph structure that contains important concepts and relationships. This procedure ensures that graphs are described with a regular vocabulary, which makes them easier to compare. We then classify document graphs by applying a set-based graph kernel that is intuitively sensible and able to deal with the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches using non-graph, text-based features. We also compare different kernels to see which performs better.
UR - http://www.scopus.com/inward/record.url?scp=84866635017&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866635017&partnerID=8YFLogxK
U2 - 10.1145/2350176.2350181
DO - 10.1145/2350176.2350181
M3 - Conference contribution
AN - SCOPUS:84866635017
SN - 9781450315524
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 26
EP - 32
BT - Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12
T2 - 11th International Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD'12
Y2 - 12 August 2012
ER -