Biomedical text categorization with concept graph representations using a controlled vocabulary

Meenakshi Mishra, Jun Huan, Said Bleik, Min Song

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

Recent work using graph representations for text categorization has shown promising performance compared with the conventional bag-of-words representation of text documents. In this paper we investigate a graph representation of texts for the task of text categorization. In our representation we identify high-level concepts extracted from a database of controlled biomedical terms and build a rich graph structure that contains important concepts and relationships. This procedure ensures that graphs are described with a regular vocabulary, which makes them easier to compare. We then classify document graphs by applying a set-based graph kernel that is intuitively sensible and able to deal with the disconnectedness of the constructed concept graphs. We compare this approach to standard approaches using non-graph, text-based features. We also compare several candidate kernels to determine which performs best.
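The abstract does not spell out the kernel, so the following is only a minimal sketch of what a set-based graph kernel could look like, under stated assumptions: each document graph is reduced to a set of concept nodes and a set of undirected edges, and similarity is the count of shared nodes plus shared edges, cosine-normalized. The function names, the `edge_weight` parameter, and the example concepts are hypothetical and are not taken from the paper. Because the kernel only compares sets, it never walks paths and is unaffected by disconnected components.

```python
from math import sqrt


def make_graph(concepts, relations):
    """Build a concept graph as (node set, undirected edge set).

    Hypothetical helper: concepts are controlled-vocabulary terms,
    relations are pairs of related concepts.
    """
    nodes = frozenset(concepts)
    edges = frozenset(frozenset(pair) for pair in relations)
    return nodes, edges


def set_kernel(g1, g2, edge_weight=1.0):
    """Count shared nodes plus (weighted) shared edges.

    Assumed form, not the paper's definition. Each term is a dot product
    of indicator vectors, so the sum is a valid kernel for edge_weight >= 0.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    return len(nodes1 & nodes2) + edge_weight * len(edges1 & edges2)


def normalized_set_kernel(g1, g2, edge_weight=1.0):
    """Cosine-normalize the kernel so that k(g, g) == 1 for non-empty graphs."""
    denom = sqrt(set_kernel(g1, g1, edge_weight) * set_kernel(g2, g2, edge_weight))
    return set_kernel(g1, g2, edge_weight) / denom if denom else 0.0


# Two toy documents with made-up concepts; "smoking" is an isolated node,
# so doc_a is a disconnected graph.
doc_a = make_graph(["neoplasm", "lung", "smoking"], [("neoplasm", "lung")])
doc_b = make_graph(
    ["neoplasm", "lung", "therapy"],
    [("neoplasm", "lung"), ("neoplasm", "therapy")],
)
similarity = normalized_set_kernel(doc_a, doc_b)
```

A Gram matrix built from such a function could then be passed to any kernel classifier that accepts precomputed kernels. Whether that matches the classifier used in the paper is not stated in the abstract.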

Original language: English
Title of host publication: Proc. of the 11th Int. Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD Int. Conference on Knowledge Discovery and Data Mining, SIGKDD'12
Pages: 26-32
Number of pages: 7
Publication status: Published - 2012
Event: 11th International Workshop on Data Mining in Bioinformatics, BIOKDD 2012 - Held in Conjunction with the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, SIGKDD'12 - Beijing, China
Duration: 2012 Aug 12

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
