A Methodology Combining Cosine Similarity with Classifier for Text Classification

Kwangil Park, June Seok Hong, Wooju Kim

Research output: Contribution to journal › Article › peer-review

51 Citations (Scopus)

Abstract

Text classification has received significant attention in recent years because of the proliferation of digital documents, and it is widely used in applications such as filtering and recommendation. Consequently, many approaches, including those based on statistical theory, machine learning, and classifier performance improvement, have been proposed for improving text classification performance. Among these approaches, the centroid-based classifier, multinomial naïve Bayes (MNB), support vector machines (SVM), and convolutional neural networks (CNN) are commonly used. In this paper, we introduce a cosine similarity-based methodology for improving performance. The methodology combines cosine similarity (between a test document and fixed categories) with conventional classifiers such as MNB, SVM, and CNN to improve their accuracy; we call the conventional classifiers combined with cosine similarity enhanced classifiers. We applied the enhanced classifiers to well-known datasets (20NG, R8, R52, Cade12, and WebKB) and evaluated their performance in terms of accuracy computed from the confusion matrix; the enhanced classifiers showed significant increases in accuracy. Moreover, through experiments, we identified which of the two knowledge representation techniques considered, word count and term frequency-inverse document frequency (TFIDF), is more suitable in terms of classifier performance.
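The abstract does not state how the cosine similarity is combined with a classifier's output, so the following is only a minimal sketch under stated assumptions. It uses the word-count representation, builds one vector per category by summing training-document counts, and computes the cosine similarity between a test document and each category. The combination rule in `enhanced_scores` (multiplying class probabilities by similarities and renormalizing) is a hypothetical choice for illustration and is not taken from the paper; the function names, toy documents, and the uniform `base_probs` standing in for an MNB, SVM, or CNN output are likewise assumptions.

```python
import math
from collections import Counter


def word_counts(text):
    """Word-count representation: lowercase whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def category_vectors(train_docs):
    """Sum the word counts of all training documents in each category."""
    cats = {}
    for text, label in train_docs:
        cats.setdefault(label, Counter()).update(word_counts(text))
    return cats


def enhanced_scores(classifier_probs, test_text, cats):
    """Hypothetical combination rule (an assumption, not the paper's):
    weight each class probability by the cosine similarity between the
    test document and that category, then renormalize to sum to 1."""
    doc = word_counts(test_text)
    raw = {c: classifier_probs.get(c, 0.0) * cosine(doc, vec)
           for c, vec in cats.items()}
    total = sum(raw.values())
    if total == 0:
        # No overlap with any category: fall back to the classifier alone.
        return dict(classifier_probs)
    return {c: s / total for c, s in raw.items()}


# Toy training data (illustrative only).
train = [
    ("the team won the match", "sports"),
    ("the player scored a goal", "sports"),
    ("the market fell as stocks dropped", "finance"),
    ("investors sold stocks and bonds", "finance"),
]
cats = category_vectors(train)

# Stand-in for a conventional classifier that cannot decide between classes.
base_probs = {"sports": 0.5, "finance": 0.5}
scores = enhanced_scores(base_probs, "stocks fell in the market", cats)
predicted = max(scores, key=scores.get)
```

In this toy case the base classifier is undecided, and the similarity term tips the prediction toward the category whose vocabulary overlaps more with the test document. Swapping `word_counts` for a TFIDF weighting would be the natural way to compare the two representations the abstract mentions.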

Original language: English
Pages (from-to): 396-411
Number of pages: 16
Journal: Applied Artificial Intelligence
Volume: 34
Issue number: 5
Publication status: Published - 2020 Apr 15

Bibliographical note

Publisher Copyright:
© 2020 Taylor & Francis.

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
