Term discrimination for text search tasks derived from negative binomial distribution

Lorenz Bernauer, Eun Jin Han, So Young Sohn

Research output: Contribution to journal, Article, peer-review

5 Citations (Scopus)

Abstract

Accurate term discrimination is essential in information retrieval for identifying the important terms in specific documents. In addition to the widely known inverse document frequency (IDF) method, alternative approaches such as the residual inverse document frequency (RIDF) scheme have been introduced for term discrimination. However, the performance of existing methods is not convincing under all conditions. We propose a new collection frequency weighting scheme derived from the negative binomial distribution model of term occurrences. Factorial experiments were performed to examine potential interaction effects between collection frequency weighting methods and term frequency weighting methods, using mean average precision and normalized discounted cumulative gain as performance measures. The results indicate that the proposed term discrimination method offers a significant gain in accuracy compared with the IDF and RIDF schemes. This finding is reinforced by the absence of interaction effects among the factors.
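
For readers unfamiliar with the two baseline schemes named in the abstract, the following is a minimal sketch of IDF and RIDF as they are commonly defined in the literature. The formulas, the base-2 logarithm, and the function and parameter names (`idf`, `ridf`, `df`, `cf`, `n_docs`) are assumptions for illustration and are not taken from this article. The proposed negative binomial weight is not reproduced here because the abstract does not state its formula.

```python
import math


def idf(df: int, n_docs: int) -> float:
    """Inverse document frequency: -log2(df / N).

    df     -- number of documents containing the term (assumed > 0)
    n_docs -- total number of documents in the collection
    """
    return -math.log2(df / n_docs)


def ridf(df: int, cf: int, n_docs: int) -> float:
    """Residual IDF: observed IDF minus the IDF expected under a Poisson model.

    cf -- total number of occurrences of the term in the collection (assumed > 0)

    Under a Poisson model with rate cf / N, the probability that a document
    contains the term at least once is 1 - exp(-cf / N).
    """
    rate = cf / n_docs
    expected_idf = -math.log2(1.0 - math.exp(-rate))
    return idf(df, n_docs) - expected_idf


if __name__ == "__main__":
    # Hypothetical counts: both terms occur in 10 of 1000 documents,
    # but the second one occurs 100 times in total (it is "bursty").
    print(idf(10, 1000))
    print(ridf(10, 10, 1000))
    print(ridf(10, 100, 1000))
```

In this sketch, a term whose occurrences are spread roughly one per document gets an RIDF close to zero, while a term concentrated in few documents gets a larger value, which is the intuition behind using RIDF for term discrimination.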

Original language: English
Pages (from-to): 370-379
Number of pages: 10
Journal: Information Processing and Management
Volume: 54
Issue number: 3
Publication status: Published - 2018 May 1

Bibliographical note

Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (2016R1A2A1A05005270).

Publisher Copyright:
© 2018 Elsevier Ltd

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
