Investigation into the existence of the indexer effect in key phrase extraction

Jung Eun Hahm, Su Yeon Kim, Meen Chul Kim, Min Song

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


Introduction. The indexer effect has been examined in several information science studies that aim to reveal intellectual structures. In this study, we bring that concept into document classification to verify whether it also influences the results of key phrase extraction.

Method. We employ a well-known key phrase extraction technique, the key phrase extraction algorithm. We extract key phrases from three datasets: 1) papers in the same journal, 2) papers from different journals in the same field, and 3) papers from journals in different fields. All of these datasets provide both keywords and index terms, which we used as training data for the algorithm.

Analysis. For evaluation, we compare the performance of two groups of key phrases extracted with the algorithm: 1) those extracted using author-provided keywords as the training set, and 2) those extracted using indexer-assigned index terms as the training set. We analyse the two groups in terms of exact (100%) and fair (70%) matching, measured as the average number of key phrases extracted correctly per document.

Results. Automatic key phrase extraction based on index terms performs better than its counterpart based on author-provided keywords in most cases. However, the results also reveal that indexers tend to assign terms more inconsistently.

Conclusions. The findings provide some insights into the use of index terms as training data in key phrase extraction. It should also be noted, however, that automatically extracted key phrases might lead users to irrelevant documents in information retrieval.
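The evaluation measure described in the abstract (average number of correctly extracted key phrases per document, under exact and fair matching) can be illustrated with a minimal sketch. The abstract does not say how a 70% match is computed, so the sketch assumes a character-level similarity ratio from Python's difflib. The function names and sample phrases are hypothetical and are not taken from the study.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalised phrases.

    Assumption: the study's "70% matching" is approximated here by
    difflib's ratio; the actual matching rule is not given in the abstract.
    """
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def count_matches(extracted, reference, threshold):
    """Count extracted phrases that match at least one reference phrase."""
    return sum(
        1
        for phrase in extracted
        if any(similarity(phrase, ref) >= threshold for ref in reference)
    )


def average_matches(documents, threshold):
    """Average number of correctly extracted key phrases per document.

    `documents` is a list of (extracted_phrases, reference_phrases) pairs.
    """
    if not documents:
        return 0.0
    total = sum(count_matches(ext, ref, threshold) for ext, ref in documents)
    return total / len(documents)


# Hypothetical sample data: (extracted key phrases, reference terms).
sample_documents = [
    (
        ["information retrieval", "key phrase extraction", "indexing"],
        ["information retrieval", "keyphrase extraction", "index terms"],
    ),
    (
        ["citation analysis", "author keywords"],
        ["citation analysis", "author keyword"],
    ),
]

exact_score = average_matches(sample_documents, threshold=1.0)  # exact (100%)
fair_score = average_matches(sample_documents, threshold=0.7)   # fair (70%)
```

Running the same function once with reference terms drawn from author keywords and once with indexer-assigned terms would give the two figures that the analysis compares, under the matching assumption stated above.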

Original language: English
Journal: Information Research
Issue number: 4
Publication status: Published - 2013

All Science Journal Classification (ASJC) codes

  • Library and Information Sciences

