Abstract
Latent Dirichlet allocation (LDA) is a machine learning technique for identifying latent topics in text corpora within a Bayesian hierarchical framework. The most popular methods for fitting the LDA model are based on variational Bayesian inference, collapsed Gibbs sampling, or a combination of the two. Because these methods assume a unimodal distribution over topics, however, they can suffer from large bias when a corpus consists of several clusters of documents with different topic distributions. This paper proposes an LDA inference method that combines partial collapse with Dirichlet process mixtures. The method models heterogeneous text corpora flexibly while efficiently obtaining unbiased estimates. The method is illustrated with a simulation study and an application to a corpus of 1300 documents drawn from Neural Information Processing Systems (NIPS) conference papers from 2000–2002 and British Broadcasting Corporation (BBC) news articles from 2004–2005.
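For readers unfamiliar with the baseline the abstract refers to, the sketch below shows a conventional collapsed Gibbs sampler for LDA. It is not the paper's partially collapsed Dirichlet-process-mixture method. Here the topic proportions and topic-word distributions are integrated out, and each token's topic is resampled from the full conditional (n_dk + alpha)(n_kw + beta) / (n_k + V*beta). Every document shares a single symmetric Dirichlet prior `alpha`, which reflects the kind of homogeneity assumption the abstract argues can bias estimates for clustered corpora. The function name `collapsed_gibbs_lda` and its parameters (`alpha`, `beta`, `iters`, `seed`) are hypothetical choices made for illustration.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        iters=200, seed=0):
    """Hypothetical sketch of a standard collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (z, ndk, nkw): per-token topic assignments, document-topic
    counts, and topic-word counts after the final sweep.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []

    # Initialize every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token so the counts exclude it ("-i" counts).
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional with theta and phi integrated out:
                # p(z_i = j | rest) ∝ (n_dj + alpha) * (n_jw + beta) / (n_j + V*beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + V * beta) for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw
```

For example, `collapsed_gibbs_lda([[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5]], num_topics=2, vocab_size=6)` returns count tables from which topic proportions can be estimated as (n_dk + alpha) / (N_d + K*alpha). Because only one `alpha` is used for every document, this baseline has no mechanism for grouping documents into clusters with different topic distributions. That gap is what the abstract describes the proposed method as addressing.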
Original language | English |
---|---|
Pages (from-to) | 208-218 |
Number of pages | 11 |
Journal | Expert Systems with Applications |
Volume | 131 |
Publication status | Published - 2019 Oct 1 |
Bibliographical note
Funding Information: T. Park’s research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2017R1D1A1B03033536). Y.-S. Lee’s research was supported by the Korea Meteorological Administration Research and Development Program under Grant KMIPA 2015-1110.
Publisher Copyright:
© 2019 Elsevier Ltd
All Science Journal Classification (ASJC) codes
- Engineering(all)
- Computer Science Applications
- Artificial Intelligence