The impact of transforming unstructured data into structured data on a churn prediction model for loan customers

Hoon Jung, Bong Gyou Lee

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


In addition to structured data such as company size, loan balance, and savings accounts, this study analyzed voice of customer (VOC) data, which is text data containing contact history and counseling details. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN had the greatest predictive power, followed by the model using TF-IDF, and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN performed better in an analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the findings of previous studies, which indicate that individual customers leave when their loyalty and switching costs are low, also apply to corporate customers, and it suggests that VOC data indicating customers' needs are very effective for predicting their behavior.
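As a rough illustration of how TF-IDF turns free text such as VOC notes into numeric features, the sketch below computes the textbook weighting tf × log(N/df) with only the Python standard library. The abstract does not state which TF-IDF variant, tokenizer, or smoothing the study used, so the function name `tf_idf`, the weighting formula, and the sample `voc_notes` tokens are all assumptions made for this example, not the authors' method or data.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (not taken from the study):
    tf = term count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized VOC notes; not data from the study.
voc_notes = [
    ["rate", "complaint", "rate"],
    ["rate", "inquiry"],
    ["branch", "inquiry"],
]
features = tf_idf(voc_notes)
```

Each resulting dictionary could then be aligned to a fixed vocabulary and joined to structured columns (for example loan balance) as model inputs. A term that appears in few documents, such as "complaint" here, receives a higher weight than a term spread across many.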

Original language: English
Pages (from-to): 4706-4724
Number of pages: 19
Journal: KSII Transactions on Internet and Information Systems
Issue number: 12
Publication status: Published - 2020 Dec 31

Bibliographical note

Funding Information:
The research work has been funded by the Natural Science Foundation of China under Grant No. 61333018.

Publisher Copyright:
© 2020 KSII.

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Networks and Communications

