TY - JOUR
T1 - Predictive data mining for diagnosing periodontal disease
T2 - the Korea National Health and Nutrition Examination Surveys (KNHANES V and VI) from 2010 to 2015
AU - Lee, Jae Hong
AU - Jeong, Seong Nyum
AU - Choi, Seongho
PY - 2018/1/1
Y1 - 2018/1/1
AB - Objectives: This study aimed to identify patients at the highest risk of periodontal disease (PD) and to provide recommendations for the effective use of data mining (DM) techniques when establishing evidence-based dental-care policies for vulnerable groups at high risk of PD. Methods: This study used the SEMMA (Sample, Explore, Modify, Model, and Assess) methodology to construct DM models from data acquired in the fifth and sixth Korea National Health and Nutrition Examination Surveys (2010-2015). We analyzed the sociodemographic and comorbidity variables that influence PD by applying three widely used DM techniques (decision-tree, neural-network, and regression models), and attempted to improve predictive power and reliability by comparing the results of these three models. Results: Comparing the three DM algorithms on average squared error, misclassification rate, receiver operating characteristic index, Gini coefficient, and Kolmogorov–Smirnov test results showed that the decision-tree model performed best. The decision-tree model revealed that age and smoking status exert major effects on the risk of PD, that stress and education level exert effects in rural areas, and that education level, sex, hyperlipidemia, and alcohol intake exert effects in urban areas. Conclusions: We demonstrated that the decision-tree model is an effective DM technique for identifying the complex risk factors for PD. These results are expected to help improve the equality and efficacy of dental-care policies for vulnerable groups at high risk of PD.
UR - http://www.scopus.com/inward/record.url?scp=85057140056&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057140056&partnerID=8YFLogxK
U2 - 10.1111/jphd.12293
DO - 10.1111/jphd.12293
M3 - Article
C2 - 30468241
SN - 0022-4006
JO - Journal of Public Health Dentistry
JF - Journal of Public Health Dentistry
ER -