TY - JOUR
T1 - Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods
AU - Choi, Seong Gyu
AU - Oh, Minsuk
AU - Park, Dong-Hyuk
AU - Lee, Byeongchan
AU - Lee, Yong-ho
AU - Jee, Sun Ha
AU - Jeon, Justin Y.
N1 - Publisher Copyright: © 2023, Springer Nature Limited.
PY - 2023/12
Y1 - 2023/12
AB - We compared the performance of machine learning-based models for predicting undiagnosed diabetes with that of traditional statistics-based models. We used the 2014–2020 Korean National Health and Nutrition Examination Survey (KNHANES) (N = 32,827). The KNHANES 2014–2018 data were used as training and internal validation sets and the 2019–2020 data as external validation sets. The area under the receiver operating characteristic curve (AUC) was used to compare the prediction performance of the machine learning-based and traditional statistics-based models. Using sex, age, resting heart rate, and waist circumference as features, the machine learning-based model showed a higher AUC than the traditional statistics-based model (0.788 vs. 0.740). Using sex, age, waist circumference, family history of diabetes, hypertension, alcohol consumption, and smoking status as features, the machine learning-based model again showed a higher AUC than the traditional statistics-based model (0.802 vs. 0.759). The machine learning-based model using the features chosen for maximum prediction performance also showed a higher AUC than the traditional statistics-based model (0.819 vs. 0.765). Machine learning-based prediction models using anthropometric and lifestyle measurements may outperform traditional statistics-based prediction models in predicting undiagnosed diabetes.
UR - http://www.scopus.com/inward/record.url?scp=85168221517&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85168221517&partnerID=8YFLogxK
U2 - 10.1038/s41598-023-40170-0
DO - 10.1038/s41598-023-40170-0
M3 - Article
C2 - 37567907
AN - SCOPUS:85168221517
SN - 2045-2322
VL - 13
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 13101
ER -