Abstract
This paper investigates how accurately introversion vs. extroversion can be predicted with fewer than ten predictors. The study draws on previously collected survey data from 7161 respondents, covering 91 personality items and 3 demographic items.
The results show that this measurement instrument can be reduced from 94 to 10 features with minimal performance loss (1%), achieving an accuracy of 73.81% on unseen data.
Class imbalance correction methods such as SMOTE and ADASYN yielded a considerable performance improvement on the cross-validation holdout set but not on unseen data. Although gradient boosting machines and random forests performed similarly on these balanced datasets, they relied on distinctly different feature importances.
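To make the class imbalance correction mentioned above concrete, the following is a minimal sketch of the core SMOTE idea: synthetic minority rows are created by interpolating between a minority row and one of its nearest minority neighbours. It is not the authors' pipeline. The function names, the toy data, and the choice to resample only the training split (so that evaluation data stay untouched) are assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea).
    Hypothetical helper; requires at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to `base`.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_training_split(X_train, y_train, minority_label, k=3, seed=0):
    """Oversample the minority class of the training split until both
    classes have the same size. Held-out data are never passed in here."""
    minority = [x for x, y in zip(X_train, y_train) if y == minority_label]
    n_majority = len(y_train) - len(minority)
    n_new = max(0, n_majority - len(minority))
    extra = smote_like_oversample(minority, n_new, k=k, seed=seed)
    return X_train + extra, y_train + [minority_label] * len(extra)


# Toy training split (assumed data): 3 minority rows (label 1), 6 majority rows (label 0).
X_train = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6], [6, 6], [7, 5], [5, 7]]
y_train = [1, 1, 1, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_training_split(X_train, y_train, minority_label=1)
```

In a cross-validation setting, a helper like this would be called inside each fold on the training portion only. Scores computed on resampled data can otherwise differ from scores on data with the original class distribution.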
Original language | English |
---|---|
Publication status | Accepted/In press - 2020 |
Event | International Conference on Artificial Intelligence in Information and Communication - Takakura Hotel, Fukuoka, Japan; Duration: 2020 Feb 19 → 2020 Feb 21; Conference number: 2; http://icaiic.org/ |
Conference
Conference | International Conference on Artificial Intelligence in Information and Communication |
---|---|
Abbreviated title | ICAIIC |
Country/Territory | Japan |
City | Fukuoka |
Period | 2020 Feb 19 → 2020 Feb 21 |
Internet address | http://icaiic.org/ |