Impacts of Fractional Hot-Deck Imputation on Learning and Prediction of Engineering Data

Ikkyun Song, Yicheng Yang, Jongho Im, Tong Tong, Halil Ceylan, In Ho Cho

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Missing data is a common issue across engineering fields, and it often causes undesired bias and sparseness that impede rigorous data analysis. Many imputation theories have been proposed and widely used to tackle this problem. However, prior methods often require distributional assumptions and prior knowledge about the data, which can be difficult to meet in engineering research. In contrast, fractional hot-deck imputation (FHDI) is an assumption-free imputation method with broad applicability in engineering domains. FHDI's internal parameters and its impact on statistical and machine learning methods, however, are not well understood. This study therefore investigates the behavior and impact of FHDI on prediction methods, including the generalized additive model, support vector machine, extremely randomized trees, and artificial neural network, using four practical datasets (appliance energy, air quality, phenotypes, and weather). Results show that FHDI improves prediction accuracy more than a simple naive method that fills in missing data with the mean value of each attribute, and that FHDI has an asymptotically positive effect on prediction accuracy as response rates decrease. As for optimal settings, 30 to 35 is recommended for FHDI's internal categorization number and 5 for the number of FHDI donors, which aligns with Rubin's recommendation.
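As a rough illustration of the two ideas contrasted in the abstract, the Python sketch below places the naive mean-value baseline next to a toy hot-deck fill. This is not the authors' FHDI algorithm, which the abstract does not spell out. The function names `mean_impute` and `toy_hot_deck`, the quantile-based categorization, and the equal 1/m donor weights are all assumptions made only for illustration. The default `m=5` mirrors the donor count recommended above, while `k=3` is an arbitrary toy value.

```python
import numpy as np


def mean_impute(x):
    """Naive baseline: replace each NaN with the mean of its column's observed values."""
    x = np.asarray(x, dtype=float).copy()
    col_means = np.nanmean(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_means[cols]
    return x


def toy_hot_deck(x_obs, y, k=3, m=5, seed=0):
    """Toy hot-deck fill (a simplified stand-in, NOT the published FHDI).

    Assumes x_obs is fully observed and y has at least one observed value.
    x_obs is binned into k quantile-based categories; each missing y is
    filled with the equal-weight (1/m) average of up to m donors drawn
    at random from the same category.
    """
    rng = np.random.default_rng(seed)
    x_obs = np.asarray(x_obs, dtype=float)
    y = np.asarray(y, dtype=float).copy()

    # Interior quantile cut points that split x_obs into k categories.
    edges = np.quantile(x_obs, np.linspace(0, 1, k + 1)[1:-1])
    cells = np.digitize(x_obs, edges)

    missing = np.isnan(y)  # fixed up front, so donors are original observations only
    for i in np.where(missing)[0]:
        donors = np.where((cells == cells[i]) & ~missing)[0]
        if donors.size == 0:
            # Hypothetical fallback: borrow from all observed records.
            donors = np.where(~missing)[0]
        chosen = rng.choice(donors, size=min(m, donors.size), replace=False)
        y[i] = y[chosen].mean()  # equal fractional weights across chosen donors
    return y


if __name__ == "__main__":
    table = [[1.0, np.nan], [3.0, 4.0]]
    print(mean_impute(table))

    x_obs = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    y = [1.0, 1.0, np.nan, 5.0, 5.0, np.nan]
    print(toy_hot_deck(x_obs, y, k=2, m=5))
```

The mean baseline ignores any relationship between attributes, whereas the hot-deck variant borrows values only from records that look similar in the observed attribute. That is the intuition behind donor-based imputation, though the published method involves more than this sketch shows.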

Original language: English
Article number: 8735753
Pages (from-to): 2363-2373
Number of pages: 11
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 32
Issue number: 12
DOIs
Publication status: Published - 2020 Dec 1

Bibliographical note

Publisher Copyright:
© 1989-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
