Optimal feature set size in random forest regression

Sunwoo Han, Hyunjoong Kim

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)

Abstract

One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size, that is, the number of candidate features searched for the best partitioning rule at each node of the trees. Most existing research on feature set size has focused on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies using many datasets, we first investigated whether RF regression predictions are affected by the feature set size. Then, we found a rule associated with the optimal size based on the characteristics of each dataset. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as using the default size specified in the randomForest R package and using the common grid search method.
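The two baselines named in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' proposed search algorithm, which the abstract does not describe. The function names, the `error_fn` callback, and the toy error curve are all hypothetical; the one sourced fact is that the randomForest R package defaults to max(floor(p/3), 1) features per split for regression, where p is the number of predictors.

```python
from math import floor
from typing import Callable, Dict, Iterable, Optional, Tuple


def default_mtry_regression(p: int) -> int:
    """Default feature set size for regression in the randomForest R package."""
    if p < 1:
        raise ValueError("p must be a positive number of predictors")
    return max(floor(p / 3), 1)


def grid_search_mtry(
    p: int,
    error_fn: Callable[[int], float],
    candidates: Optional[Iterable[int]] = None,
) -> Tuple[int, Dict[int, float]]:
    """Exhaustive grid search: evaluate every candidate size, keep the lowest error.

    error_fn is a hypothetical callback that returns an error estimate
    (for example out-of-bag or cross-validated MSE) for a given size.
    """
    if candidates is None:
        candidates = range(1, p + 1)  # try every size from 1 to p
    errors = {m: error_fn(m) for m in candidates}
    best = min(errors, key=errors.get)  # ties resolve to the smallest size tried first
    return best, errors


# Toy stand-in for a real error curve (assumption: minimum at size 7).
def toy_error(m: int) -> float:
    return (m - 7) ** 2 + 1.0


p = 12
default_size = default_mtry_regression(p)
best_size, error_by_size = grid_search_mtry(p, toy_error)
print(default_size, best_size)
```

In practice `error_fn` would wrap a fitted forest, for instance the out-of-bag error of a random forest regressor trained with the given feature set size. The grid search costs one forest fit per candidate size.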

Original language: English
Article number: 3428
Journal: Applied Sciences (Switzerland)
Volume: 11
Issue number: 8
Publication status: Published - 2021 Apr 2

Bibliographical note

Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.

All Science Journal Classification (ASJC) codes

  • General Materials Science
  • Instrumentation
  • General Engineering
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes
