Random forests and adaptive nearest neighbors

Yi Lin, Yongho Jeon

Research output: Contribution to journal, Article, peer-review

302 Citations (Scopus)

Abstract

In this article, we study random forests through their connection with a new framework of adaptive nearest-neighbor methods. We introduce a concept of potential nearest neighbors (k-PNNs) and show that random forests can be viewed as adaptively weighted k-PNN methods. Various aspects of random forests can be studied from this perspective. We study the effect of terminal node sizes on the prediction accuracy of random forests. We further show that random forests with adaptive splitting schemes assign weights to k-PNNs in a desirable way: for the estimation at a given target point, these random forests assign voting weights to the k-PNNs of the target point according to the local importance of different input variables. We propose a new simple splitting scheme that achieves desirable adaptivity in a straightforward fashion. This simple scheme can be combined with existing algorithms. The resulting algorithm is computationally faster and gives comparable results. Other possible aspects of random forests, such as using linear combinations in splitting, are also discussed. Simulations and real datasets are used to illustrate the results.
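
The weighted-neighbor view described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes every tree is grown on the full training set (no bootstrap resampling) and uses purely random, non-adaptive splits, and all names (build_tree, leaf_of, forest_weights, forest_predict) are hypothetical. Under those assumptions, the forest prediction at a target point equals a weighted average of the training responses, where each tree gives weight 1/(leaf size) to the training points that share the target's terminal node.

```python
import random

def build_tree(X, idx, k, rng):
    """Grow a tree with random splits until a node holds at most k points.

    Assumption: splits are purely random (random coordinate, random observed
    value as threshold), not the adaptive scheme proposed in the article.
    """
    if len(idx) <= k:
        return {"leaf": idx}
    dim = rng.randrange(len(X[0]))
    vals = sorted({X[i][dim] for i in idx})
    if len(vals) < 2:  # cannot split on this coordinate; stop here
        return {"leaf": idx}
    # Any value except the largest guarantees both children are non-empty.
    thr = vals[rng.randrange(len(vals) - 1)]
    left = [i for i in idx if X[i][dim] <= thr]
    right = [i for i in idx if X[i][dim] > thr]
    return {"dim": dim, "thr": thr,
            "left": build_tree(X, left, k, rng),
            "right": build_tree(X, right, k, rng)}

def leaf_of(tree, x):
    """Return the training indices in the terminal node that contains x."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["dim"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]

def forest_weights(trees, x, n):
    """Voting weight of each training point for a prediction at x."""
    w = [0.0] * n
    for tree in trees:
        leaf = leaf_of(tree, x)
        for i in leaf:
            w[i] += 1.0 / (len(leaf) * len(trees))
    return w

def forest_predict(trees, x, y):
    """Usual forest prediction: average of the per-tree leaf means."""
    means = []
    for tree in trees:
        leaf = leaf_of(tree, x)
        means.append(sum(y[i] for i in leaf) / len(leaf))
    return sum(means) / len(means)

# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
n = 200
X = [[rng.random(), rng.random()] for _ in range(n)]
y = [x[0] + rng.gauss(0.0, 0.1) for x in X]

trees = [build_tree(X, list(range(n)), 3, rng) for _ in range(50)]
target = [0.5, 0.5]

weights = forest_weights(trees, target, n)
prediction = forest_predict(trees, target, y)
weighted = sum(w * yi for w, yi in zip(weights, y))
print(prediction, weighted, sum(weights))
```

The two printed predictions agree up to floating-point error and the weights sum to one. The third argument of build_tree is the maximum terminal node size, the quantity whose effect on accuracy the abstract mentions. Training points with nonzero weight act as the voting neighbors of the target in this toy setting.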

Original language: English
Pages (from-to): 578-590
Number of pages: 13
Journal: Journal of the American Statistical Association
Volume: 101
Issue number: 474
Publication status: Published - Jun 2006

Bibliographical note

Funding Information:
Yi Lin is Associate Professor (E-mail: yilin@stat.wisc.edu) and Yongho Jeon is a Graduate Student (E-mail: yjeon@stat.wisc.edu), Department of Statistics, University of Wisconsin, Madison, WI 53706. This work was supported in part by National Science Foundation grant DMS-01-34987. The authors thank Leo Breiman for helpful comments and suggestions.

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
