Abstract
We propose new discrimination methods for classification of high dimension, low sample size (HDLSS) data that regularize the degree of data piling. The within-class scatter of HDLSS data, when projected onto a low-dimensional discriminant subspace, can be made arbitrarily small. Using this fact, we develop two different ways of tuning the amount of within-class scatter, or equivalently, the degree of data piling. In the first approach, we consider a linear path connecting the maximal data piling and the least data piling directions. In the second approach, we formulate the problem of finding the optimal classifier under a constraint on data piling. The data piling regularization methods are extended to multicategory problems. Simulated and real data examples show competitive performance of the proposed classification methods. Supplementary materials for this article are available online on the journal web site.
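To make the notion of data piling concrete, the following is a minimal sketch and not the regularized methods proposed in the article. It assumes the commonly used form of the maximal data piling direction for two classes, namely the pseudoinverse of the total scatter matrix applied to the difference of class means. The function name `mdp_direction`, the tolerance `tol`, and the simulated Gaussian data are illustrative assumptions. When the dimension exceeds the sample size, projections onto this direction should collapse to a single value per class, whereas projections onto the plain mean-difference direction retain within-class scatter.

```python
import numpy as np


def mdp_direction(X, y, tol=1e-10):
    """Unit vector along pinv(total scatter) @ (mean_1 - mean_0).

    X: (n, d) data matrix with d > n; y: array of 0/1 class labels.
    This formula is an assumed, standard form of the maximal data
    piling direction, not code taken from the article.
    """
    Xc = X - X.mean(axis=0)  # center by the overall mean
    # Contrast vector c satisfies Xc.T @ c == mean_1 - mean_0.
    c = np.where(y == 1, 1.0 / np.sum(y == 1), -1.0 / np.sum(y == 0))
    # pinv(Xc.T @ Xc) @ Xc.T @ c equals pinv(Xc) @ c; use a thin SVD.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > tol * s[0]  # drop numerically zero singular values
    w = Vt[keep].T @ ((U[:, keep].T @ c) / s[keep])
    return w / np.linalg.norm(w)


def max_class_spread(proj, y):
    """Largest within-class range of the projected values."""
    return max(np.ptp(proj[y == 0]), np.ptp(proj[y == 1]))


# Simulated HDLSS data (illustrative only): 20 samples, 200 dimensions.
rng = np.random.default_rng(0)
n_per_class, d = 10, 200
X = rng.standard_normal((2 * n_per_class, d))
X[n_per_class:, 0] += 2.0  # shift class 1 along the first coordinate
y = np.repeat([0, 1], n_per_class)

w_mdp = mdp_direction(X, y)
diff = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
w_md = diff / np.linalg.norm(diff)  # mean-difference direction for contrast

proj_mdp = X @ w_mdp
spread_mdp = max_class_spread(proj_mdp, y)
spread_md = max_class_spread(X @ w_md, y)
print(f"within-class spread, piling direction: {spread_mdp:.2e}")
print(f"within-class spread, mean difference:  {spread_md:.2e}")
```

The near-zero spread along the piling direction reflects the training sample only; regularizing the degree of piling, as the abstract describes, trades some of this collapse for within-class scatter.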
Original language | English |
---|---|
Pages (from-to) | 433-451 |
Number of pages | 19 |
Journal | Journal of Computational and Graphical Statistics |
Volume | 22 |
Issue number | 2 |
Publication status | Published - 2013 |
Bibliographical note
Funding Information: Ahn’s research was partly supported by NSF grant DMS-0805758 and NIH grant 1R21CA152460-01A1. The authors are grateful to an associate editor and the reviewers for helpful comments.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Discrete Mathematics and Combinatorics
- Statistics, Probability and Uncertainty