Abstract
Two-sample testing is a fundamental problem in statistics. Despite its long history, there has been renewed interest in this problem with the advent of high-dimensional and complex data. Specifically, in the machine learning literature, there have been recent methodological developments such as classification accuracy tests. The goal of this work is to present a regression approach to comparing multivariate distributions of complex data. Depending on the chosen regression model, our framework can efficiently handle different types of variables and various structures in the data, with competitive power under many practical scenarios. Whereas previous work has been largely limited to global tests which conceal much of the local information, our approach naturally leads to a local two-sample testing framework in which we identify local differences between multivariate distributions with statistical confidence. We demonstrate the efficacy of our approach both theoretically and empirically, under some well-known parametric and nonparametric regression methods. Our proposed methods are applied to simulated data as well as a challenging astronomy data set to assess their practical usefulness.
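To make the regression idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common construction: pool the two samples, label each point by the sample it came from, estimate m(x) = P(label = 1 | X = x) with a regression method, and measure how far the fitted values are from the overall label proportion. The k-nearest-neighbor regressor, the squared-deviation statistic, the permutation calibration, and every function and parameter name below are illustrative assumptions.

```python
import random


def nearest_neighbors(points, k):
    """For each pooled point, return the indices of its k nearest points
    (squared Euclidean distance; the point itself is included)."""
    neighbors = []
    for x in points:
        order = sorted(
            range(len(points)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, points[j])),
        )
        neighbors.append(order[:k])
    return neighbors


def local_deviations(labels, neighbors):
    """k-NN regression estimate of P(label = 1 | x) at each pooled point,
    minus the overall proportion of label 1 (assumed local discrepancy)."""
    pi_hat = sum(labels) / len(labels)
    return [
        sum(labels[j] for j in nbrs) / len(nbrs) - pi_hat
        for nbrs in neighbors
    ]


def global_statistic(labels, neighbors):
    """Average squared local deviation over the pooled sample."""
    devs = local_deviations(labels, neighbors)
    return sum(d * d for d in devs) / len(devs)


def regression_two_sample_test(sample_a, sample_b, k=10, n_perm=200, seed=0):
    """Permutation test of H0: both samples share one distribution.
    Returns (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    # Neighbors depend only on the points, so compute them once.
    neighbors = nearest_neighbors(points, k)
    observed = global_statistic(labels, neighbors)
    exceed = 0
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)  # break any link between label and location
        if global_statistic(permuted, neighbors) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

In this sketch, `global_statistic` plays the role of a global test, while the per-point values from `local_deviations` indicate where the two samples differ. Attaching statistical confidence to those local values would need a separate calibration step that is not shown here.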
Original language | English |
---|---|
Pages (from-to) | 5253-5305 |
Number of pages | 53 |
Journal | Electronic Journal of Statistics |
Volume | 13 |
Issue number | 2 |
Publication status | Published - 2019 |
Bibliographical note
Funding Information: ABL would like to thank Rafael Izbicki and Larry Wasserman for discussions that led to the two-sample testing work, and Peter Freeman and Jeffrey Newman for acting as IK’s co-advisors for the data analysis project on which Section 6 is based. The authors also thank the editor and the reviewers for their constructive comments and suggestions. This work was partially supported by NSF DMS-1520786.
Publisher Copyright:
© 2019, Institute of Mathematical Statistics. All rights reserved.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty