TY - JOUR

T1 - The instability of the Pearson correlation coefficient in the presence of coincidental outliers

AU - Kim, Yunmi

AU - Kim, Tae Hwan

AU - Ergün, Tolga

N1 - Publisher Copyright:
© 2015 Elsevier Inc.

PY - 2015/5/1

Y1 - 2015/5/1

AB - It is well known that any statistic based on sample averages can be sensitive to outliers. Some examples are the conventional moments-based statistics such as the sample mean, the sample variance, or the sample covariance of a set of observations on two variables. Given that sample correlation is defined as sample covariance divided by the product of sample standard deviations, one might suspect that the impact of outliers on the correlation coefficient may be neither present nor noticeable because of a 'dampening effect', i.e., the effects of outliers on the numerator and the denominator of the correlation coefficient can cancel each other out. In this paper, we formally investigate this issue. Contrary to such an expectation, we show analytically and by simulations that the distortion caused by outliers in the behavior of the correlation coefficient can be fairly large in some cases, especially when outliers are present in both variables at the same time. These outliers are called 'coincidental outliers.' We consider some robust alternative measures and compare their performance in the presence of such coincidental outliers.

UR - http://www.scopus.com/inward/record.url?scp=84928767892&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84928767892&partnerID=8YFLogxK

U2 - 10.1016/j.frl.2014.12.005

DO - 10.1016/j.frl.2014.12.005

M3 - Article

AN - SCOPUS:84928767892

SN - 1544-6123

VL - 13

SP - 243

EP - 257

JO - Finance Research Letters

JF - Finance Research Letters

ER -