TY - JOUR
T1 - The instability of the Pearson correlation coefficient in the presence of coincidental outliers
AU - Kim, Yunmi
AU - Kim, Tae Hwan
AU - Ergün, Tolga
N1 - Publisher Copyright: © 2015 Elsevier Inc.
PY - 2015/5/1
Y1 - 2015/5/1
N2 - It is well known that any statistic based on sample averages can be sensitive to outliers. Some examples are the conventional moments-based statistics such as the sample mean, the sample variance, or the sample covariance of a set of observations on two variables. Given that sample correlation is defined as sample covariance divided by the product of sample standard deviations, one might suspect that the impact of outliers on the correlation coefficient may be neither present nor noticeable because of a 'dampening effect,' i.e., the effects of outliers on both the numerator and the denominator of the correlation coefficient can cancel each other. In this paper, we formally investigate this issue. Contrary to such an expectation, we show analytically and by simulations that the distortion caused by outliers in the behavior of the correlation coefficient can be fairly large in some cases, especially when outliers are present in both variables at the same time. These outliers are called 'coincidental outliers.' We consider some robust alternative measures and compare their performance in the presence of such coincidental outliers.
AB - It is well known that any statistic based on sample averages can be sensitive to outliers. Some examples are the conventional moments-based statistics such as the sample mean, the sample variance, or the sample covariance of a set of observations on two variables. Given that sample correlation is defined as sample covariance divided by the product of sample standard deviations, one might suspect that the impact of outliers on the correlation coefficient may be neither present nor noticeable because of a 'dampening effect,' i.e., the effects of outliers on both the numerator and the denominator of the correlation coefficient can cancel each other. In this paper, we formally investigate this issue. Contrary to such an expectation, we show analytically and by simulations that the distortion caused by outliers in the behavior of the correlation coefficient can be fairly large in some cases, especially when outliers are present in both variables at the same time. These outliers are called 'coincidental outliers.' We consider some robust alternative measures and compare their performance in the presence of such coincidental outliers.
UR - http://www.scopus.com/inward/record.url?scp=84928767892&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84928767892&partnerID=8YFLogxK
U2 - 10.1016/j.frl.2014.12.005
DO - 10.1016/j.frl.2014.12.005
M3 - Article
AN - SCOPUS:84928767892
SN - 1544-6123
VL - 13
SP - 243
EP - 257
JO - Finance Research Letters
JF - Finance Research Letters
ER -