NO.PZ2021061603000047
Question:
An economist collected the monthly returns
for KDL's portfolio and a diversified stock index. The data collected are shown
in the following table:
The economist calculated the correlation
between the two returns and found it to be 0.996. The regression results with
the KDL return as the dependent variable and the index return as the
independent variable are given as follows:
When reviewing the results, Andrea Fusilier
suspected that they were unreliable. She found that the returns for Month 2
should have been 7.21% and 6.49%, instead of the large values shown in the
first table. Correcting these values resulted in a revised correlation of 0.824
and the following revised regression results:
Explain how the bad data affected the
results.
Options:
Explanation:
The Month 2 data point is an outlier, lying
far away from the other data values.
Because this outlier was caused by a data
entry error, correcting the outlier improves the validity and reliability of
the regression. In this case, the revised R² is lower (it falls from 0.9921 to
0.6784). The outlier created the illusion of a better fit through the higher
R², and it also altered the estimate of the slope. The standard error of the
estimate is lower once the data error is corrected (from 2.861 to 2.0624)
because the mean square error is lower. However, both models fit well at the
0.05 level of significance. The difference in the fit is illustrated in Exhibit 1:
"The outliers created the illusion of a better fit from the higher R2; the outliers altered the estimate of the slope."
This conclusion does not hold in general, does it? It still has to be assessed case by case, right?
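One way to explore this is a small sketch on made-up numbers. These are not the KDL data, which are not reproduced in this post. The helper `fit_line` and all three samples are hypothetical: a clean sample, the same sample plus an extreme point that happens to lie along the trend, and the same sample plus an extreme point far off the trend. In this toy setup the first kind of outlier raises R² and the second lowers it, which suggests that the direction of the effect depends on where the bad point falls.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Returns (slope, intercept, r_squared). For a single regressor,
    R-squared equals the squared sample correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical "clean" returns (index, portfolio), in percent.
x_clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_clean = [1.5, 1.0, 3.5, 3.0, 5.5, 5.0]

# Case A: one extreme point that lies along the underlying trend.
x_on = x_clean + [30.0]
y_on = y_clean + [30.0]

# Case B: one extreme point that lies far away from the trend.
x_off = x_clean + [30.0]
y_off = y_clean + [1.0]

slope_clean, _, r2_clean = fit_line(x_clean, y_clean)
slope_on, _, r2_on = fit_line(x_on, y_on)
slope_off, _, r2_off = fit_line(x_off, y_off)

for label, slope, r2 in [
    ("clean", slope_clean, r2_clean),
    ("outlier on trend", slope_on, r2_on),
    ("outlier off trend", slope_off, r2_off),
]:
    print(f"{label:18s} slope={slope:7.3f}  R^2={r2:.4f}")
```

Both outlier cases change the slope estimate, but only the on-trend one inflates R². The question's bad Month 2 point behaves like Case A, since both returns were overstated together.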