NO.PZ2020010801000040
Question:
A given set of data was divided into three equal parts. Three separate models were developed—each model using two of the three parts for fitting. Errors were calculated for each model. The diagram below shows the residual errors for each model run (the light highlights where the data was used for fitting the model versus the dark indicating the data that was held back):
Using the principles of m-fold cross validation, which model should be selected?
Options:
Explanation:
The first task is to calculate the squared residuals:
The model selected is the one that has the smallest RSS within the blue out-of-sample boxes—this is M3.
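The selection rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original diagram and residual table are not reproduced here, so the residuals below are made-up placeholders, and the names `out_of_sample_rss`, `select_model`, and the run labels "A", "B", "C" are hypothetical. The point is that only the held-out residuals enter the score; the in-sample residuals are ignored.

```python
def out_of_sample_rss(residuals, held_out):
    """Sum of squared residuals over the held-out observations only."""
    return sum(r * r for r, h in zip(residuals, held_out) if h)


def select_model(runs):
    """Return the run with the smallest out-of-sample RSS, plus all scores."""
    scores = {
        name: out_of_sample_rss(residuals, held_out)
        for name, (residuals, held_out) in runs.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical residuals (NOT the values from the question's diagram).
# Six observations split into three folds of two; True marks held-out data.
runs = {
    "A": ([1.0, -1.0, 0.5, 0.5, 2.0, -2.0],
          [False, False, False, False, True, True]),
    "B": ([0.5, -0.5, 1.5, -1.5, 0.5, 0.5],
          [False, False, True, True, False, False]),
    "C": ([1.0, 1.0, 2.0, -2.0, 1.0, -1.0],
          [True, True, False, False, False, False]),
}

best, scores = select_model(runs)
print(best, scores)
```

In this made-up data, run "C" has the largest in-sample residuals but the smallest held-out RSS, so it is the one chosen under this criterion.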
M3's in-sample error is much larger, which suggests the model is too simple.
M2's in-sample and out-of-sample errors are both fairly small. Isn't that the better choice?