
manman · September 27, 2023

How does the result at the threshold p-value of 0.84 show that the model is overfitted?

NO.PZ2021083101000017

Question:

Achler splits the DTM into training, cross-validation, and test datasets. Achler uses a supervised learning approach to train the logistic regression model in predicting sentiment. Applying the receiver operating characteristics (ROC) technique and area under the curve (AUC) metrics, Achler evaluates model performance on both the training and the cross-validation datasets. The trained model performance for three different logistic regressions’ threshold p-values is presented in Exhibit 3.


Rivera suggests adjusting the model’s hyperparameters to improve performance.

Based on Exhibit 3, if Achler wants to improve model performance at the threshold p-value of 0.84, he should:

Options:

A.

tune the model to lower the AUC

B.

adjust model parameters to decrease ROC convexity

C.

apply LASSO regularization to the logistic regression

Explanation:

C is correct.

At the threshold p-value of 0.84, the AUC is 98.4% for the training dataset but only 87.1% for the cross-validation dataset. This large gap suggests that the model is currently overfitted. Least absolute shrinkage and selection operator (LASSO) regularization can be applied to the logistic regression to reduce overfitting.
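To make the LASSO point concrete, here is a minimal pure-Python sketch of a logistic regression whose weights carry an L1 penalty λ·Σ|w_j|, fitted by proximal gradient descent (a gradient step on the average log-loss followed by soft-thresholding). The solver, the function names `fit_lasso_logistic` and `soft_threshold`, the learning rate, the iteration count, and the toy data are all illustrative assumptions, not part of the question; library implementations use other solvers. The point it shows is that the L1 penalty can set the weight of an uninformative feature to exactly zero, leaving the model fewer ways to fit noise in the training data.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w: float, t: float) -> float:
    # Proximal operator of t*|w|: shrinks w toward 0, and sets it to exactly 0 when |w| <= t
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Hypothetical helper: logistic regression with an L1 (LASSO) penalty lam * sum(|w_j|).

    Proximal gradient descent: a gradient step on the average log-loss, then
    soft-thresholding of the weights. The intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error: predicted probability minus the 0/1 label
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
        b -= lr * grad_b
    return w, b

# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise
X = [[-2.0, 0.5], [-1.0, -0.5], [-0.5, 0.5], [0.5, -0.5], [1.0, 0.5], [2.0, -0.5]]
y = [0, 0, 0, 1, 1, 1]
for lam in (0.0, 0.1, 10.0):
    w, b = fit_lasso_logistic(X, y, lam)
    print(lam, [round(v, 3) for v in w])
```

With the moderate penalty (λ = 0.1) the noise feature's weight stays at exactly 0 while the informative feature keeps a positive weight; with a very large penalty (λ = 10) every weight is driven to 0, which would underfit. In practice λ is a hyperparameter chosen by checking performance on the cross-validation set.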

A is incorrect because the higher the AUC, the better the model performance.

B is incorrect because the more convex the ROC curve and the higher the AUC, the better the model performance. Adjusting model parameters with the aim of achieving lower ROC convexity would result in worse model performance on the cross-validation dataset.
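Options A and B both depend on how the ROC curve relates to the AUC. As a sketch (the function names and toy scores are hypothetical), the code below sweeps the classification threshold over a model's predicted probabilities, records the false positive rate and true positive rate at each threshold to trace the ROC curve, and integrates that curve with the trapezoidal rule. A curve that bows further toward the top-left corner encloses more area, so a higher AUC means the model ranks positive cases above negative ones more reliably.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold from the
    highest predicted probability down to the lowest one.

    Assumes labels are 0/1 and contain at least one positive and one negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict "positive" when the predicted probability is at least t
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical predicted probabilities and true labels (1 = positive sentiment)
scores = [0.9, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0]
print(auc(roc_points(scores, labels)))  # 0.75
```

A model that ranks every positive above every negative reaches AUC = 1.0, and one that ranks them in exactly the wrong order gets 0.0, which is why tuning toward a lower AUC or a less convex ROC curve would make the model worse, not better.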

Topic tested: Model Training: Tuning

1 answer

星星_品职助教 · September 27, 2023

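Hi there,

The AUC is 98.4% on the training set but only 87.1% on the cross-validation set, which is a large gap. It means the model fits the training set well, but its performance drops sharply once it is applied to the cross-validation set. This is the typical sign of overfitting: the model fits the training set almost perfectly but does not generalize to other datasets.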
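The reasoning above is a direct comparison of the two AUC figures. A minimal sketch of that check follows; the function name `diagnose_auc_gap` and the 0.05 tolerance are hypothetical choices for illustration, since neither the question nor this answer gives a cutoff for how large a gap counts as overfitting.

```python
def diagnose_auc_gap(train_auc: float, cv_auc: float, max_gap: float = 0.05):
    """Compare training and cross-validation AUC.

    max_gap is a hypothetical tolerance: a drop larger than this from the
    training set to the cross-validation set is flagged as likely overfitting.
    """
    gap = train_auc - cv_auc
    verdict = "likely overfitting" if gap > max_gap else "no clear sign of overfitting"
    return gap, verdict

# AUC figures at the threshold p-value of 0.84, as quoted in the explanation
gap, verdict = diagnose_auc_gap(0.984, 0.871)
print(f"gap = {gap:.3f} ({verdict})")  # gap = 0.113 (likely overfitting)
```

An 11.3 percentage-point drop from the training data to the cross-validation data is exactly the pattern described: the model does far better on data it was trained on than on data it has not seen.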
