NO.PZ202208300200000303
问题如下:
Jain VignetteAmandeep Jain is a credit analyst at the Entrepreneurship Business Development Corporation (EBDC), a government-backed entity that specializes in providing loans and consulting services to growing small and medium-sized enterprises.
Jain sits down with her colleague, Peiran Zhang, a recently hired data scientist. They have been asked to work together to develop a machine learning model to improve EBDC’s predictive ability of identifying potential defaults. The current model is a k-nearest neighbor (k-NN) model, which is used to identify similarities in default companies and was originally created when the entire portfolio of EBDC loans was about 60% of its current level.
Jain suggests to Zhang that the following small changes to the current model could increase its overall predictive ability:
automating feature selection to improve model performance,
adjusting hyperparameter k on the basis of the increased portfolio size, and
adding additional non-financial metrics to identify new relationships.
Jain tells Zhang that the “goal” has been and continues to be predicting what is stored in the system in a field called Default_Status, which records whether a loan is either
Zhang develops an initial prototype and shares with Jian the results based on a subset of the portfolio that was segmented for the purpose of training the model. Exhibit 1 compares predicted and actual defaults from the model.
Exhibit 1:
Predicted vs. Actual Default
选项:
A.the current model. B.a random forest model. C.a support vector model.解释:
SolutionB is correct. Since the defined output, Probability of Default, will be a continuous variable and there is a target variable, Default_Status, then the required model would require the use of regression to solve. Random forest algorithms are a form of continuous supervised models with a target variable.
A is incorrect. The current model is a k-nearest neighbor model, which is a form of supervised classification model for categorical, not continuous, values when the target variable, Default_Status, is defined.
C is incorrect. A support vector model is incorrect because it is a form of supervised classification model used with categorical, not continuous, target variables.
这个是怎么看出是连续结果的 不是分类吗? 违约或者不违约?