Quant - Q&A - 品职教育 (Pinzhi Education): training courses for CFA, ESG, FRM, CPA, postgraduate entrance exams, and other finance qualifications

Quant

* For the question details, see the question stem

NO.PZ202208300200000303

The question is as follows:

Jain Vignette

Amandeep Jain is a credit analyst at the Entrepreneurship Business Development Corporation (EBDC), a government-backed entity that specializes in providing loans and consulting services to growing small and medium-sized enterprises.

Jain sits down with her colleague, Peiran Zhang, a recently hired data scientist. They have been asked to work together to develop a machine learning model to improve EBDC’s predictive ability of identifying potential defaults. The current model is a k-nearest neighbor (k-NN) model, which is used to identify similarities in default companies and was originally created when the entire portfolio of EBDC loans was about 60% of its current level.

Jain suggests to Zhang that the following small changes to the current model could increase its overall predictive ability:

* automating feature selection to improve model performance,
* adjusting hyperparameter k on the basis of the increased portfolio size, and
* adding additional non-financial metrics to identify new relationships.

Jain tells Zhang that the “goal” has been and continues to be predicting what is stored in the system in a field called Default_Status, which records whether a loan is either or . Jain indicates that eventually the model should generate a “Probability of Default” between 0% and 100% as the final output for each client in the firm’s portfolio.

Zhang develops an initial prototype and shares with Jain the results based on a subset of the portfolio that was segmented for the purpose of training the model. Exhibit 1 compares predicted and actual defaults from the model.

Exhibit 1: Predicted vs. Actual Default

Question
Given the eventual predictive goal of the model, the best model is:

Options:

A. the current model.
B. a random forest model.
C. a support vector model.

Explanation:

Solution

B is correct. The defined output, Probability of Default, is a continuous variable, and there is a target variable, Default_Status, so the problem calls for a supervised model that solves a regression problem. Random forest algorithms are supervised models that can predict a continuous value when a target variable is available.

A is incorrect. The current model is a k-nearest neighbor model, which is a form of supervised classification model for categorical, not continuous, values when the target variable, Default_Status, is defined.

C is incorrect. A support vector model is incorrect because it is a form of supervised classification model used with categorical, not continuous, target variables.
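The distinction the solution draws, a categorical label versus a continuous score, can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the toy `LOANS` data, the single `debt_ratio` feature, and the function names are hypothetical, and the bagged one-split regression trees are a simplified stand-in for a random forest, not a full implementation. The k-NN function is shown in its classification form, matching how the solution describes the current model.

```python
import random
from collections import Counter

# Hypothetical toy data: (debt_ratio, default_status), 1 = default, 0 = no default.
LOANS = [(0.10, 0), (0.20, 0), (0.30, 0), (0.40, 0), (0.55, 1),
         (0.60, 0), (0.70, 1), (0.80, 1), (0.90, 1), (0.95, 1)]

def knn_label(x, data, k=3):
    """Classification: majority vote of the k nearest loans, returns a category."""
    nearest = sorted(data, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def fit_stump(sample):
    """Fit a one-split regression tree by minimizing squared error.

    Returns (threshold, mean_left, mean_right).
    """
    best = None
    xs = sorted({x for x, _ in sample})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # Every x in the bootstrap sample is identical: predict the overall mean.
        m = sum(y for _, y in sample) / len(sample)
        return (float("inf"), m, m)
    return best[1:]

def bagged_probability(x, data, n_trees=200, seed=0):
    """Regression: average the predictions of trees fit on bootstrap samples.

    The result is a continuous value between 0 and 1.
    """
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        t, lm, rm = fit_stump(sample)
        preds.append(lm if x <= t else rm)
    return sum(preds) / len(preds)

if __name__ == "__main__":
    for ratio in (0.15, 0.58, 0.90):
        print(ratio, knn_label(ratio, LOANS), round(bagged_probability(ratio, LOANS), 3))
```

The vote in `knn_label` can only return 0 or 1, while the averaged prediction in `bagged_probability` can take any value in between. That value is what can be reported as a "Probability of Default" between 0% and 100%, even though the training target `Default_Status` is itself categorical.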

How can you tell that this is a continuous outcome? Isn't it a classification problem, i.e., default or no default?
