66 · August 30, 2023

Quant

NO.PZ202208300200000303

Question:

Jain Vignette

Amandeep Jain is a credit analyst at the Entrepreneurship Business Development Corporation (EBDC), a government-backed entity that specializes in providing loans and consulting services to growing small and medium-sized enterprises.

Jain sits down with her colleague, Peiran Zhang, a recently hired data scientist. They have been asked to work together to develop a machine learning model to improve EBDC’s predictive ability of identifying potential defaults. The current model is a k-nearest neighbor (k-NN) model, which is used to identify similarities in default companies and was originally created when the entire portfolio of EBDC loans was about 60% of its current level.

Jain suggests to Zhang that the following small changes to the current model could increase its overall predictive ability:

  • automating feature selection to improve model performance,

  • adjusting hyperparameter k on the basis of the increased portfolio size, and

  • adding additional non-financial metrics to identify new relationships.

Jain tells Zhang that the “goal” has been and continues to be predicting what is stored in the system in a field called Default_Status, which records whether a loan is either or . Jain indicates that eventually the model should generate a “Probability of Default” between 0% and 100% as the final output for each client in the firm’s portfolio.

Zhang develops an initial prototype and shares with Jain the results based on a subset of the portfolio that was segmented for the purpose of training the model. Exhibit 1 compares predicted and actual defaults from the model.

Exhibit 1: Predicted vs. Actual Default (exhibit contents not reproduced here)

Question

Given the eventual predictive goal of the model, the best model is:

Options:

A. the current model.
B. a random forest model.
C. a support vector model.

Explanation:

Solution

B is correct. Because the defined output, Probability of Default, is a continuous variable and there is a target variable, Default_Status, the problem calls for a supervised model that produces a continuous output, that is, regression. Random forest algorithms are supervised models that can produce a continuous output for a labeled target variable.

A is incorrect. The current model is a k-nearest neighbor model, which is a form of supervised classification model for categorical, not continuous, values when the target variable, Default_Status, is defined.

C is incorrect. A support vector model is incorrect because it is a form of supervised classification model used with categorical, not continuous, target variables.

How can you tell that this is a continuous outcome? Isn't it classification: default or no default?

1 answer

星星_品职助教 · August 30, 2023

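Hello,

The vignette states that "...eventually the model should generate a “Probability of Default” between 0% and 100% as the final output". The question asks about "the eventual predictive goal of the model", and that goal is a continuous probability value ranging from 0% to 100%. The Default_Status field itself is categorical, but it is the current target, not the eventual output the question refers to.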
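
To make the distinction concrete, here is a minimal sketch that is not part of the original question. It uses a toy bagged ensemble of one-feature decision stumps as a simplified stand-in for a random forest (a real random forest also randomizes the features considered at each split). The `LOANS` data, the debt-ratio feature, and all function names are hypothetical. The point is only that averaging the votes of many trees gives a value between 0 and 1, whereas a single class label is either 0 or 1.

```python
import random

# Hypothetical training data: (debt_ratio, default_status), 1 = default.
LOANS = [(0.15, 0), (0.22, 0), (0.30, 0), (0.35, 0), (0.41, 1), (0.48, 0),
         (0.52, 1), (0.58, 0), (0.63, 1), (0.70, 1), (0.78, 1), (0.85, 1)]

def fit_stump(rows):
    """Return the threshold t that minimizes errors for the rule 'default if x > t'."""
    best_errors, best_t = None, None
    for t in sorted({x for x, _ in rows}):
        errors = sum((x > t) != bool(y) for x, y in rows)
        if best_errors is None or errors < best_errors:
            best_errors, best_t = errors, t
    return best_t

def fit_ensemble(rows, n_stumps=200, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_stumps)]

def predict_proba(stumps, x):
    """Share of stumps voting 'default': a value between 0.0 and 1.0."""
    return sum(x > t for t in stumps) / len(stumps)

def predict_label(stumps, x):
    """Categorical output: 1 (default) or 0 (no default)."""
    return int(predict_proba(stumps, x) >= 0.5)

stumps = fit_ensemble(LOANS)
for ratio in (0.10, 0.45, 0.55, 0.90):
    print(ratio, predict_label(stumps, ratio), predict_proba(stumps, ratio))
```

The label column answers "default or not", which matches Default_Status. The probability column is the kind of output the vignette describes as the eventual goal.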