YI YU · November 4, 2024

Entropy

NO.PZ2024030508000093

The question is as follows:

A quantitative analyst supporting the acquisitions team of a European corporate real estate firm is using the decision tree technique to create a model for forecasting property prices. The analyst compiles a training data set comprised of information from 10 recent property sales, as shown in the following table:

The table also includes the target variable of the model: a class label indicating whether the property was sold for a price greater than EUR 8,000,000. The analyst selects the occupancy status as the feature that is used as the root node of the decision tree. What is the estimated information gain of the split put forward by this root node?

Options:

A. 0.09   B. 0.37   C. 0.44   D. 0.82

Explanation:

A is correct. Before we can calculate the information gain as Gini_base − Gini_weighted, we first calculate the base-level Gini measure by looking at the output variable before we know anything about the features.

There are 5 properties that sold above EUR 8,000,000 and 5 that sold below.

Gini_base = 1 − (5/10)^2 − (5/10)^2 = 0.50

Using the feature “occupancy status” as the root node, we examine this feature and find that for the 4 properties that were occupied, 3 sold above the amount and only 1 sold below.

Gini_occupied = 1 − (3/4)^2 − (1/4)^2 = 0.375

In a similar fashion, we find that for the 6 properties that were not occupied, 2 sold above the amount and 4 sold below.

Gini_notoccupied = 1 − (2/6)^2 − (4/6)^2 = 0.444

Thus, the weighted Gini measure for this feature is obtained as:

Gini_weighted = (5/10)(0.3750) + (5/10)(0.4444) = 0.4097

Therefore, Information Gain = Gini_base − Gini_weighted = 0.50 − 0.4097 = 0.0903, or approximately 0.09.

B is incorrect. This is just the Gini measure for the sold properties that were occupied.

C is incorrect. This is just the Gini measure for the sold properties that were not occupied.

D is incorrect. This is the unweighted sum of the Gini measure for the sold properties that were occupied and the Gini measure for the sold properties that weren’t occupied (0.375 + 0.444).
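The Gini figures quoted in the explanation can be checked with a short Python sketch. The helper name `gini` and the variable names are illustrative only; the class counts come from the explanation. The stated weighted value of 0.4097 matches equal weights of 5/10 on the two branches, so that weighting is used here to reproduce it. Weights proportional to branch size (4/10 and 6/10) are shown only for comparison.

```python
def gini(counts):
    """Gini impurity, 1 - sum(p_i^2), for a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Class counts (sold above, sold below EUR 8,000,000) from the explanation.
base = gini([5, 5])           # all 10 properties
occupied = gini([3, 1])       # 4 occupied properties
not_occupied = gini([2, 4])   # 6 unoccupied properties

# Equal 5/10 weights: inferred from the stated weighted Gini of 0.4097.
weighted_equal = 0.5 * occupied + 0.5 * not_occupied
gain_equal = base - weighted_equal

# Assumption for comparison only: weights proportional to branch size.
weighted_by_size = (4 / 10) * occupied + (6 / 10) * not_occupied
gain_by_size = base - weighted_by_size

print("Gini occupied / not occupied:", round(occupied, 3), round(not_occupied, 3))
print("Equal weights:  weighted, gain =", round(weighted_equal, 4), round(gain_equal, 4))
print("Size weights:   weighted, gain =", round(weighted_by_size, 4), round(gain_by_size, 4))
```

With equal weights the gain rounds to 0.09, matching option A. The unweighted sum `occupied + not_occupied` rounds to 0.82, which is distractor D.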

Learning Objective: Show how a decision tree is constructed and interpreted.

Reference: Global Association of Risk Professionals. Quantitative Analysis. New York, NY: Pearson, 2023, Chapter 15, Machine Learning and Prediction [QA-15].

Could you explain how this question would be worked out using entropy instead?

2 answers

李坏_品职助教 · November 5, 2024

Hi, hardworking PZer:

Yes, you can understand it that way.

----------------------------------------------
Even if the sun is not coming toward us, we are heading toward it. Keep going!

李坏_品职助教 · November 4, 2024

Hi, never-give-up hard worker:



base entropy = -(0.5 * log2(0.5) + 0.5*log2(0.5)) = 1


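Because the question selects occupancy status as the root node, we now compute the entropy for each value of occupancy status.

For the rows where occupancy status is Y, there are 4 samples in total; in the last column (the class label), 3 are Y and 1 is N: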

entropy1 = - (3/4 * log2(3/4) + 1/4 * log2(1/4)) = 0.811


For the rows where occupancy status is N, there are 6 samples in total; in the last column, 2 are Y and 4 are N:

entropy2 = - (2/6 * log2(2/6) + 4/6 * log2(4/6)) = 0.918


weighted entropy = 0.811 * 4/10 + 0.918 * 6/10 = 0.8752


Finally, information gain = 1 - 0.8752 = 0.1248.
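The entropy steps above can be sketched in Python as follows. The helper name `entropy` and the variable names are illustrative, not from any particular library; the counts are the ones used in the answer.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits for a list of class counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Class counts (Y, N in the last column) as used in the answer above.
base = entropy([5, 5])           # all 10 samples
occupied = entropy([3, 1])       # occupancy status = Y, 4 samples
not_occupied = entropy([2, 4])   # occupancy status = N, 6 samples

# Weight each branch by its share of the 10 samples.
weighted = (4 / 10) * occupied + (6 / 10) * not_occupied
gain = base - weighted

print("entropy1, entropy2 =", round(occupied, 3), round(not_occupied, 3))
print("weighted entropy, information gain =", round(weighted, 4), round(gain, 4))
```

Without rounding the intermediate entropies, the gain comes out at about 0.1245. The 0.1248 above results from rounding entropy1 and entropy2 to three decimals before weighting.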


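The entropy-based result differs from the Gini-based result because the two impurity measures are calculated differently. I recommend using Gini by default, since it is simpler to compute.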


YI YU · November 5, 2024

So can I take it that if the question does not say which method to use, I should use Gini?

Related questions

NO.PZ2024030508000093
I still don't understand why the weights are 5/10. In the example in the lecture notes (p. 485), the weights are based on the counts of the feature: when weighting by large cap, we use large cap/total and non-large cap/total, not paid dividend/total and no dividend/total. Why doesn't this question follow the same approach?

2024-10-15 03:20 · 1 answer
