
梦梦 · August 19, 2024

On calculating entropy

NO.PZ2023091601000110

The question is as follows:

An insurance company specializing in inexperienced drivers is building a decision-tree model to classify drivers that it has previously insured as to whether or not they made a claim in their first year as policyholders. They have the following data on whether a claim was made (“Claim_made”) and on two features (for the label and the features, in each case, “yes” = 1 and “no” = 0): whether the policyholder is a car owner and whether they have a college degree:


a. Calculate the “base entropy” of the Claim_made series.

b. Build a decision tree for this problem.

Explanation:

a. The base entropy is the entropy of the output series before any splitting. There are four policyholders who made claims and six who did not. The base entropy is therefore:

$$H_{\text{base}} = -\frac{4}{10}\log_2\frac{4}{10} - \frac{6}{10}\log_2\frac{6}{10} \approx 0.971$$
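As a quick check, the same number can be reproduced in Python (a minimal sketch using only the counts quoted above):

```python
import math

# Base entropy of Claim_made: 4 claims ("yes" = 1) and 6 non-claims ("no" = 0).
p_yes, p_no = 4 / 10, 6 / 10
base_entropy = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(base_entropy, 3))  # 0.971
```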
b. Both of the features are binary, so there are no issues with having to determine a threshold as there would be for a continuous series. The first stage is to calculate the entropy that would result if the split were made on each of the two features.

Examining the Car_owner feature first, among owners (feature = 1), two made a claim while four did not, leading to an entropy for this sub-set of:

$$H_{\text{owner}=1} = -\frac{2}{6}\log_2\frac{2}{6} - \frac{4}{6}\log_2\frac{4}{6} \approx 0.918$$
Among non-car owners (feature = 0), two made a claim and two did not, leading to an entropy of 1. The weighted entropy for splitting by car ownership is therefore given by

$$H_{\text{weighted}} = \frac{6}{10}(0.918) + \frac{4}{10}(1) \approx 0.951$$
and the information gain is $0.971 - 0.951 = 0.020$.

We repeat this process by calculating the entropy that would result if the split were made via the College_degree feature. Doing so, we would find a weighted entropy of 0.551, with an information gain of 0.420. Therefore, because the information gain is maximized (equivalently, the weighted entropy is minimized) when the sample is first split by College_degree, this feature becomes the root node of the decision tree.
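The same comparison can be scripted. Below is a minimal Python sketch, assuming only the class counts quoted in the solution (the helper names `entropy` and `split_info_gain` are illustrative, not from the original):

```python
import math

def entropy(counts):
    """Entropy in bits of a list of class counts, e.g. [2, 4]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_info_gain(base, groups):
    """Weighted entropy and information gain of a split.
    `groups` maps each feature value to its class counts."""
    n = sum(sum(c) for c in groups.values())
    weighted = sum(sum(c) / n * entropy(c) for c in groups.values())
    return weighted, base - weighted

base = entropy([4, 6])  # 0.971

# Car_owner: owners have 2 claims / 4 no-claims; non-owners have 2 / 2.
w_car, gain_car = split_info_gain(base, {1: [2, 4], 0: [2, 2]})
# College_degree: degree holders have 0 claims / 4 no-claims; others 4 / 2.
w_deg, gain_deg = split_info_gain(base, {1: [0, 4], 0: [4, 2]})

print(round(w_car, 3), round(gain_car, 3))  # 0.951 0.02
print(round(w_deg, 3), round(gain_deg, 3))  # 0.551 0.42
# College_degree gives the larger gain, so it is chosen as the root.
```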

For policyholders with a college degree (i.e., feature = 1), the split is already pure, as all four of them made no claims (in other words, nobody with a college degree made a claim). This means that no further splits are required along this branch. The other branch can be split using the Car_owner feature, which is the only one remaining.

The tree structure is given below:

[Tree diagram: College_degree at the root; the degree = 1 branch ends in a pure "no claim" leaf, and the degree = 0 branch is split further by Car_owner.]
Hello, teacher. 1. There are two ways to calculate entropy; why doesn't this question use the Gini method? In general, what wording calls for the log formula, and what wording calls for Gini?

2. How do you calculate log2(x) on a calculator?


2 answers

Accepted answer

pzqa27 · August 20, 2024

Hi there, inquisitive PZer:


You can, but there is no need to: part (a) already computed the entropy, so there is no point spending more time computing the Gini coefficient.

----------------------------------------------
Every hour of hard work is a limited edition. Keep it up!

梦梦 · August 22, 2024

Got it, thank you.

pzqa27 · August 19, 2024

Hi there, inquisitive PZer:


1. There are two ways to calculate entropy; why doesn't this question use the Gini method? In general, what wording calls for the log formula, and what wording calls for Gini?

You may have a slight misunderstanding about decision trees: it is information gain that has two calculation methods, one based on entropy and one based on the Gini coefficient. The question specifies entropy, so just compute it with the formula below:

$$H = -\sum_i p_i \log_2 p_i$$
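For contrast, here is a small Python sketch of both impurity measures (this side-by-side is an illustration added here, not part of the original answer); either one can be plugged into the same information-gain recipe:

```python
import math

def entropy(probs):
    """Entropy: H = -sum(p * log2(p)), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini impurity: G = 1 - sum(p^2)."""
    return 1 - sum(p * p for p in probs)

claim_probs = [0.4, 0.6]               # the Claim_made series from the question
print(round(entropy(claim_probs), 3))  # 0.971
print(round(gini(claim_probs), 3))     # 0.48
```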
2. How do you calculate log2(x) on a calculator?

For example, log2(0.6) = ln(0.6) / ln(2); this is called the "change-of-base formula".

So first compute ln(0.6); the keystrokes are: 0.6, LN.

Then compute ln(2); the keystrokes are: 2, LN.

Then divide the first result by the second.
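The same identity is easy to verify in Python (math.log2 computes the base-2 logarithm directly):

```python
import math

print(math.log(0.6) / math.log(2))  # change of base: ln(0.6) / ln(2)
print(math.log2(0.6))               # same value, approx. -0.737
```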

----------------------------------------------
Keep going, and let's meet a better version of ourselves together!

梦梦 · August 19, 2024

Oh, I see, I had not remembered it accurately. Then for the question "Build a decision tree for this problem.", when computing information gain to decide on the nodes, can we use either the entropy method or the Gini method?
