
Pavel Korchagin · November 8, 2023

I don't understand what concept this question is testing


NO.PZ202108310100000103

Question:

What percentage of Dataset ABC should be allocated to a training subset?

Options:

A. 0%

B. 20%

C. 60%

Explanation:

A is correct; 0% of the Dataset ABC master dataset should be allocated to a training subset. Dataset ABC is characterized by the absence of ground truth (i.e., no known outcome or target variable) and therefore calls for an unsupervised ML model.

For unsupervised learning models, no splitting of the master dataset is needed, because of the absence of labeled training data.

Supervised ML datasets (with labeled training data) contain ground truth, the known outcome (target variable) of each observation in the dataset.

B is incorrect because 20% is the commonly recommended split for each of the cross-validation set and the test set in supervised ML datasets.

C is incorrect because 60% is the commonly recommended split for the training set in supervised ML datasets.
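As a rough illustration of the supervised splits mentioned in options B and C, here is a minimal Python sketch that shuffles a labeled dataset and divides it 60% / 20% / 20%. The function name `split_supervised`, the seed, and the toy `labeled_rows` data are assumptions made for this example; they do not come from the question.

```python
import random

def split_supervised(rows, train=0.6, cv=0.2, seed=0):
    """Shuffle rows, then split into training, cross-validation, and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = round(len(shuffled) * train)
    n_cv = round(len(shuffled) * cv)
    training = shuffled[:n_train]
    cross_validation = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]  # whatever remains becomes the test set
    return training, cross_validation, test

# Hypothetical labeled dataset: (feature, label) pairs
labeled_rows = [(i, i % 2) for i in range(100)]
training, cross_validation, test = split_supervised(labeled_rows)
print(len(training), len(cross_validation), len(test))  # 60 20 20
```

A split like this only makes sense when each row carries a label. With no ground truth, as in Dataset ABC, there is nothing to hold out for supervised training.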

Is this testing that labeled data is used in supervised learning and unlabeled data in unsupervised learning?

I don't see where the question says this is unsupervised.

1 answer

星星_品职助教 · November 8, 2023

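Hello,

The question stem states that "Dataset ABC is characterized by the absence of ground truth". Ground truth means data whose correctness can be verified, or data that comes with a known correct answer (a label). For CFA purposes, you can simply read it as labeled data. Labeled data is the defining feature of supervised learning.

So the absence of ground truth means there is no labeled data, which points to unsupervised learning. Unsupervised learning has no (labeled) training data, so the proportion allocated to a training subset is 0%.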
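The reasoning in this answer can be written as a small decision rule. This is only a sketch: `training_fraction` is a hypothetical helper named for this example, and the 60% figure is the commonly recommended supervised training split cited in the explanation.

```python
def training_fraction(has_ground_truth: bool) -> float:
    """Fraction of the master dataset to allocate to a training subset (hypothetical helper)."""
    # Ground truth present -> supervised learning -> commonly recommended 60% training split.
    # Ground truth absent -> unsupervised learning -> no training subset is carved out.
    return 0.6 if has_ground_truth else 0.0

# Dataset ABC has no ground truth, so the answer is 0%.
print(training_fraction(has_ground_truth=False))  # 0.0
```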