
ZF Everyday · July 15, 2022

Help with a big data question

NO.PZ2021083101000005

Question:

Azarov asks Bector to develop ML models for unstructured stock sentiment datasets, Dataset ABC.

Bector notes that Dataset ABC is characterized by the absence of ground truth.

What percentage of Dataset ABC should be allocated to a training subset?

Options:

A. 0%

B. 20%

C. 60%

Explanation:

A is correct.

0% of the master dataset of Dataset ABC should be allocated to a training subset. Dataset ABC is characterized by the absence of ground truth (i.e., no known outcome or target variable) and therefore calls for an unsupervised ML model.

For unsupervised learning models, no splitting of the master dataset is needed, because of the absence of labeled training data.

Supervised ML datasets (with labeled training data) contain ground truth, the known outcome (target variable) of each observation in the dataset.

B is incorrect because 20% is the commonly recommended allocation for each of the cross-validation set and the test set in supervised ML datasets.

C is incorrect because 60% is the commonly recommended allocation for the training set in supervised ML datasets.

Topic: Model Training - Method Selection

So this is not supervised learning, and that is why there is no training data (0%)? Where is this point covered in the material?

1 answer
Accepted answer

星星_品职助教 · July 16, 2022

Hello,

Key point ① of this question: ground truth belongs to supervised learning, so "the absence of ground truth" in the question stem corresponds to unsupervised learning.

Key point ② of this question: unsupervised learning has no training data.
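
To make the two key points concrete, here is a minimal Python sketch of the allocation rule described in the explanation. The function names `split_master_dataset` and `training_share` are hypothetical, made up for illustration, and the 60%/20%/20% proportions are simply the commonly recommended figures quoted in the explanation, not a fixed requirement.

```python
import random


def split_master_dataset(observations, has_ground_truth, seed=0):
    """Split a master dataset according to whether it has ground truth.

    Hypothetical helper for illustration only.
    - With ground truth (supervised): 60% training, 20% cross-validation,
      20% test, the commonly quoted proportions.
    - Without ground truth (unsupervised): no split; the whole dataset
      is returned as a single unsplit set.
    """
    data = list(observations)
    if not has_ground_truth:
        # No labeled outcomes, so there is nothing to train against.
        return {"unsplit": data}

    rng = random.Random(seed)
    rng.shuffle(data)  # shuffle a copy so the split is not order-dependent
    n_train = len(data) * 60 // 100  # integer arithmetic avoids float rounding
    n_cv = len(data) * 20 // 100
    return {
        "train": data[:n_train],
        "cross_validation": data[n_train:n_train + n_cv],
        "test": data[n_train + n_cv:],
    }


def training_share(split, total):
    """Fraction of the master dataset allocated to the training subset."""
    return len(split.get("train", [])) / total


observations = list(range(100))  # stand-in for 100 observations

supervised = split_master_dataset(observations, has_ground_truth=True)
print(training_share(supervised, len(observations)))    # 0.6

dataset_abc = split_master_dataset(observations, has_ground_truth=False)
print(training_share(dataset_abc, len(observations)))   # 0.0
```

For a dataset like Dataset ABC, the second call returns no training subset at all, which matches answer A (0%).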