Celestine · January 29, 2020

A question about problem NO.PZ2015120204000031

The question is as follows:

Paul suggests the following step which would be repeated every quarter.

Step 2: We utilize ML techniques to divide our investable universe of about 10,000 stocks into 20 different groups, based on a wide variety of the most relevant financial and non-financial characteristics. The idea is to prevent unintended portfolio concentration by selecting stocks from each of these distinct groups.

Which of the following machine learning techniques is most appropriate for executing Step 2?

Options:

A. K-Means Clustering

B. Principal Components Analysis (PCA)

C. Classification and Regression Trees (CART)

Explanation:

A is correct. K-Means clustering is an unsupervised machine learning algorithm which repeatedly partitions observations into a fixed number, k, of nonoverlapping clusters (i.e., groups).
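To see why K-Means fits Step 2, here is a minimal sketch of Lloyd's K-Means algorithm in plain Python. It is illustrative only: the six tickers, their two feature values, k = 3 (instead of the question's 20 groups over about 10,000 stocks), the deterministic farthest-first seeding, and the helper `pick_one_per_cluster` are all assumptions of this sketch, not part of the question. Note that `kmeans` receives only the feature vectors; no target variable is passed in.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption, simpler than k-means++): start from the
    first point, then keep adding the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm: returns one cluster id (0..k-1) per point.
    Only the features are used; there is no target variable."""
    centroids = farthest_first_init(points, k)
    assign: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = centroid(members)
    return assign


def pick_one_per_cluster(tickers: List[str], assign: List[int]) -> Dict[int, str]:
    """Hypothetical selection rule: take the first stock listed in each cluster."""
    picks: Dict[int, str] = {}
    for ticker, c in zip(tickers, assign):
        picks.setdefault(c, ticker)
    return picks


# Hypothetical universe: 6 stocks, 2 standardized characteristics each.
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
features = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0), (9.8, 0.2)]
labels = kmeans(features, k=3)
picks = pick_one_per_cluster(tickers, labels)
print(labels, picks)
```

In practice one would more likely call a library implementation (for example scikit-learn's `KMeans`) on standardized features; the hand-written loop just makes the assign-then-update steps visible.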

B is incorrect. Principal Components Analysis is a long-established statistical method for dimension reduction, not clustering. PCA aims to summarize or reduce highly correlated features of data into a few main, uncorrelated composite variables.
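By contrast, PCA produces new variables rather than groups. The sketch below, in plain Python, finds only the first principal component of two hypothetical, highly correlated features via power iteration (an assumption made to keep it short; a full PCA would use an eigen- or singular-value decomposition) and projects each observation onto it. The output is one composite score per stock, not a cluster assignment.

```python
import math
from typing import List

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract each column's mean so every feature is centered at zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (divisor n - 1) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov: Matrix, iters: int = 200) -> List[float]:
    """Power iteration: repeatedly multiply a unit vector by the covariance matrix and
    renormalize; it converges to the eigenvector with the largest eigenvalue (the first
    principal component), assuming the start vector is not orthogonal to it."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def explained_share(cov: Matrix, v: List[float]) -> float:
    """Share of total variance captured by unit vector v: (v' C v) / trace(C)."""
    d = len(cov)
    var_v = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return var_v / sum(cov[i][i] for i in range(d))


# Hypothetical data: five stocks, two highly correlated features.
rows = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
centered = center(rows)
cov = covariance(centered)
pc1 = first_component(cov)
# Each stock's score on the single composite variable that replaces both features.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
share = explained_share(cov, pc1)
print(pc1, share)
```

Here one component captures almost all of the variance, so two correlated features collapse into one; but nothing in the output says which stocks belong together.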

C is incorrect. CART is a supervised machine learning technique that is most commonly applied to binary classification or regression.
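To make the "supervised" part concrete, here is a minimal sketch of the core step CART repeats: searching one feature for the threshold that best separates a *given* label y, scored by Gini impurity. It is a single split (a decision stump), not a full tree, and the feature values and 0/1 labels are hypothetical. Without y there would be nothing to score, which is why CART does not fit Step 2.

```python
from collections import Counter
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity 1 - sum(p_k^2); 0.0 means every label in the node is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x: List[float], y: List[int]) -> Tuple[Optional[float], float]:
    """Scan thresholds midway between consecutive distinct values of one feature and
    return (threshold, weighted child Gini) for the split that best separates y."""
    n = len(x)
    best_t: Optional[float] = None
    best_score = gini(y)  # baseline: leave the node unsplit
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical training data: one feature per stock plus a known 0/1 label
# (e.g. whether the stock outperformed). The label is what makes this supervised.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(x, y)
print(threshold, impurity)
```

A full CART would apply `best_split` across all features, split the node, and recurse on each child until a stopping rule is met.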

Hello teacher, I keep finding it hard to tell classification (in supervised ML) apart from clustering (in unsupervised ML). How can I tell whether the data have been labeled or not?

1 answer
Accepted answer

星星_品职助教 · February 3, 2020

Hello,

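A label means the data explicitly specify which variables are the inputs (X) and which one is the output (Y). The keywords to look for are Y, "target variable", or similar terms.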

Next, look at the nature of the algorithm. Algorithms such as PCA and clustering are always unsupervised, and unsupervised means there is no target variable, i.e., no label.
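A small contrast, offered as an illustrative sketch rather than any standard API: the same numeric feature X feeds both functions below, but the supervised one also needs a label y for every training row, while the unsupervised one receives only X and invents its own group numbers. The function names, the data, and the "value"/"growth" labels are hypothetical; the supervised method is a nearest-centroid classifier and the unsupervised one is single-linkage clustering cut into two groups (in one dimension, that is a split at the largest gap), both chosen only because they are short.

```python
from collections import defaultdict
from typing import Dict, List


def fit_nearest_centroid(X: List[float], y: List[str]) -> Dict[str, float]:
    """Supervised: the label y says which rows belong together, so each class
    centroid is simply the mean of the rows carrying that label."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return {label: sum(v) / len(v) for label, v in groups.items()}


def predict(centroids: Dict[str, float], x: float) -> str:
    """Assign a new observation to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def cluster_largest_gap(X: List[float]) -> List[int]:
    """Unsupervised: no y is given. Split 1-D data into two clusters at the largest
    gap between sorted values (single-linkage clustering cut at k = 2)."""
    s = sorted(X)
    gaps = [b - a for a, b in zip(s, s[1:])]
    cut = s[gaps.index(max(gaps))]  # last value of the lower cluster
    return [0 if xi <= cut else 1 for xi in X]


X = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
y = ["value", "value", "value", "growth", "growth", "growth"]  # hypothetical labels

centroids = fit_nearest_centroid(X, y)  # needs X and y
new_label = predict(centroids, 7.0)
clusters = cluster_largest_gap(X)       # needs X only
print(centroids, new_label, clusters)
```

Both functions see the same X. Only the supervised one is told what the groups mean; the cluster ids 0 and 1 returned by the unsupervised one are arbitrary numbers that a person would still have to interpret.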

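In fact, many of these algorithms end up sorting observations into groups, but each has its own distinctive features. CART, for example, usually builds a binary tree of discrete splits and classifies observations step by step down the tree; as a supervised method, it also needs a target variable to choose those splits. Step 2 involves no target variable and nothing of this step-by-step binary form, so CART is not the right choice here.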