Yan · March 14, 2021

Doesn't this question mean that the stocks are classified first, and then stocks are picked from the resulting classes?

NO.PZ2015120204000031

The question is as follows:

Paul suggests the following step which would be repeated every quarter.

Step 2 We utilize ML techniques to divide our investable universe of about 10,000 stocks into 20 different groups, based on a wide variety of the most relevant financial and non-financial characteristics. The idea is to prevent unintended portfolio concentration by selecting stocks from each of these distinct groups.

Which of the following machine learning techniques is most appropriate for executing Step 2?

Options:

A. K-Means Clustering

B. Principal Components Analysis (PCA)

C. Classification and Regression Trees (CART)

Explanation:

A is correct. K-Means clustering is an unsupervised machine learning algorithm which repeatedly partitions observations into a fixed number, k, of nonoverlapping clusters (i.e., groups).

B is incorrect. Principal Components Analysis is a long-established statistical method for dimension reduction, not clustering. PCA aims to summarize or reduce highly correlated features of data into a few main, uncorrelated composite variables.

C is incorrect. CART is a supervised machine learning technique that is most commonly applied to binary classification or regression.

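Isn't the aim of this question to divide the stocks into several broad categories that differ from one another, and then pick from each of them to build a fairly diversified portfolio? If so, CART would be suitable, wouldn't it?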

1 answer

星星_品职助教 · March 14, 2021

Hello,

① First, let's look at the definition of CART:

Classification and Regression Tree: A supervised machine learning technique that can be applied to predict either a categorical target variable, producing a classification tree, or a continuous target variable, producing a regression tree (everything up to here is just the definition and has little bearing on this question...). CART is commonly applied to binary classification or regression.

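So CART is generally suited to binary classification. For example, given a single stock, the screening criteria lead to a final result of either invest or do not invest (one of two outcomes).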

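The aim of this question, however, is to divide 10,000 stocks into 20 groups, not to make a two-way decision on a single stock. Step 2 also supplies no predefined group labels, which a supervised technique such as CART needs for training.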
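To make the "supervised" part of the CART definition concrete, here is a minimal sketch of a single CART-style split (a depth-one tree) chosen by Gini impurity. The feature values, the invest/do-not-invest labels, and the function names `gini` and `best_split` are all hypothetical and only for illustration. The point to notice is that the labels must be supplied up front.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best_threshold, best_impurity = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send observations to both sides
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if impurity < best_impurity:
            best_threshold, best_impurity = t, impurity
    return best_threshold, best_impurity


# Hypothetical training data: one feature per stock (say, a valuation score)
# plus a label known in advance (1 = invest, 0 = do not invest).
scores = [0.2, 0.4, 0.5, 0.9, 1.3, 1.8]
invest = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(scores, invest)
print(threshold, impurity)
```

Without the `invest` list there is nothing for the split search to optimize, which is why a tree of this kind cannot by itself discover 20 unlabeled groups.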

② Next, let's look at the definition of K-means clustering:

K-means: A clustering algorithm that repeatedly partitions observations into a fixed number, k, of non-overlapping clusters.

So K-means fits the scenario in this question, with the hyperparameter k = 20.
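As a rough illustration of "repeatedly partitions observations into a fixed number, k, of non-overlapping clusters", here is a minimal pure-Python sketch of the standard K-means iteration (Lloyd's algorithm) with k = 20. The data are made up: 200 random "stocks" with 3 standardized characteristics each stand in for the 10,000-stock universe, and the names `k_means` and `squared_distance` are hypothetical. Note that no labels are passed in.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: 200 "stocks", each with 3 standardized characteristics.
data_rng = random.Random(42)
stocks = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]

labels, centroids = k_means(stocks, k=20)
print(len(set(labels)), "groups in use")
```

Each stock ends up with exactly one group index between 0 and 19, which is the non-overlapping partition that Step 2 asks for. In practice a library implementation would normally be used instead of hand-written code.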