Developer: Shanghai Pinzhi Education Technology Co., Ltd.

FrankSun · October 5, 2021

Why is the answer B? Could you explain it? Thanks.

NO.PZ2021083101000003

The question is as follows:

Iesha Azarov is a senior analyst at Ganymede Moon Partners (Ganymede), where he works with junior analyst Pàola Bector. Azarov would like to incorporate machine learning (ML) models into the company’s analytical process.

Azarov asks Bector to develop ML models for two unstructured stock sentiment datasets, Dataset ABC and Dataset XYZ. Both datasets have been cleaned and preprocessed in preparation for text exploration and model training.

Following an exploratory data analysis that revealed Dataset ABC’s most frequent tokens, Bector conducts a collection frequency analysis.

Based on the text exploration method used for Dataset ABC, tokens that potentially carry important information useful for differentiating the sentiment embedded in the text are most likely to have values that are:

Options:

A. low
B. intermediate
C. high

Explanation:

B is correct.

When analyzing term frequency at the corpus level, also known as collection frequency, tokens with intermediate term frequency (TF) values potentially carry important information useful for differentiating the sentiment embedded in the text.
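As a rough illustration of what collection frequency means, the Python sketch below counts each token across a whole corpus and divides by the total number of tokens, then ranks the tokens. The toy corpus and the helper name `collection_frequency` are hypothetical; the question does not say how Dataset ABC is tokenized.

```python
from collections import Counter


def collection_frequency(docs: list[list[str]]) -> dict[str, float]:
    """Corpus-level term frequency (collection frequency): occurrences of
    each token across all documents divided by the total token count."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}


# Hypothetical toy corpus of already-tokenized sentiment sentences.
docs = [
    "the stock rallied on strong results".split(),
    "the stock slumped on weak results".split(),
    "the acme shares rallied on strong demand".split(),
    "the zenith shares slumped on weak demand".split(),
]

tf = collection_frequency(docs)
# Rank tokens from most to least frequent across the whole corpus
# (ties broken alphabetically).
for token, value in sorted(tf.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{token:<8} {value:.3f}")
```

On this toy corpus the stop words `on` and `the` top the ranking (about 0.154 each), while the company names `acme` and `zenith` sit at the bottom (about 0.038 each).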

A is incorrect because tokens with the lowest TF values are mostly proper nouns or sparse terms (noisy terms) that are not important to the meaning of the text.

C is incorrect because tokens with the highest TF values are mostly stop words (noisy terms) that do not contribute to differentiating the sentiment embedded in the text.

为啥选B?可以解释一下吗?谢谢

1 answer

星星_品职助教 (Pinzhi teaching assistant) · October 6, 2021

Hi,

The answer explanation already covers this:

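1) Tokens with high TF values are stop words, so they do not help differentiate sentiment;

2) Tokens with low TF values are unimportant words (mostly proper nouns or sparse terms), so they do not differentiate sentiment either;

3) Only the intermediate TF values, which remain after excluding the high and low ends, carry differentiating information.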
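The three steps above amount to trimming both ends of the collection-frequency distribution. Below is a minimal Python sketch, assuming hand-picked cutoffs `low` and `high`; the question gives no numeric thresholds, so these values, the toy corpus, and the helper `intermediate_tokens` are illustrative only.

```python
from collections import Counter


def intermediate_tokens(docs: list[list[str]], low: float, high: float) -> list[str]:
    """Return tokens whose collection frequency lies strictly between `low`
    and `high`. Tokens at or above `high` are dropped as stop-word-like noise;
    tokens at or below `low` are dropped as sparse terms or proper nouns.
    The cutoffs are illustrative assumptions, not prescribed values."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return sorted(t for t, n in counts.items() if low < n / total < high)


# Hypothetical toy corpus: stop words, sentiment words, two company names.
docs = [
    "the stock rallied on strong results".split(),
    "the stock slumped on weak results".split(),
    "the acme shares rallied on strong demand".split(),
    "the zenith shares slumped on weak demand".split(),
]

print(intermediate_tokens(docs, low=0.05, high=0.10))
# ['demand', 'rallied', 'results', 'shares', 'slumped', 'stock', 'strong', 'weak']
```

Here the high cutoff removes `the` and `on`, the low cutoff removes `acme` and `zenith`, and what remains includes the sentiment-bearing words `rallied`, `slumped`, `strong`, and `weak`.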

-----------

In future, please state the specific point you are unsure about when posting a question.