
李灰灰 · September 14, 2022

With the TF-IDF method, isn't a higher value better?


NO.PZ202108310100000101

Question:

Based on the text exploration method used for Dataset ABC, tokens that potentially carry important information useful for differentiating the sentiment embedded in the text are most likely to have values that are:

Options:

A. low

B. intermediate

C. high

Explanation:

B is correct.

When analyzing term frequency at the corpus level, also known as collection frequency, tokens with intermediate term frequency (TF) values potentially carry important information useful for differentiating the sentiment embedded in the text.

A is incorrect because tokens with the lowest TF values are mostly proper nouns or sparse terms (noisy terms) that are not important to the meaning of the text.

C is incorrect because tokens with the highest TF values are mostly stop words (noisy terms) that do not contribute to differentiating the sentiment embedded in the text.
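To make the collection-level analysis concrete, here is a minimal Python sketch. It assumes whitespace tokenization of lowercased text, computes each token's TF as its count divided by the total number of tokens in the whole corpus, and splits tokens into low, intermediate, and high bands. The four-sentence corpus, the company names `acme` and `zenith`, the helper names `collection_frequency` and `band`, and the cutoffs 0.06 and 0.2 are all hypothetical and chosen only for illustration; real cutoffs would depend on the corpus.

```python
from collections import Counter


def collection_frequency(docs):
    """Return {token: TF}, where TF = token count / total tokens across the whole corpus."""
    # Assumption: simple lowercase + whitespace tokenization.
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def band(tf, low, high):
    """Split tokens into low / intermediate / high TF bands (cutoffs are hypothetical)."""
    bands = {"low": [], "intermediate": [], "high": []}
    # Sort from most to least frequent; ties keep first-seen order.
    for tok, v in sorted(tf.items(), key=lambda kv: -kv[1]):
        if v >= high:
            bands["high"].append(tok)
        elif v <= low:
            bands["low"].append(tok)
        else:
            bands["intermediate"].append(tok)
    return bands


# Hypothetical toy corpus; "acme" and "zenith" stand in for proper nouns.
corpus = [
    "the outlook is strong",
    "the outlook is weak",
    "the acme margin is strong",
    "the zenith margin is weak",
]
tf = collection_frequency(corpus)
print(band(tf, low=0.06, high=0.2))
```

On this toy corpus, the stop words `the` and `is` land in the high band, the proper nouns `acme` and `zenith` land in the low band, and sentiment-bearing words such as `strong` and `weak` fall in the intermediate band, mirroring explanations A to C.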

"Bector conducts a collection frequency analysis," but the vignette says the TF-IDF method is used. Isn't a higher value better?

1 answer
Accepted answer

星星_品职助教 · September 14, 2022

Hi,

TF-IDF is a condition for a later question and applies only to the three statements that follow. This question does not involve that method.

This question refers to the collection frequency analysis described earlier in the vignette, so the relevant measure is TF at the collection level.

Tokens with very high TF are stop words, and tokens with very low TF are rare words such as proper nouns, so the answer is intermediate TF.
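For contrast, the TF-IDF measure that the later statements refer to can be sketched as follows. This assumes one common convention: TF = count of the token in a document / number of tokens in that document, and IDF = ln(N / df), where N is the number of documents and df is the number of documents containing the token. The curriculum's exact formula may use a different log base, which rescales every score by the same constant and so does not change the ranking. The corpus and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Per-document TF-IDF with TF = count / doc length and IDF = ln(N / df) (assumed convention)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # df: number of documents that contain each token at least once
    df = Counter(tok for toks in tokenized for tok in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append(
            {tok: (c / len(toks)) * math.log(n_docs / df[tok]) for tok, c in counts.items()}
        )
    return scores


# Hypothetical toy corpus; "acme" and "zenith" stand in for proper nouns.
corpus = [
    "the outlook is strong",
    "the outlook is weak",
    "the acme margin is strong",
    "the zenith margin is weak",
]
for doc_scores in tf_idf(corpus):
    print({tok: round(s, 3) for tok, s in doc_scores.items()})
```

Because `the` and `is` occur in every sentence, their IDF is ln(1) = 0, so their TF-IDF is 0 even though they top a collection-frequency ranking. Note also that the rare proper noun `acme` gets the highest TF-IDF in its sentence, so TF-IDF and collection-level TF answer different questions and their "good" ranges should not be carried over from one to the other.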