Hugogooo · June 11, 2020

Question about problem NO.PZ2015120204000047

The problem reads as follows:

Steele and Schultz then discuss how to preprocess the raw text data. Steele tells Schultz that the process can be completed in the following three steps:

Step 1 Cleanse the raw text data.

Step 2 Split the cleansed data into a collection of words for them to be normalized.

Step 3 Normalize the collection of words from Step 2 and create a distinct set of tokens from the normalized words.

The output created in Steele’s Step 3 can be best described as a:

Options:

A. bag-of-words.

B. set of n-grams.

C. document term matrix.

Explanation:

A is correct. After the cleansed text is normalized, a bag-of-words is created. A bag-of-words (BOW) is a collection of a distinct set of tokens from all the texts in a sample dataset.
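To make the three steps concrete, here is a minimal sketch in Python. It is an illustration rather than the method from the course material: the function name `build_bag_of_words`, the sample sentences, and the specific cleansing rules (dropping HTML-like tags, punctuation, and digits) are all assumptions, and normalization is reduced to lowercasing only. A fuller normalization would typically also handle stop words and stemming or lemmatization.

```python
import re


def build_bag_of_words(texts):
    """Return the distinct, normalized tokens across all texts (hypothetical helper)."""
    tokens = []
    for text in texts:
        # Step 1: cleanse the raw text (assumed rules: drop tags, then
        # anything that is not a letter or whitespace).
        cleansed = re.sub(r"<[^>]+>", " ", text)
        cleansed = re.sub(r"[^A-Za-z\s]", " ", cleansed)
        # Step 2: split the cleansed text into a collection of words.
        words = cleansed.split()
        # Step 3: normalize the words (lowercasing only in this sketch).
        tokens.extend(word.lower() for word in words)
    # The distinct set of tokens is the bag-of-words; sorted for stable output.
    return sorted(set(tokens))


# Made-up sample data, for illustration only.
texts = ["<p>The market rose 2% today.</p>", "The MARKET fell."]
bow = build_bag_of_words(texts)
print(bow)  # ['fell', 'market', 'rose', 'the', 'today']
```

Note that "The" and "MARKET" each appear in both texts but only once in the result, which is what distinguishes the bag-of-words in Step 3 from the raw word collection in Step 2.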

Could the instructor point me to the lecture-notes page that covers BOW?

1 answer

星星_品职助教 (teaching assistant) · June 11, 2020

Hello,

Please see the image.