
兔小兔 · November 24, 2021

c

NO.PZ2021083101000012

Question:

Achler and Rivera discuss remaining text wrangling tasks—specifically, which tokens to include in the document term matrix (DTM). Achler divides unique tokens into three groups; a sample of each group is shown in Exhibit 1.

Based on Exhibit 1, which token group has most likely undergone the text preparation and wrangling process?

Options:

A. Token Group 1

B. Token Group 2

C. Token Group 3

Explanation:

A is correct.

Data preparation and wrangling involve cleansing and organizing raw data into a consolidated format.

Token Group 1 includes n-grams (“not_increas_market,” “sale_decreas”) and words that have been converted from their inflected forms into their base word (“increas,” “decreas”). The currency symbol has also been replaced with a “currencysign” token.

N-gram tokens are helpful for keeping negations intact in the text, which is vital for sentiment prediction. The process of converting inflected forms of a word into its base word is called stemming and helps decrease data sparseness, thereby aiding in training less complex ML models.
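As a rough illustration of why n-grams keep negations intact, the sketch below joins consecutive tokens with underscores. The helper name `make_ngrams` and the three-token input are assumptions made for illustration; they are not taken from Exhibit 1.

```python
def make_ngrams(tokens, n):
    """Join every run of n consecutive tokens with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical, already-stemmed tokens (not taken from Exhibit 1).
tokens = ["not", "increas", "market"]

bigrams = make_ngrams(tokens, 2)
trigrams = make_ngrams(tokens, 3)

print(bigrams)   # ['not_increas', 'increas_market']
print(trigrams)  # ['not_increas_market']
```

As separate unigrams, “not” and “increas” would become unrelated columns in the DTM. The n-gram keeps them together as one token, so a sentiment model can tell “not_increas” apart from “increas.”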

B is incorrect because Token Group 2 includes inflected forms of words (“increased,” “decreased”) before conversion into their base words (known as stems).

Stemming (along with lemmatization) decreases data sparseness by aggregating many sparsely occurring words in relatively less sparse stems or lemmas, thereby aiding in training less complex ML models.
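The reduction in sparseness can be sketched by counting distinct tokens before and after stemming. The function `toy_stem` below is a hypothetical suffix-stripping rule written only for this example. It is not the Porter stemmer or any library's algorithm, and its length thresholds are arbitrary assumptions.

```python
def toy_stem(word):
    """Toy suffix stripper for illustration only (not a real stemming algorithm)."""
    word = word.lower()
    # Strip one common inflectional suffix.
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
    elif word.endswith("ed") and len(word) > 4:
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # Drop a trailing "e" on longer words so "increase" becomes "increas".
    if word.endswith("e") and len(word) > 5:
        word = word[:-1]
    return word


# Hypothetical raw tokens: six surface forms of two underlying words.
raw_tokens = ["increase", "increased", "increasing",
              "decrease", "decreased", "decreasing"]

raw_vocab = sorted(set(raw_tokens))
stem_vocab = sorted({toy_stem(t) for t in raw_tokens})

print(len(raw_vocab))   # 6 candidate DTM columns before stemming
print(stem_vocab)       # ['decreas', 'increas'] -> 2 columns after stemming
```

Six sparsely populated columns collapse into two denser ones, which is the aggregation the explanation describes.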

C is incorrect because Token Group 3 includes inflected forms of words (“increased,” “decreased”) before conversion into their base words (known as stems). In addition, the “EUR” currency symbol has not been replaced with the “currencysign” token and the word “Sales” has not been lowercased.
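The two normalization steps that Token Group 3 is missing, lowercasing and currency replacement, might look like the following minimal sketch. The pattern `CURRENCY_PATTERN`, the list of currency codes and symbols, and the sample sentence are all assumptions for illustration. Only the “currencysign” token name comes from the explanation above.

```python
import re

# Assumed set of currency codes and symbols; a real pipeline may cover more.
CURRENCY_PATTERN = re.compile(r"\b(?:EUR|USD|GBP)\b|[€$£]")


def normalize(text):
    """Replace currency markers with a placeholder token, lowercase, and split."""
    # Replace before lowercasing so the uppercase codes still match.
    text = CURRENCY_PATTERN.sub(" currencysign ", text)
    return text.lower().split()


# Hypothetical sentence, not taken from Exhibit 1.
tokens = normalize("Sales decreased EUR 5 million")
print(tokens)  # ['sales', 'decreased', 'currencysign', '5', 'million']
```

After this step, “Sales” and “sales” map to the same token, and every currency marker shares one column in the DTM. The tokens would still need stemming to reach the form shown in Token Group 1.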

Topic: Unstructured Data Wrangling (Preprocessing)

How do the three groups differ?

What is this question asking?

What is the key to choosing the answer?


1 answer

星星_品职助教 · November 24, 2021

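Hello,

The question asks you to pick the group that has already gone through the “text preparation and wrangling process.”

Take the comparison between Group 1 and Group 2 as an example. The main difference is the “increas” and “decreas” in Group 1. They are the mark of stemming, which is one step in the “text preparation and wrangling process.”

Increase, increased, and increasing are essentially the same word. Without stemming they are treated as three separate features; with stemming they are (correctly) treated as a single feature, “increas.” The same applies to “decreas.”

Option B (Group 2) has not changed “increased” and “decreased.”

Option C (Group 3), besides leaving “increased” and “decreased” unchanged, also has not replaced EUR with currencysign.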

