yuqijeffery · January 29, 2022

Why is Group 1 described as high frequency here? If the range runs from 0 to 1, is it assumed that there are no other groups?

NO.PZ2021083101000014

Question:

As an additional part of the text exploration step, Achler conducts a term frequency analysis to identify outliers. Achler summarizes the analysis in Exhibit 2.

Based on Exhibit 2, Achler should exclude from further analysis words in:

Options:

A. only Group 1

B. only Group 2

C. both Group 1 and Group 2

Explanation:

C is correct.

Achler should remove words that are in both Group 1 and Group 2. Term frequency values range between 0 and 1. Group 1 consists of the highest frequency values (e.g., “the” = 0.04935), and Group 2 consists of the lowest frequency values (e.g., “naval” = 1.0123e–05).
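The point about the range can be made concrete with a minimal sketch. It assumes term frequency is a token's count divided by the total number of tokens in the corpus; the explanation above does not spell out the formula. Under that assumption the values across all tokens sum to 1, so a single word that accounts for roughly 5% of every token, like "the" at 0.04935, sits at the high end even though the number looks small. The toy corpus and the function name `term_frequencies` are hypothetical.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return each token's share of all tokens in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: count / total for token, count in counts.items()}


# Hypothetical toy corpus, already lowercased and split into tokens.
tokens = "the fund beat the index and the bond fund lagged the index".split()
tf = term_frequencies(tokens)

# Every value lies between 0 and 1, and together they sum to 1.
most_frequent = max(tf, key=tf.get)
print(most_frequent, round(tf[most_frequent], 4))
```

In a real corpus with thousands of distinct tokens, each share is far smaller than in this 12-token example, which is why values of a few hundredths already mark the most frequent words.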

Frequency analysis on the processed text data helps in filtering unnecessary tokens (or features) by quantifying how important tokens are in a sentence and in the corpus as a whole.

The most frequent tokens (Group 1) strain the machine-learning model to choose a decision boundary among the texts as the terms are present across all the texts, which leads to model underfitting.

The least frequent tokens (Group 2) mislead the machine-learning model into classifying texts containing the rare terms into a specific class, which leads to model overfitting. Identifying and removing noise features is critical for text classification applications.
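The two-sided removal described above can be sketched as a simple filter. This is an illustration under stated assumptions, not the method used in the question: the cutoffs `low` and `high` are hypothetical parameters chosen for the toy data, and the source gives no numeric thresholds for what counts as too frequent or too rare.

```python
from collections import Counter


def filter_by_term_frequency(tokens, low, high):
    """Keep only tokens whose term frequency lies strictly between low and high.

    low and high are illustrative cutoffs (assumptions), not values from the source.
    """
    counts = Counter(tokens)
    total = len(tokens)
    keep = {tok for tok, n in counts.items() if low < n / total < high}
    return [tok for tok in tokens if tok in keep]


# Hypothetical toy corpus, already lowercased and split into tokens.
tokens = "the fund beat the index and the bond fund lagged the index".split()

# "the" (4/12) is dropped as too frequent; words seen once (1/12) as too rare.
filtered = filter_by_term_frequency(tokens, low=0.09, high=0.30)
print(filtered)
```

In practice such cutoffs would likely be tuned per dataset, often alongside a stop-word list, rather than fixed in advance.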

A is incorrect because words in both Group 1 and Group 2 should be removed.

The words with high term frequency value are mostly stop words, present in most sentences. Stop words do not carry a semantic meaning for the purpose of text analyses and ML training, so they do not contribute to differentiating sentiment.

B is incorrect because words in both Group 1 and Group 2 should be removed.

Terms with low term frequency value are mostly rare terms, ones appearing only once or twice in the data. They do not contribute to differentiating sentiment.

Topic: Unstructured Data Exploration

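1 answer

星星_品职助教 · January 30, 2022

Hello,

Judging Group 1 from that angle is not very intuitive. In fact, once you see words such as "the", "and", and "to" in the leftmost column, you can exclude the group right away. These are typical stop words: they carry no substantive meaning and convey no useful information.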


  • 1

    回答
  • 0

    关注
  • 430

    浏览
Related questions

NO.PZ2021083101000014: What frequency counts as intermediate?

2024-08-21 23:36 · 1 answer

NO.PZ2021083101000014: Does this question mean we need to remove all the stop words in Group 1 and "naval", the lowest-frequency word in Group 2?

2023-03-15 17:47 · 1 answer

NO.PZ2021083101000014: Isn't term frequency between 0 and 1? The words in Group 1 have frequencies of only a few hundredths, which doesn't seem high.

2022-07-26 10:42 · 1 answer

NO.PZ2021083101000014: Is Group 1 removed only because these words are all stop words? If they were other words, how would we decide whether to remove them?

2022-04-07 19:03 · 1 answer

NO.PZ2021083101000014: Is Group 2 removed because its frequency is too low?

2022-02-04 17:23 · 1 answer