NO.PZ201512020300000607
Question:
Is Steele’s statement regarding Step 1 of the preprocessing of raw text data correct?
Options:
A. Yes.
B. No, because her suggested treatment of punctuation is incorrect.
C. No, because her suggested treatment of extra white spaces is incorrect.
Explanation:
B is correct. Although most punctuation is not necessary for text analysis and should be removed, some punctuation marks (e.g., percentage signs, currency symbols, and question marks) may be useful for ML model training. These marks should be replaced with annotations (e.g., /percentSign/, /dollarSign/, and /questionMark/) rather than deleted, to preserve their grammatical meaning in the text. The annotations carry the semantic meaning of these important characters through to the later text processing and analysis stages.
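The treatment described in the explanation can be illustrated with a minimal Python sketch. The function name `annotate_and_strip`, the `ANNOTATIONS` mapping, and the sample sentence are hypothetical and chosen only for illustration; the three annotation tokens are the ones named in the explanation. The sketch assumes whitespace-separated tokens and does not treat decimal points specially.

```python
import string

# Assumed mapping: only the three annotations named in the explanation.
ANNOTATIONS = {
    "%": "/percentSign/",
    "$": "/dollarSign/",
    "?": "/questionMark/",
}


def annotate_and_strip(text: str) -> str:
    """Replace useful punctuation with annotations, then drop the rest."""
    # Step 1: substitute each meaningful symbol with its annotation,
    # padded with spaces so the annotation becomes a separate token.
    for symbol, token in ANNOTATIONS.items():
        text = text.replace(symbol, f" {token} ")

    # Step 2: remove all remaining punctuation, token by token, while
    # leaving annotation tokens untouched (they contain slashes).
    strip_table = str.maketrans("", "", string.punctuation)
    protected = set(ANNOTATIONS.values())
    cleaned = []
    for token in text.split():
        if token not in protected:
            token = token.translate(strip_table)
        if token:  # skip tokens that were punctuation only
            cleaned.append(token)

    # Joining on a single space also collapses extra white space.
    return " ".join(cleaned)


if __name__ == "__main__":
    print(annotate_and_strip("Revenue rose 5%, to $10 million. Why?"))
```

In this sketch the comma and full stop are deleted, while the percentage sign, dollar sign, and question mark survive as annotation tokens that a downstream model can still use.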