This question already has answers here:
Python remove stop words from pandas dataframe (3 answers)
Closed 12 months ago.

Value counts of words

How can I remove common words such as "to", "and", "from", and "this"? I only want to keep words such as "AI", "data", "learning", "machine", and "artificial".

Best answer

I think what you want to remove are stop words, such as "to", "the", and so on. nltk ships with a predefined list of stop words:

import nltk
from nltk.corpus import stopwords

# nltk.download('stopwords')  # run once if the stopwords corpus is not installed yet
stop_words = stopwords.words('english')
stop_words

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',...


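You can use np.where to replace stop words with np.nan. Note that str.contains expects a string or regex pattern rather than a list, and it would also match substrings, so isin is used here for exact matches. This assumes each row of the 'words' column holds a single word, and it lower-cases the word first because the nltk list is all lower-case:

import numpy as np
title_analysis['new_col'] = np.where(title_analysis['words'].str.lower().isin(stop_words), np.nan, title_analysis['words'])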


Then call value_counts(), which skips NaN values by default:

title_analysis['new_col'].value_counts()


If you want to ignore your own set of words, just replace stop_words with your own list of words.
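
As a minimal sketch of that last variant, the example below runs the same steps end to end with a hand-written word list instead of the nltk one. The sample data and the name custom_stop_words are assumptions made up for illustration; only the 'words' and 'new_col' column names come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one word per row, standing in for title_analysis['words'].
title_analysis = pd.DataFrame(
    {"words": ["AI", "to", "data", "and", "AI", "from", "learning", "This", "data", "AI"]}
)

# Hand-written stop list (an assumption for this sketch);
# stopwords.words('english') from nltk could be used in its place.
custom_stop_words = ["to", "and", "from", "this"]

# Exact, case-insensitive match of each word against the stop list.
is_stop = title_analysis["words"].str.lower().isin(custom_stop_words)

# Replace stop words with NaN and keep every other word unchanged.
title_analysis["new_col"] = np.where(is_stop, np.nan, title_analysis["words"])

# value_counts() drops NaN by default, so only the kept words are counted.
counts = title_analysis["new_col"].value_counts()
print(counts)
```

Using isin rather than a substring match keeps a word like "AI" from being dropped just because it contains the stop words "a" or "i".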

08-25 00:05