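I want to find the value counts of certain key terms in the review column of a DataFrame. It is a dataset of customer reviews, and I am looking for how often certain words occur. The words I want value counts for are: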
keywords= ["big","hat",'dress',"fabric","color"]
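Below, I have written a function that reports whether each row contains at least one of my key terms. What I need now is the value count for each word in `keywords`, and I'm a bit stuck. Can anyone help?
How can I find the value counts of the keywords listed below?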
keywords = ["big", "hat", "dress", "fabric", "color"]

def keyword(value):
    strings = value.split()
    if any(word in strings for word in keywords):
        return 1
    else:
        return 0

shopbop['keyword_solution'] = shopbop['review_mo'].apply(keyword)
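This only records, in a new column, whether any of the words is present.
Bonus: if there is a way to build a column like the one below that also shows, for each row, every keyword that appears in that row, that would be really cool too.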
def keyword(value):
    strings = value.split()
    if any(word in strings for word in keywords):
        return 1
    else:
        return 0

shopbop['keyword_solution'] = shopbop['review_mo'].apply(keyword)
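Best answer
It would have helped if you had given us a sample of the DataFrame you are referring to, so that I don't misunderstand your approach. Nevertheless, I'll try with a DataFrame built as follows: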
import pandas as pd

# Sample reviews standing in for the real data
data = {'review_mo': ['First hat big hat line with a red color dress',
                      'Second line color color color and fabric hat',
                      'Third line without any of those keywords but fabric ',
                      'Fourth line fabric of big big big hat fabric',
                      'big big hat hat dress dress fabric fabric color color']}
values = [0, 0, 0, 0, 0]
keywords = ["big", "hat", "dress", "fabric", "color"]
# Add one column per keyword, initialised to 0
dictionary = dict(zip(keywords, values))
data.update(dictionary)
shopbop = pd.DataFrame(data, columns=['review_mo'] + keywords)
The DataFrame and the keyword list have to be passed to the function as arguments:
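def keyword(value, shopbop, keywords):
    # For every row whose review text equals `value`, store how often each keyword occurs in it
    for key in keywords:
        shopbop.loc[shopbop['review_mo'] == value, key] = len([x for x in value.split() if x == key])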
This block gives you the bonus you asked for (or something close to it), plus the total count of key terms found in each string:
shopbop['review_mo'].apply(lambda x: keyword(x,shopbop,keywords))
shopbop['keyword_solution']=shopbop[keywords].sum(axis=1)
Showing the result:
shopbop.loc[:, shopbop.columns != 'review_mo']
   big  hat  dress  fabric  color  keyword_solution
0    1    2      1       0      1                 5
1    0    1      0       1      3                 5
2    0    0      0       1      0                 1
3    3    1      0       2      0                 6
4    2    2      2       2      2                10
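As a possible alternative, here is a minimal sketch that computes the same per-row counts without writing into the DataFrame from inside the function. It also adds the dataset-wide count per keyword and a column listing which keywords occur in each row. The names `reviews`, `count_keywords`, `per_review`, `totals`, `found` and `keywords_found` are made up for this illustration, and the sample rows are the ones used above. Matching is still an exact, case-sensitive comparison on whitespace-separated words.

```python
import pandas as pd

keywords = ["big", "hat", "dress", "fabric", "color"]

# Same sample reviews as in the answer above (assumed stand-in for the real data)
reviews = pd.DataFrame({"review_mo": [
    "First hat big hat line with a red color dress",
    "Second line color color color and fabric hat",
    "Third line without any of those keywords but fabric ",
    "Fourth line fabric of big big big hat fabric",
    "big big hat hat dress dress fabric fabric color color",
]})


def count_keywords(text):
    """Return one count per keyword for a single review."""
    words = text.split()
    return pd.Series({key: words.count(key) for key in keywords})


# One row per review, one column per keyword
per_review = reviews["review_mo"].apply(count_keywords)

# Count of each keyword over the whole dataset
totals = per_review.sum()

# Bonus column: the keywords present in each review, comma-separated
found = reviews["review_mo"].apply(
    lambda text: ", ".join(key for key in keywords if key in text.split())
)

result = reviews.join(per_review).assign(
    keyword_solution=per_review.sum(axis=1),
    keywords_found=found,
)

print(totals)
print(result.drop(columns="review_mo"))
```

Because the text is split on whitespace only, a word followed by punctuation (for example `hat,`) or written in a different case (`Hat`) is not counted. If the real reviews contain such cases, the text would need to be normalised before counting.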