I have a DataFrame and a dictionary:
import pandas as pd
news = {'Text':['dog ate the apple', 'cat ate the carrot', 'dog drank water'], 'Source':['NYT', 'WP', 'Guardian']}
news_df = pd.DataFrame(news)
w = {1:['horse', 'dog'], 2:['apple'], 10: ['water', 'melon', 'liquerice']}
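I want to create a new column news_df['sum'] that looks at news_df['Text'], checks whether any of the dictionary's values appear in it, and, if one or more do, assigns the sum of the corresponding keys. My expected result would be: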
results = {'Text':['dog ate the apple', 'cat ate the carrot', 'dog drank water'], 'Source':['NYT', 'WP', 'Guardian'], 'sum' : [3, 0, 11]}
results_df = pd.DataFrame(results)
Any ideas on how to do this? I'm not sure which approach to take. Maybe turn the dictionary into a DataFrame?
Best answer
Here is one approach using apply:
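def counts(x):
    # Add the key k once for every word in w[k] that occurs in the text x.
    sumcount = 0
    for k, v in w.items():
        for word in v:
            # Substring test: 'dog' would also match inside 'hotdog'.
            if word in x:
                sumcount += int(k)
    return sumcount

# Store the result in the new column, then display the DataFrame.
news_df['sum'] = news_df.Text.apply(counts)
news_df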
                 Text    Source  sum
0   dog ate the apple       NYT    3
1  cat ate the carrot        WP    0
2     dog drank water  Guardian   11
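The answer above tests substrings and adds a key once per matching word, so a text containing both 'horse' and 'dog' would contribute key 1 twice. If whole-word matching and counting each key at most once is what you want (an assumption, since the question does not say), a variation might look like the sketch below. The helper name sum_matching_keys is hypothetical, and the lowercase, whitespace-based tokenization is a simplification that does not handle punctuation.

```python
import pandas as pd

news_df = pd.DataFrame({
    'Text': ['dog ate the apple', 'cat ate the carrot', 'dog drank water'],
    'Source': ['NYT', 'WP', 'Guardian'],
})
w = {1: ['horse', 'dog'], 2: ['apple'], 10: ['water', 'melon', 'liquerice']}

def sum_matching_keys(text, mapping):
    """Sum the keys whose word list shares at least one whole word with text."""
    # Assumption: words are separated by whitespace and matching ignores case.
    tokens = set(text.lower().split())
    # A key counts once, however many of its words appear in the text.
    return sum(key for key, words in mapping.items() if tokens & set(words))

news_df['sum'] = news_df['Text'].apply(lambda text: sum_matching_keys(text, w))
```

For the sample data this gives the same column as the question's expected result; it differs from the substring version only on inputs such as 'hotdog stand' or 'horse and dog'.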