I have a pandas DataFrame A with a column keywords that looks like this:
keywords
['loans','mercedez','bugatti','a4']
['trump','usa','election','president']
['galaxy','7s','canon','macbook']
['beiber','spiderman','marvels','ironmen']
.........................................
.........................................
.........................................
I also have another pandas DataFrame B with the columns category and words, where words is a comma-separated string:
category       words
audi           audi a4,audi a6
bugatti        bugatti veyron, bugatti chiron
mercedez       mercedez s-class, mercedez e-class
dslr           canon, nikon
apple          iphone 7s,iphone 6s,iphone 5
finance        sales,loans,sales price
politics       donald trump, election, votes
entertainment  spiderman,captain america, ironmen
music          justin beiber, rihana,drake
........       ..............
.........      .........
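I simply want to map the keywords column of DataFrame A onto the words column of DataFrame B and assign the corresponding category. Each keyword should be compared against every individual word in the words strings. For example, the keyword a4 should be checked against both words in the string audi a4 from the words column. The expected result is:
keywords matched_category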
['loans','mercedez','bugatti','a4'] ['finance','mercedez','mercedez','bugatti','bugatti','audi']
['trump','usa','election','president'] ['politics','politics']
['galaxy','7s','canon','macbook'] ['apple','dslr']
['beiber','spiderman','marvels','ironmen'] ['music','entertainment','entertainment','entertainment']
Best answer
One approach is to use pandas' Series.transform:
import numpy as np
import pandas as pd

A = pd.DataFrame({'keywords': [['loans', 'mercedez', 'bugatti', 'a4'],
                               ['trump', 'usa', 'election', 'president']]})
B = pd.DataFrame({'category': ['audi', 'finance'],
                  'words': ['audi a4,audi a6', 'sales,loans,sales price']})

def match_category_to_keywords(kws):
    ret = []
    for kw in kws:
        # Boolean mask over B: True where any comma-separated phrase contains kw as a substring
        m = B['words'].transform(lambda words: any(kw in w for w in words.split(',')))
        ret.extend(B['category'].loc[m].tolist())
    # pd.np is no longer available in recent pandas releases, so call NumPy directly;
    # np.unique sorts the categories and drops duplicates
    return np.unique(ret)

A['matched_category'] = A['keywords'].transform(match_category_to_keywords)
print(A)
Output:
                            keywords matched_category
0     [loans, mercedez, bugatti, a4]  [audi, finance]
1  [trump, usa, election, president]               []
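Note that the approach above matches substrings and removes duplicates, whereas the expected result in the question keeps one category entry per matching phrase. The following is a minimal sketch of a whole-word variant, not the answer's original method: it builds an inverted index from each single word to its categories and then looks keywords up in that index. The names word_to_categories and match_categories are hypothetical helpers introduced for illustration, and B here is assumed to contain a few more rows from the question than the answer's sample.

```python
from collections import defaultdict

import pandas as pd

A = pd.DataFrame({'keywords': [['loans', 'mercedez', 'bugatti', 'a4'],
                               ['trump', 'usa', 'election', 'president']]})
B = pd.DataFrame({'category': ['audi', 'bugatti', 'mercedez', 'finance', 'politics'],
                  'words': ['audi a4,audi a6',
                            'bugatti veyron, bugatti chiron',
                            'mercedez s-class, mercedez e-class',
                            'sales,loans,sales price',
                            'donald trump, election, votes']})

# Inverted index (hypothetical helper): single word -> one category entry
# for every comma-separated phrase that contains that word.
word_to_categories = defaultdict(list)
for category, words in zip(B['category'], B['words']):
    for phrase in words.split(','):
        # set() so a word repeated inside one phrase is counted only once
        for token in set(phrase.split()):
            word_to_categories[token].append(category)

def match_categories(kws):
    """Collect categories for each keyword, in keyword order, keeping duplicates."""
    matched = []
    for kw in kws:
        matched.extend(word_to_categories.get(kw, []))
    return matched

A['matched_category'] = A['keywords'].apply(match_categories)
print(A)
```

Because the index is built once, each keyword lookup is a dictionary access rather than a scan over all of B. If unique categories are preferred, the returned list could be deduplicated inside match_categories.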