I have a pandas DataFrame A with a column keywords:

 keywords
 ['loans','mercedez','bugatti','a4']
 ['trump','usa','election','president']
 ['galaxy','7s','canon','macbook']
 ['beiber','spiderman','marvels','ironmen']
 .........................................
 .........................................
 .........................................

I also have another pandas DataFrame B with the columns category and words, where words is a comma-separated string:
category              words
audi                  audi a4,audi a6
bugatti               bugatti veyron, bugatti chiron
mercedez              mercedez s-class, mercedez e-class
dslr                  canon, nikon
apple                 iphone 7s,iphone 6s,iphone 5
finance               sales,loans,sales price
politics              donald trump, election, votes
entertainment         spiderman,captain america, ironmen
music                 justin beiber, rihana,drake
........              ..............
.........             .........

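I simply want to map the keywords in DataFrame A to the words in DataFrame B and assign the corresponding category. Each keyword should be compared against every individual word in the strings of the words column. For example, the keyword a4 should be compared against both words of the string audi a4 in the words column, so it matches the category audi. The expected result is: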
  keywords                                       matched_category
  ['loans','mercedez','bugatti','a4']            ['finance','mercedez','mercedez','bugatti','bugatti','audi']
  ['trump','usa','election','president']         ['politics','politics']
  ['galaxy','7s','canon','macbook']              ['apple','dslr']
  ['beiber','spiderman','marvels','ironmen']     ['music','entertainment','entertainment','entertainment']

Best answer

One approach is to use pandas' Series.transform:

import numpy as np
import pandas as pd

A = pd.DataFrame({'keywords': [['loans','mercedez','bugatti','a4'],
                               ['trump','usa','election','president']]})
B = pd.DataFrame({'category': ['audi', 'finance'],
                  'words': ['audi a4,audi a6', 'sales,loans,sales price']})

def match_category_to_keywords(kws):
    ret = []
    for kw in kws:
        # Boolean mask over B: True where any comma-separated phrase
        # contains kw as a substring.
        m = B['words'].transform(lambda words: any(kw in w for w in words.split(',')))
        ret.extend(B['category'].loc[m].tolist())
    # np.unique sorts and de-duplicates the matched categories.
    return np.unique(ret)

A['matched_category'] = A['keywords'].transform(match_category_to_keywords)
print(A)

Output:
                            keywords matched_category
0     [loans, mercedez, bugatti, a4]  [audi, finance]
1  [trump, usa, election, president]               []
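
Note that the answer's code tests `kw in w`, which is a substring check, and de-duplicates the result with `unique`, so it returns each category at most once. The question instead asks for word-level matching with one category entry per matching phrase. The following is a minimal sketch of that behavior, not part of the original answer. It assumes phrases are separated by commas and words by whitespace; the names `token_to_categories` and `match_categories` are illustrative only.

```python
from collections import defaultdict

import pandas as pd

A = pd.DataFrame({'keywords': [['loans', 'mercedez', 'bugatti', 'a4'],
                               ['trump', 'usa', 'election', 'president'],
                               ['galaxy', '7s', 'canon', 'macbook']]})
B = pd.DataFrame({
    'category': ['audi', 'bugatti', 'mercedez', 'dslr',
                 'apple', 'finance', 'politics'],
    'words': ['audi a4,audi a6',
              'bugatti veyron, bugatti chiron',
              'mercedez s-class, mercedez e-class',
              'canon, nikon',
              'iphone 7s,iphone 6s,iphone 5',
              'sales,loans,sales price',
              'donald trump, election, votes'],
})

# Lookup table (hypothetical helper): word -> categories, with one entry
# for every phrase in B['words'] that contains that word.
token_to_categories = defaultdict(list)
for category, words in zip(B['category'], B['words']):
    for phrase in words.split(','):
        # set() counts a phrase once even if it repeats a word;
        # split() with no argument also drops the spaces after commas.
        for token in set(phrase.split()):
            token_to_categories[token].append(category)

def match_categories(kws):
    """Collect categories for each keyword, keeping duplicates and keyword order."""
    matched = []
    for kw in kws:
        matched.extend(token_to_categories.get(kw, []))
    return matched

A['matched_category'] = A['keywords'].apply(match_categories)
print(A)
```

Building the lookup once avoids rescanning B for every keyword. Because matching is on whole words, a keyword such as a4 matches the phrase audi a4 but would not match a longer token that merely contains it.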
