Match elements of a pandas column against a column of another pandas DataFrame


Problem description

I have a pandas DataFrame A with a column keywords:

 keywords
 ['loans','mercedez','bugatti','a4']
 ['trump','usa','election','president']
 ['galaxy','7s','canon','macbook']
 ['beiber','spiderman','marvels','ironmen']
 .........................................
 .........................................
 .........................................

I also have another pandas DataFrame B with columns category and words, where words is a comma-separated string:

category              words
audi                  audi a4,audi a6
bugatti               bugatti veyron, bugatti chiron
mercedez              mercedez s-class, mercedez e-class
dslr                  canon, nikon
apple                 iphone 7s,iphone 6s,iphone 5
finance               sales,loans,sales price
politics              donald trump, election, votes
entertainment         spiderman,captain america, ironmen
music                 justin beiber, rihana,drake
........              ..............
.........             .........

I want to map the keywords in column keywords of DataFrame A against column words of DataFrame B and assign the corresponding category. Each keyword should be matched against every individual word in the words strings. For example, the string audi a4 in column words consists of the two words audi and a4, so the keyword a4 should match it. The expected result would be:

  keywords                                       matched_category
  ['loans','mercedez','bugatti','a4']            ['finance','mercedez','mercedez','bugatti','bugatti','audi']
  ['trump','usa','election','president']         ['politics','politics']
  ['galaxy','7s','canon','macbook']              ['apple','dslr']
  ['beiber','spiderman','marvels','ironmen']     ['music','entertainment','entertainment','entertainment']
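A sketch that rebuilds the sample rows shown above as DataFrames, so the answer below can be run as-is. The names df1 (DataFrame A) and df2 (DataFrame B) follow the answer; the elided "...." rows are left out, and the spacing inside the words strings is copied from the question.

```python
import pandas as pd

# DataFrame A: each cell of 'keywords' holds a Python list of strings
df1 = pd.DataFrame({
    'keywords': [
        ['loans', 'mercedez', 'bugatti', 'a4'],
        ['trump', 'usa', 'election', 'president'],
        ['galaxy', '7s', 'canon', 'macbook'],
        ['beiber', 'spiderman', 'marvels', 'ironmen'],
    ]
})

# DataFrame B: 'words' holds one comma-separated string per category
df2 = pd.DataFrame({
    'category': ['audi', 'bugatti', 'mercedez', 'dslr', 'apple',
                 'finance', 'politics', 'entertainment', 'music'],
    'words': ['audi a4,audi a6',
              'bugatti veyron, bugatti chiron',
              'mercedez s-class, mercedez e-class',
              'canon, nikon',
              'iphone 7s,iphone 6s,iphone 5',
              'sales,loans,sales price',
              'donald trump, election, votes',
              'spiderman,captain america, ironmen',
              'justin beiber, rihana,drake'],
})
```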

Recommended answer

I think you can use:

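#df1 is DataFrame A and df2 is DataFrame B from the question
#create a dictionary category -> list of words by splitting on a comma
#(plus any following whitespace) or on whitespace; the raw string avoids
#invalid-escape warnings, and a multi-character pattern is treated as a regex
d = df2.set_index('category')['words'].str.split(r',\s*|\s+').to_dict()
#invert to a flat word -> category dictionary
#(if a word appears under several categories, the last one wins)
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}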
print (d1)
{'audi': 'audi', 'a4': 'audi', 'a6': 'audi', 'bugatti': 'bugatti',
 'veyron': 'bugatti', 'chiron': 'bugatti', 'mercedez': 'mercedez',
 's-class': 'mercedez', 'e-class': 'mercedez', 'canon': 'dslr',
 'nikon': 'dslr', 'iphone': 'apple', '7s': 'apple', '6s': 'apple',
 '5': 'apple', 'sales': 'finance', 'loans': 'finance', 'price': 'finance',
 'donald': 'politics', 'trump': 'politics', 'election': 'politics',
 'votes': 'politics', 'spiderman': 'entertainment', 'captain': 'entertainment',
 'america': 'entertainment', 'ironmen': 'entertainment', 'justin': 'music',
 'beiber': 'music', 'rihana': 'music', 'drake': 'music'}


#map every keyword to its category in a nested list comprehension,
#skipping keywords that are not in d1
df1['new'] = [[d1[y] for y in x if y in d1] for x in df1['keywords']]
print (df1)
                                keywords  \
0         [loans, mercedez, bugatti, a4]
1      [trump, usa, election, president]
2           [galaxy, 7s, canon, macbook]
3  [beiber, spiderman, marvels, ironmen]

                                     new
0     [finance, mercedez, bugatti, audi]
1                   [politics, politics]
2                          [apple, dslr]
3  [music, entertainment, entertainment]
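The dictionary d1 keeps one category per word, so each matched keyword contributes exactly one category to new, and a word listed under two categories keeps only the later one. The expected output in the question instead repeats some categories (mercedez and bugatti twice), which looks like one entry per occurrence of the keyword in that category's words string; that reading is an assumption. A minimal sketch of that variant maps each word to a list of categories, one entry per occurrence, and flattens those lists per row. The name word_to_cats and the output column matched_category are illustrative, not from the answer. With the sample data it reproduces the first three expected rows exactly; the last row gets three categories because marvels does not occur in any words string.

```python
import re
from collections import defaultdict

import pandas as pd

df1 = pd.DataFrame({'keywords': [
    ['loans', 'mercedez', 'bugatti', 'a4'],
    ['trump', 'usa', 'election', 'president'],
    ['galaxy', '7s', 'canon', 'macbook'],
    ['beiber', 'spiderman', 'marvels', 'ironmen'],
]})
df2 = pd.DataFrame({
    'category': ['audi', 'bugatti', 'mercedez', 'dslr', 'apple',
                 'finance', 'politics', 'entertainment', 'music'],
    'words': ['audi a4,audi a6', 'bugatti veyron, bugatti chiron',
              'mercedez s-class, mercedez e-class', 'canon, nikon',
              'iphone 7s,iphone 6s,iphone 5', 'sales,loans,sales price',
              'donald trump, election, votes',
              'spiderman,captain america, ironmen',
              'justin beiber, rihana,drake'],
})

# Hypothetical helper mapping: word -> list of categories, one entry per
# occurrence (assumption: repeats in the expected output come from repeated
# words), so 'mercedez' maps to ['mercedez', 'mercedez'].
# Words shared by several categories keep all of them instead of the last one.
word_to_cats = defaultdict(list)
for category, words in zip(df2['category'], df2['words']):
    for word in re.split(r',\s*|\s+', words.strip()):
        word_to_cats[word].append(category)

# Flatten per row, keeping keyword order; .get avoids adding empty
# entries to the defaultdict for keywords that match nothing
df1['matched_category'] = [
    [cat for kw in kws for cat in word_to_cats.get(kw, [])]
    for kws in df1['keywords']
]
print(df1['matched_category'].tolist())
# [['finance', 'mercedez', 'mercedez', 'bugatti', 'bugatti', 'audi'],
#  ['politics', 'politics'], ['apple', 'dslr'],
#  ['music', 'entertainment', 'entertainment']]
```

Storing lists rather than a single string per word is what lets both repeats and cross-category collisions survive; if only one category per keyword is wanted, the answer's d1 is simpler.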


08-11 15:32