Problem description
I have a column in a DataFrame that contains lists of categories. For example:
0 [Pizza]
1 [Mexican, Bars, Nightlife]
2 [American, New, Barbeque]
3 [Thai]
4 [Desserts, Asian, Fusion, Mexican, Hawaiian, F...
6 [Thai, Barbeque]
7 [Asian, Fusion, Korean, Mexican]
8 [Barbeque, Bars, Pubs, American, Traditional, ...
9 [Diners, Burgers, Breakfast, Brunch]
11 [Pakistani, Halal, Indian]
I am trying to do two things:
1) Get the unique categories. My approach is to start with an empty set, iterate through the series, and add the items of each list to it.
My code:
# Start from an empty set and union in the categories of each row
unique_categories = set()
for lst in restaurant_review_df['categories_arr']:
    unique_categories = unique_categories | set(lst)
This gives me the set of unique categories contained in all the lists in the column.
2) Generate a pie plot of category counts, where each restaurant can belong to multiple categories. For example, restaurant 11 belongs to the Pakistani, Indian and Halal categories. My approach is again to iterate through the categories, with one more iteration through the series to get the counts.
Is there a simpler or more elegant way of doing this?
Thanks.
Recommended answer
Update: using pandas 0.25.0+ with explode
df['category'].explode().value_counts()
Output:
Barbeque 3
Mexican 3
Fusion 2
Thai 2
American 2
Bars 2
Asian 2
Hawaiian 1
New 1
Brunch 1
Pizza 1
Traditional 1
Pubs 1
Korean 1
Pakistani 1
Burgers 1
Diners 1
Indian 1
Desserts 1
Halal 1
Nightlife 1
Breakfast 1
Name: Places, dtype: int64
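To make the explode approach easier to try, here is a minimal self-contained sketch. The sample rows are made up for illustration (a small subset shaped like the question's column), and the column name category simply follows the snippet above. It also shows one way to get the unique categories from the same exploded series.

```python
import pandas as pd

# Hypothetical sample data: a few rows shaped like the question's column.
df = pd.DataFrame({
    "category": [
        ["Pizza"],
        ["Mexican", "Bars", "Nightlife"],
        ["Thai"],
        ["Thai", "Barbeque"],
    ]
})

# explode() gives every list element its own row, repeating the row index.
exploded = df["category"].explode()

# Task 1: the unique categories across all lists.
unique_categories = set(exploded.unique())

# Task 2: how many restaurants fall into each category, most frequent first.
category_counts = exploded.value_counts()

print(unique_categories)
print(category_counts)
```

Because a restaurant can appear in several categories, the counts sum to the total number of list entries, not to the number of restaurants.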
And to plot:
df['category'].explode().value_counts().plot.pie(figsize=(8,8))
Output: (pie chart of the category counts; image not reproduced here)
For older versions of pandas, before 0.25.0, try:
df['category'].apply(pd.Series).stack().value_counts()
Output:
Mexican 3
Barbeque 3
Thai 2
Fusion 2
American 2
Bars 2
Asian 2
Pubs 1
Burgers 1
Traditional 1
Brunch 1
Indian 1
Korean 1
Halal 1
Pakistani 1
Hawaiian 1
Diners 1
Pizza 1
Nightlife 1
New 1
Desserts 1
Breakfast 1
dtype: int64
With plotting:
df['category'].apply(pd.Series).stack().value_counts().plot.pie()
Output: (pie chart of the category counts; image not reproduced here)
Per @coldspeed's comments:
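from collections import Counter
from itertools import chain
import pandas as pd
# Flatten all lists, count each category, then sort by the count (column 0), highest first
pd.DataFrame.from_dict(Counter(chain(*df['category'])), orient='index').sort_values(0, ascending=False)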
Output:
Barbeque 3
Mexican 3
Bars 2
American 2
Thai 2
Asian 2
Fusion 2
Pizza 1
Diners 1
Halal 1
Pakistani 1
Brunch 1
Breakfast 1
Burgers 1
Hawaiian 1
Traditional 1
Pubs 1
Korean 1
Desserts 1
New 1
Nightlife 1
Indian 1
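The same counting idea also works without building a DataFrame at all. The sketch below uses only the standard library; the list category_lists is a hypothetical stand-in for df['category'].tolist(), and chain.from_iterable is used here as an equivalent alternative to chain(*...) that avoids unpacking the whole column into arguments.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data standing in for df['category'].tolist()
category_lists = [
    ["Pizza"],
    ["Mexican", "Bars", "Nightlife"],
    ["Thai"],
    ["Thai", "Barbeque"],
]

# Flatten the nested lists lazily and count every category in one pass.
counts = Counter(chain.from_iterable(category_lists))

# The Counter's keys are exactly the unique categories.
unique_categories = set(counts)

# most_common() returns (category, count) pairs, highest count first.
top_category = counts.most_common(1)

print(unique_categories)
print(top_category)
```

A Counter is a dict subclass, so it can still be passed to pd.DataFrame.from_dict as in the snippet above if a DataFrame is needed for plotting.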