本文介绍了 pandas 行过滤器以及特定行和列的划分的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有以下数据框:-
traffic_type date region total_views
desktop 01/04/2018 aug 50
mobileweb 01/04/2018 aug 60
total 01/04/2018 aug 100
desktop 01/04/2018 world 20
mobileweb 01/04/2018 world 30
total 01/04/2018 world 40
我需要按traffic_type,日期,区域,并过滤流量类型为total的行,并在同一行中创建一个desktop_share列,该列为traffic_type == desktop的total_views / traffic_type == total的总视图,此列的其余行均为空白。
I need to group by traffic_type, date, region, and filter the rows with traffic type total and in the same row create a desktop_share column which is total_views of traffic_type==desktop / total views of the traffic_type ==total the rest of the rows are blank for this column.
traffic_type date region total_views desktop_share
desktop 01/04/2018 aug 50
mobileweb 01/04/2018 aug 60
total 01/04/2018 aug 200 0.25
desktop 01/04/2018 world 20
mobileweb 01/04/2018 world 30
total 01/04/2018 world 40 0.5
我有一个行之有效的方法,但是我正在寻找基于numpy或只是熊猫的更精确的
。
我的解决方案:
I have a long approach which works but I am looking for something more precise based on numpy or just pandas.My solution:
df1 = df2.loc[df2.traffic_type == 'desktop']
df1 = df1[['date', 'region', 'total_views']]
df1 = df2.merge(df1, how='left', on=['region', 'date'], suffixes=('', '_desktop'))
df1 = df1.loc[df1.traffic_type == 'total']
df1['desktop_share'] = df1['total_views_desktop'] / df1['total_views']
df1 = df1[['date', 'region', 'desktop_share', 'traffic_type']]
dfinal = df2.merge(df1, how='left', on=['region', 'date', 'traffic_type'])
推荐答案
一个可透视的想法:
df1 = df.pivot_table(index=['date','region'],
columns='traffic_type',
values='total_views',
aggfunc='sum')
print (df1)
traffic_type desktop mobileweb total
date region
01/04/2018 aug 50 60 200
world 20 30 40
df2 = df1['desktop'].div(df1['total']).reset_index(name='desktop_share').assign(traffic_type='total')
df = df.merge(df2, how='left')
print (df)
traffic_type date region total_views desktop_share
0 desktop 01/04/2018 aug 50 NaN
1 mobileweb 01/04/2018 aug 60 NaN
2 total 01/04/2018 aug 200 0.25
3 desktop 01/04/2018 world 20 NaN
4 mobileweb 01/04/2018 world 30 NaN
5 total 01/04/2018 world 40 0.50
另一个 MultiIndex
的想法:
df1 = df.set_index(['traffic_type','date','region'])
a = df1.xs('desktop', drop_level=False).rename({'desktop':'total'})
b = df1.xs('total', drop_level=False)
df = df1.assign(desktop_share = a['total_views'].div(b['total_views'])).reset_index()
print (df)
traffic_type date region total_views desktop_share
0 desktop 01/04/2018 aug 50 NaN
1 mobileweb 01/04/2018 aug 60 NaN
2 total 01/04/2018 aug 200 0.25
3 desktop 01/04/2018 world 20 NaN
4 mobileweb 01/04/2018 world 30 NaN
5 total 01/04/2018 world 40 0.50
这篇关于 pandas 行过滤器以及特定行和列的划分的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!