Problem description
I have a data frame which has rows for each user joining my site and making a purchase.
+---+-----+--------------------+---------+--------+-----+
| | uid | msg | _time | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 | 1 | confirmed_settings | 1/29/15 | M | 37 |
| 1 | 1 | sale | 4/13/15 | M | 37 |
| 2 | 3 | confirmed_settings | 4/19/15 | M | 35 |
| 3 | 4 | confirmed_settings | 2/21/15 | M | 21 |
| 4 | 5 | confirmed_settings | 3/28/15 | M | 18 |
| 5 | 4 | sale | 3/15/15 | M | 21 |
+---+-----+--------------------+---------+--------+-----+
I would like to reshape the dataframe so that there is exactly one row per uid, with columns called sale and confirmed_settings that hold the timestamp of each action. Note that not every user has a sale, but every user has a confirmed_settings. Like below:
+---+-----+--------------------+---------+---------+--------+-----+
| | uid | confirmed_settings | sale | _time | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 | 1 | 1/29/15 | 4/13/15 | 1/29/15 | M | 37 |
| 1 | 3 | 4/19/15 | null | 4/19/15 | M | 35 |
| 2 | 4 | 2/21/15 | 3/15/15 | 2/21/15 | M | 21 |
| 3 | 5 | 3/28/15 | null | 3/28/15 | M | 18 |
+---+-----+--------------------+---------+---------+--------+-----+
To do this, I am trying:
df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()
df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid')
But I get this error: ValueError: Index contains duplicate entries, cannot reshape
How can I pivot a df with duplicate index values to transform my dataframe?
df1 = df.pivot_table(index='uid', columns='msg', values='_time').reset_index()
gives this error: DataError: No numeric types to aggregate
but I'm not even sure that is the right path to go down.
Recommended answer
x is the input data frame:
uid msg _time gender age
0 1 confirmed_settings 1/29/15 M 37
1 1 sale 4/13/15 M 37
2 3 confirmed_settings 4/19/15 M 35
3 4 confirmed_settings 2/21/15 M 21
4 5 confirmed_settings 3/28/15 M 18
5 4 sale 3/15/15 M 21
y = x.pivot(index='uid', columns='msg', values='_time')
# Join on the uid column; a plain x.join(y) would match y's uid index against x's row index.
x.join(y, on='uid').drop('msg', axis=1)
gives you
uid _time gender age confirmed_settings sale
0 1 1/29/15 M 37 1/29/15 4/13/15
1 1 4/13/15 M 37 1/29/15 4/13/15
2 3 4/19/15 M 35 4/19/15 NaN
3 4 2/21/15 M 21 2/21/15 3/15/15
4 5 3/28/15 M 18 3/28/15 NaN
5 4 3/15/15 M 21 2/21/15 3/15/15
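The question also asks for one row per uid and for a way around repeated (uid, msg) pairs, which pivot rejects. A minimal sketch of one possible approach, not part of the original answer: pivot_table with aggfunc='first', followed by a merge with the de-duplicated user columns. It assumes that keeping the first timestamp in row order is acceptable when a pair repeats, and that gender and age are constant per uid. The seventh row of the sample data is a hypothetical addition, included only to reproduce the duplicate case.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical duplicated
# (uid=4, msg="sale") row that would make DataFrame.pivot raise ValueError.
x = pd.DataFrame({
    "uid": [1, 1, 3, 4, 5, 4, 4],
    "msg": ["confirmed_settings", "sale", "confirmed_settings",
            "confirmed_settings", "confirmed_settings", "sale", "sale"],
    "_time": ["1/29/15", "4/13/15", "4/19/15", "2/21/15", "3/28/15",
              "3/15/15", "3/20/15"],
    "gender": ["M"] * 7,
    "age": [37, 37, 35, 21, 18, 21, 21],
})

# aggfunc="first" keeps the first value seen per (uid, msg) pair and works on
# non-numeric data, unlike the default mean aggregation.
wide = (
    x.pivot_table(index="uid", columns="msg", values="_time", aggfunc="first")
     .reset_index()
)
wide.columns.name = None  # drop the leftover "msg" label on the column index

# Attach the per-user attributes so the result has one row per uid.
users = x[["uid", "gender", "age"]].drop_duplicates()
result = wide.merge(users, on="uid")
print(result)
```

Users without a sale end up with NaN in the sale column. Because _time is a string here, "first" means first in row order, not earliest. If the earliest timestamp is wanted, converting the column with pd.to_datetime and passing aggfunc='min' is one option.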