Problem description
I have a Pandas dataframe with two columns. One is an identifier and the second is the name of a product attached to that identifier. Identifiers are repeated, one row per product, and product names can repeat as well. I want to convert the single column of product names into several columns, so that each identifier appears only once. Maybe I need to aggregate the product names by identifier.
My dataframe looks like this:
ID Product_Name
100 Apple
100 Banana
200 Cherries
200 Apricots
200 Apple
300 Avocados
I want to have a dataframe like this:
ID
100 Apple Banana
200 Cherries Apricots Apple
300 Avocados
Each product for a given identifier has to be in a separate column.
I tried pd.melt, pd.pivot, and pd.pivot_table, but I only get errors. The error message says: No numeric types to aggregate
Any ideas?
Recommended answer
Use cumcount to number the rows within each ID (these numbers become the new column names), add that counter to a MultiIndex with set_index, and reshape with unstack:
df = df.set_index(['ID',df.groupby('ID').cumcount()])['Product_Name'].unstack()
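A self-contained sketch of the step above, assuming the sample data from the question. The names `position`, `wide`, and `wide_pivot` are illustrative and not part of the original answer. It also shows that `pivot` gives the same result once the counter exists, because no aggregation is needed; `pivot_table` fails on the raw data since its default aggregation (`mean`) cannot be applied to strings.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# Running counter within each ID group: 0, 1, 0, 1, 2, 0
position = df.groupby("ID").cumcount()

# (ID, position) forms a unique MultiIndex; unstack moves position into columns
wide = df.set_index(["ID", position])["Product_Name"].unstack()
print(wide)

# Same reshape with pivot, using the counter as the column key
wide_pivot = df.assign(position=position).pivot(
    index="ID", columns="position", values="Product_Name"
)
```

IDs with fewer products than the largest group get missing values in the trailing columns.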
Or create a Series of lists and build a new DataFrame from it with the constructor:
s = df.groupby('ID')['Product_Name'].apply(list)
df = pd.DataFrame(s.values.tolist(), index=s.index)
print (df)
0 1 2
ID
100 Apple Banana NaN
200 Cherries Apricots Apple
300 Avocados NaN NaN
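The constructor approach can be sketched end to end as follows, again assuming the sample data from the question. The `Product_1`, `Product_2`, ... column labels are a hypothetical naming choice added for readability; the original answer keeps the default integer labels.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# One list of products per ID, in original row order
s = df.groupby("ID")["Product_Name"].apply(list)

# Shorter lists are padded with a missing value when the frame is built
wide = pd.DataFrame(s.tolist(), index=s.index)

# Optional: replace the integer labels 0, 1, 2 with descriptive names
wide.columns = [f"Product_{i + 1}" for i in wide.columns]
print(wide)
```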
But if you want a two-column DataFrame, with all products for an ID joined into a single string:
df1 = df.groupby('ID')['Product_Name'].apply(' '.join).reset_index(name='new')
print (df1)
ID new
0 100 Apple Banana
1 200 Cherries Apricots Apple
2 300 Avocados
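A runnable variant of the joined-string version, assuming the sample data from the question. The comma separator and the `Products` column name are illustrative choices; a comma keeps multi-word product names distinguishable, which a single space would not.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [100, 100, 200, 200, 200, 300],
    "Product_Name": ["Apple", "Banana", "Cherries", "Apricots", "Apple", "Avocados"],
})

# Join the product names of each ID into one comma-separated string
joined = (
    df.groupby("ID")["Product_Name"]
      .agg(", ".join)
      .reset_index(name="Products")
)
print(joined)
```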