I have a fruit dataset with the columns name, colour, weight, size, and seeds.
Fruit dataset
Name    Colour  Weight  Size  Seeds  Unnamed
Apple   Apple   Red     10.0  Big    Yes
Apple   Apple   Red     5.0   Small  Yes
Pear    Pear    Green   11.0  Big    Yes
Banana  Banana  Yellow  4.0   Small  Yes
Orange  Orange  Orange  5.0   Small  Yes
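The problem is that the Colour column duplicates the Name column, so every value is shifted one column to the right. This creates a useless column (Unnamed) that holds the values belonging to the Seeds column. Is there a simple way to remove the duplicated values in Colour and shift the remaining values, starting from Weight, one column to the left? I hope I haven't confused anyone here.
Desired result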
Fruit dataset
Name    Colour  Weight  Size   Seeds  Unnamed (will be dropped)
Apple   Red     10.0    Big    Yes
Apple   Red     5.0     Small  Yes
Pear    Green   11.0    Big    Yes
Banana  Yellow  4.0     Small  Yes
Orange  Orange  5.0     Small  Yes
Best answer
You can do it like this:
In [23]: df
Out[23]:
     Name  Colour  Weight  Size  Seeds Unnamed
0   Apple   Apple     Red  10.0    Big     Yes
1   Apple   Apple     Red   5.0  Small     Yes
2    Pear    Pear   Green  11.0    Big     Yes
3  Banana  Banana  Yellow   4.0  Small     Yes
4  Orange  Orange  Orange   5.0  Small     Yes
In [24]: cols = df.columns[:-1]
In [25]: cols
Out[25]: Index(['Name', 'Colour', 'Weight', 'Size', 'Seeds'], dtype='object')
In [26]: df = df.drop('Colour', axis=1)
In [27]: df.columns = cols
In [28]: df
Out[28]:
     Name  Colour  Weight   Size Seeds
0   Apple     Red    10.0    Big   Yes
1   Apple     Red     5.0  Small   Yes
2    Pear   Green    11.0    Big   Yes
3  Banana  Yellow     4.0  Small   Yes
4  Orange  Orange     5.0  Small   Yes
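For anyone who wants to try this outside an interactive session, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt by hand from the values in the question, and `realign` with its `duplicate_col` parameter is a hypothetical helper name used only for illustration. It assumes exactly one column is duplicated, so the last header is the leftover one.

```python
import pandas as pd

# Rebuild the misaligned frame by hand, using the values shown in the question.
df = pd.DataFrame(
    {
        "Name": ["Apple", "Apple", "Pear", "Banana", "Orange"],
        "Colour": ["Apple", "Apple", "Pear", "Banana", "Orange"],
        "Weight": ["Red", "Red", "Green", "Yellow", "Orange"],
        "Size": [10.0, 5.0, 11.0, 4.0, 5.0],
        "Seeds": ["Big", "Small", "Big", "Small", "Small"],
        "Unnamed": ["Yes"] * 5,
    }
)


def realign(frame, duplicate_col="Colour"):
    """Hypothetical helper: drop the duplicated column and move headers one place left."""
    # Keep every header except the trailing leftover one.
    new_names = frame.columns[:-1]
    # drop() returns a new frame, so the caller's frame is left untouched.
    fixed = frame.drop(columns=duplicate_col)
    # Relabel so each remaining column sits under its intended header.
    fixed.columns = new_names
    return fixed


fixed = realign(df)
print(fixed)
```

Because the fix only relabels columns, no cell values are moved and each column keeps its dtype, so the numeric column stays `float64` under its new `Weight` header.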