Problem description
I have several columns with the same name in a df and need to rename them. The usual rename renames all of them. Is there any way I can rename the blah columns below to blah1, blah4, blah5?
In [6]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(2*5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']
df
Out[6]:
   blah  blah2  blah3  blah  blah
0     0      1      2     3     4
1     5      6      7     8     9
In [7]:
df.rename(columns = {'blah':'blah1'})
Out[7]:
   blah1  blah2  blah3  blah1  blah1
0      0      1      2      3      4
1      5      6      7      8      9
Recommended answer
I was looking for a solution within pandas rather than a general Python one. When a label occurs more than once, as 'blah' does here, the column Index's get_loc() returns a boolean mask whose True values mark the positions of that label. I then use the mask to assign new values into those positions. In my case I know ahead of time how many duplicates I will get and what I will assign to them, but df.columns.get_duplicates() returns a list of all duplicated labels, and that list can be combined with get_loc() if you need a more generic way of weeding out duplicates. Note that Index.get_duplicates() was deprecated in pandas 0.23 and removed in pandas 1.0, so the snippet that uses it only runs on older versions.
cols = pd.Series(df.columns)
for dup in df.columns.get_duplicates():
    # get_loc(dup) is a boolean mask; its sum is the number of occurrences of dup
    cols[df.columns.get_loc(dup)] = [dup + '.' + str(d_idx)
                                     if d_idx != 0
                                     else dup
                                     for d_idx in range(df.columns.get_loc(dup).sum())]
df.columns = cols
   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
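To make the mask idea concrete, here is a minimal sketch of the "I know the new names ahead of time" case from the question (blah1, blah4, blah5). It assumes the duplicated label is not stored contiguously, so that get_loc() returns a boolean mask rather than a slice; the variable names mask and cols are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(2 * 5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']

# For a label that occurs several times (non-contiguously), get_loc()
# returns a boolean mask marking every position that holds the label.
mask = df.columns.get_loc('blah')

# Copy the labels into a plain object array and write the known
# replacement names into the masked positions, in column order.
cols = df.columns.to_numpy().copy()
cols[mask] = ['blah1', 'blah4', 'blah5']
df.columns = cols
```

The replacement list has to contain exactly as many names as the mask has True entries, otherwise the assignment raises an error.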
New, better method (update 03/12/2019)
The code below is better than the code above. It is copied from another answer (@SatishSK):
# sample df with duplicate 'blah' columns
df = pd.DataFrame(np.arange(2*5).reshape(2, 5))
df.columns = ['blah', 'blah2', 'blah3', 'blah', 'blah']
df
# you just need the following 4 lines to rename duplicates
# df is the dataframe whose duplicated columns you want to rename
cols = pd.Series(df.columns)
for dup in cols[cols.duplicated()].unique():
    cols[cols[cols == dup].index.values.tolist()] = [dup + '.' + str(i) if i != 0 else dup for i in range(sum(cols == dup))]
# rename the columns with the cols list.
df.columns = cols
df
Output:
   blah  blah2  blah3  blah.1  blah.2
0     0      1      2       3       4
1     5      6      7       8       9
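If this renaming is needed in more than one place, the same suffixing rule can be wrapped in a small helper. The sketch below is one possible way to do it, not the answer's own code: dedupe_columns and its sep parameter are hypothetical names, and the explicit loop is swapped for groupby().cumcount(), which numbers each occurrence of a label starting from 0. It assumes the generated names (for example 'blah.1') do not already exist among the columns.

```python
import pandas as pd


def dedupe_columns(columns, sep='.'):
    """Return a list of labels where repeats get a numeric suffix.

    Hypothetical helper: the first occurrence keeps its name, later
    occurrences become name + sep + running number (1, 2, ...).
    """
    cols = pd.Series(list(columns))
    # cumcount() gives 0 for the first occurrence of a label, 1 for the second, ...
    occurrence = cols.groupby(cols).cumcount()
    return [name if n == 0 else f"{name}{sep}{n}"
            for name, n in zip(cols, occurrence)]


new_names = dedupe_columns(['blah', 'blah2', 'blah3', 'blah', 'blah'])
```

A typical call would be df.columns = dedupe_columns(df.columns).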