Problem description
I have the following CSV file of data:
id,number,id
132605,1,1
132750,2,1
Pandas currently renames this to:
       id  number  id.1
0  132605       1     1
1  132750       2     1
Is there a way to customize how the duplicate column is renamed? For example, I would prefer:
       id  number  id2
0  132605       1    1
1  132750       2    1
Recommended answer
rename: use period delimiter
Assuming duplicate column labels are the only instances where a column name contains a period (.), you can use a custom function with pd.DataFrame.rename:
from io import StringIO

import pandas as pd

file = """id,number,id
132605,1,1
132750,2,1"""

def rename_func(x):
    # pandas labels repeated columns 'id.1', 'id.2', ...; leave other names alone
    if '.' not in x:
        return x
    name, num = x.split('.')
    return f'{name}{int(num)+1}'

# replace StringIO(file) with 'file.csv'
df = pd.read_csv(StringIO(file))\
    .rename(columns=rename_func)

print(df)
       id  number  id2
0  132605       1    1
1  132750       2    1
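The period-based approach relies on the stated assumption. A label such as v1.5 would be rewritten too, and a label with two periods would make x.split('.') raise a ValueError. As a minimal sketch of a stricter variant, the function below only rewrites a label when it ends in a numeric suffix and the part before the suffix is itself one of the column labels. The names rename_mangled and known_labels are hypothetical and not part of pandas, and the sample column list is made up for illustration.

```python
import re

# Matches a trailing ".<digits>" suffix such as the ".1" in "id.1".
_SUFFIX = re.compile(r"^(?P<name>.+)\.(?P<num>\d+)$")


def rename_mangled(label, known_labels):
    """Turn 'id.1' into 'id2', but only if 'id' is itself a column label."""
    match = _SUFFIX.match(label)
    if match is None or match.group("name") not in known_labels:
        return label  # does not look like a generated duplicate name
    return f"{match.group('name')}{int(match.group('num')) + 1}"


# Illustrative labels: two generated duplicates plus one genuine dotted name.
columns = ["id", "number", "id.1", "v1.5", "id.2"]
known = set(columns)
renamed = [rename_mangled(c, known) for c in columns]
print(renamed)
```

With a DataFrame, the same function could be applied as df.rename(columns=lambda c: rename_mangled(c, set(df.columns))). This is still a heuristic: a file that genuinely contains both id and id.1 as distinct headers would be rewritten as well.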
csv.reader: robust solution
A robust solution is possible with the csv module from the standard library:
from collections import defaultdict
import csv

# reuses StringIO, file and df from the previous example
# replace StringIO(file) with open('file.csv', 'r')
with StringIO(file) as fin:
    headers = next(csv.reader(fin))

def rename_duplicates(original_cols):
    # count occurrences; the n-th repeat of a label (n > 1) gets suffix n
    count = defaultdict(int)
    for x in original_cols:
        count[x] += 1
        yield f'{x}{count[x]}' if count[x] > 1 else x

df.columns = list(rename_duplicates(headers))
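Because this approach reads the header row exactly as written, labels that legitimately contain a period are left alone. The sketch below illustrates that on a made-up header containing v1.5, using only the standard library. The function name dedupe_headers and the sample data are assumptions for illustration, and the function returns a list rather than a generator so the result can be reused.

```python
import csv
from collections import defaultdict
from io import StringIO


def dedupe_headers(headers):
    """Return a list where the n-th repeat (n > 1) of a label gets suffix n."""
    count = defaultdict(int)
    result = []
    for label in headers:
        count[label] += 1
        result.append(f"{label}{count[label]}" if count[label] > 1 else label)
    return result


# Stand-in for a real file; note the dotted label that is not a duplicate.
sample = "id,v1.5,id\n132605,1,1\n132750,2,1"

with StringIO(sample) as fin:
    raw_headers = next(csv.reader(fin))  # header row exactly as written

new_headers = dedupe_headers(raw_headers)
print(new_headers)
```

The resulting list can be assigned to df.columns as above. As far as I know it can also be passed to pd.read_csv together with header=0 and names=new_headers, which replaces the header row at read time. One caveat: a generated name could collide with an existing label, for example if the file already has a column called id2.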