pandas - strip whitespace

Question

I am using python csvkit to compare 2 files like this:

import pandas as pd

# sep and delimiter are aliases in read_csv; pass only one of them.
df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8")
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8")
df3 = pd.merge(df1, df2, on='employee_id', how='right')
df3.to_csv('output.csv', encoding='utf-8', index=False)

Currently I run the file through a script beforehand that strips spaces from the employee_id column.

Examples of employee_ids:

37 78973 3
23787
2 22 3
123

Is there a way to get csvkit to do it and save me a step?

Recommended answer

You can strip() an entire Series in pandas using .str.strip():

df1['employee_id'] = df1['employee_id'].str.strip()
df2['employee_id'] = df2['employee_id'].str.strip()

This removes leading and trailing whitespace from the employee_id column in both df1 and df2.
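As a small illustration of what .str.strip() does and does not touch, here is a sketch on made-up values (the sample strings are assumptions for the example; only the employee_id column name comes from the question). Whitespace at either end is trimmed, while spaces inside a value are left alone.

```python
import pandas as pd

# Hypothetical sample values; only the column name comes from the question.
df = pd.DataFrame({"employee_id": ["  23787", "123  ", " 2 22 3 "]})

# .str.strip() trims whitespace at both ends of each string, nothing else.
stripped = df["employee_id"].str.strip()

print(stripped.tolist())  # ['23787', '123', '2 22 3']
```

Note that the last value still contains its inner spaces, so strip() alone would not normalize IDs like the ones shown in the question.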

Alternatively, you can modify your read_csv lines to also use skipinitialspace=True:

df1 = pd.read_csv('input1.csv', sep=',', encoding="utf-8", skipinitialspace=True)
df2 = pd.read_csv('input2.csv', sep=',', encoding="utf-8", skipinitialspace=True)
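To see the effect of skipinitialspace without touching any files, the following sketch parses an in-memory CSV. The CSV text, the name column, and the use of dtype=str are assumptions made for the example; they are not part of the original question.

```python
import io
import pandas as pd

# Hypothetical CSV content with spaces after each delimiter.
csv_text = "name,employee_id\nAnn, 23787\nBob, 2 22 3\nCy, 123 \n"

# skipinitialspace=True skips spaces that directly follow a delimiter.
# dtype=str keeps every ID as text so digits are never reinterpreted.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, dtype=str)

ids = df["employee_id"].tolist()
print(ids)  # ['23787', '2 22 3', '123 ']
```

Only the leading spaces are skipped: the trailing space in the last row and the inner spaces in the second row survive the read.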


It looks like you are attempting to remove spaces in a string containing numbers. Neither strip() nor skipinitialspace removes spaces inside a value such as 37 78973 3, so replace them instead:

df1['employee_id'] = df1['employee_id'].str.replace(" ","")
df2['employee_id'] = df2['employee_id'].str.replace(" ","")
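Putting the pieces together, a minimal end-to-end sketch might look like the following. The two small DataFrames stand in for input1.csv and input2.csv, and the name and dept columns are invented for illustration; regex=False is passed so that the space is treated as a literal string rather than a pattern.

```python
import pandas as pd

# Hypothetical stand-ins for the two input files.
df1 = pd.DataFrame({"employee_id": ["37 78973 3", "23787"],
                    "name": ["Ann", "Bob"]})
df2 = pd.DataFrame({"employee_id": ["37789733", " 2 37 87"],
                    "dept": ["HR", "IT"]})

# Remove every space (leading, trailing, and inner) from the key column.
for df in (df1, df2):
    df["employee_id"] = df["employee_id"].str.replace(" ", "", regex=False)

# With normalized keys, the right join from the question matches both rows.
df3 = pd.merge(df1, df2, on="employee_id", how="right")
print(df3)
```

Normalizing the key in both frames before the merge is what lets "37 78973 3" and "37789733" line up as the same employee.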

