How to make the separator in read_csv more flexible with respect to whitespace?

Question

I need to create a data frame from data stored in a file, and I want to use the read_csv method for that. However, the separator is not very regular. Some columns are separated by tabs (\t), others by spaces. Moreover, some columns can be separated by 2, 3, or more spaces, or even by a combination of spaces and tabs (for example 3 spaces, two tabs, and then 1 space).

Is there a way to tell pandas to treat these files properly?

By the way, I do not have this problem if I use plain Python. I use:

with open(file_name) as f:
    for line in f:
        fld = line.split()  # no argument: split on any run of spaces and tabs

And it works perfectly. It does not care whether there are 2 or 3 spaces between the fields. Even combinations of spaces and tabs do not cause any problem. Can pandas do the same?

Recommended answer

From the documentation, you can use either a regex or delim_whitespace:

>>> import pandas as pd
>>> for line in open("whitespace.csv"):
...     print(repr(line))
...
'a\t  b\tc 1 2\n'
'd\t  e\tf 3 4\n'
>>> pd.read_csv("whitespace.csv", header=None, delimiter=r"\s+")
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4
>>> pd.read_csv("whitespace.csv", header=None, delim_whitespace=True)
   0  1  2  3  4
0  a  b  c  1  2
1  d  e  f  3  4
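
For readers who want to try this without creating a file, here is a minimal sketch that feeds the same sample text to read_csv through io.StringIO. The helper name read_whitespace_table is made up for illustration and is not part of pandas. The sketch uses sep=r"\s+" rather than delim_whitespace because, as far as I know, pandas 2.2 and later mark delim_whitespace as deprecated in favor of the regex form.

```python
import io

import pandas as pd

# Sample text mixing tabs and runs of spaces, mirroring whitespace.csv above.
raw = "a\t  b\tc 1 2\nd\t  e\tf 3 4\n"


def read_whitespace_table(text):
    """Hypothetical helper: parse text whose fields are separated by any whitespace."""
    # sep=r"\s+" treats every run of spaces and/or tabs as a single delimiter.
    return pd.read_csv(io.StringIO(text), header=None, sep=r"\s+")


df = read_whitespace_table(raw)
print(df)
```

The same sep=r"\s+" argument works when a file path is passed instead of a StringIO object.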
