Loading a .csv with an unknown delimiter into a Pandas DataFrame

Question

I have many .csv files to load into pandas DataFrames. They use at least two delimiters, comma and semicolon, and I am not sure which other delimiters may appear. I understand that the delimiter can be set using:

dataRaw = pd.read_csv(name,sep=",")

dataRaw = pd.read_csv(name,sep=";")

Unfortunately, if I do not specify a delimiter, the default is a comma, which produces a single-column DataFrame for any file that uses a different delimiter. Is there a dynamic way to choose the delimiter so that any CSV can be passed to pandas, for example by trying comma and then semicolon? The pandas documentation does not seem to mention using this kind of logic in read_csv.

Recommended answer

If your files use different separators, you can use:

dataRaw = pd.read_csv(name, sep=";|,", engine="python")

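The sep value here is a regular expression that matches several separators, combined with the OR (|) operator. Regex separators are only supported by the Python parsing engine, and pandas falls back to that engine with a warning if it is not requested explicitly. Be aware that a regex separator tends to ignore quoting and splits on every match, so a comma inside a field of a semicolon-delimited file (a decimal comma, for instance) will also be treated as a separator.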
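As an alternative to matching every separator at once, the delimiter can be guessed per file before loading it. The sketch below uses csv.Sniffer from the Python standard library. The helper name sniff_delimiter, the candidate list and the comma fallback are assumptions made for illustration and are not part of the original answer. pandas offers a similar built-in route: passing sep=None together with engine="python" makes read_csv detect the separator with the same sniffer tool.

```python
import csv


def sniff_delimiter(sample, candidates=",;\t|", default=","):
    """Guess the delimiter used in a chunk of CSV text.

    `candidates` limits the characters the sniffer may choose from
    (an assumed list; extend it to match your files). If the sniffer
    cannot decide, `default` is returned instead.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when no consistent delimiter is found in the sample.
        return default


if __name__ == "__main__":
    semicolon_text = "a;b;c\n1;2;3\n4;5;6\n"
    comma_text = "a,b,c\n1,2,3\n4,5,6\n"
    print(repr(sniff_delimiter(semicolon_text)))
    print(repr(sniff_delimiter(comma_text)))
```

To use it with pandas, read the first few kilobytes of the file as text, pass that sample to sniff_delimiter, and hand the result to pd.read_csv(name, sep=delimiter). Sniffing is a heuristic, so it is worth checking the resulting column count for files that matter.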
