This article covers how to read a CSV whose rows have different lengths into a DataFrame with Pandas.
Problem description
I have a CSV that looks a bit like this:
1 | 01-01-2019 | 724
2 | 01-01-2019 | 233 | 436
3 | 01-01-2019 | 345
4 | 01-01-2019 | 803 | 933 | 943 | 923 | 954
5 | 01-01-2019 | 454
...
When I try to use the following code to generate a DataFrame:
df = pd.read_csv('data.csv', header=0, engine='c', error_bad_lines=False)
It only adds rows with 3 columns to the df (rows 1, 3 and 5 from above).
The rest are considered "bad lines", giving me the following error:
Skipping line 17467: expected 3 fields, saw 9
How do I create a DataFrame that includes all the data in my CSV, possibly just filling the empty cells with null? Or do I have to declare the maximum row length before adding to the df?
Thanks!
Recommended answer
If using only pandas, read the file in as whole lines first, then deal with the separator afterwards.
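import pandas as pd
# Read each line as a single string column (column 0).
# Note: newer pandas releases reject sep='\n' with a ValueError.
df = pd.read_csv('data.csv', header=None, sep='\n')
# Split on " | " (raw-string regex); short rows are padded with None.
df = df[0].str.split(r'\s\|\s', expand=True)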
   0           1    2     3     4     5     6
0  1  01-01-2019  724  None  None  None  None
1  2  01-01-2019  233   436  None  None  None
2  3  01-01-2019  345  None  None  None  None
3  4  01-01-2019  803   933   943   923   954
4  5  01-01-2019  454  None  None  None  None
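As for the second half of the question (declaring the maximum row length up front), one possible alternative is sketched below. It is not part of the answer above. It scans the data once to find the widest row, then passes that many column names to read_csv so that shorter rows are padded. The helper name read_ragged and the inline SAMPLE text (standing in for data.csv) are assumptions made for illustration, as is the " | " separator, which is taken from the regex in the answer.

```python
import io

import pandas as pd

# Inline stand-in for data.csv (assumption: fields are separated by " | ").
SAMPLE = """\
1 | 01-01-2019 | 724
2 | 01-01-2019 | 233 | 436
3 | 01-01-2019 | 345
4 | 01-01-2019 | 803 | 933 | 943 | 923 | 954
5 | 01-01-2019 | 454
"""


def read_ragged(buffer, sep_regex=r"\s*\|\s*"):
    """Hypothetical helper: read a pipe-separated text buffer with uneven rows."""
    # First pass: count the fields on each non-empty line and keep the maximum.
    max_fields = max(line.count("|") + 1 for line in buffer if line.strip())
    buffer.seek(0)  # rewind so read_csv sees the data from the start
    # Second pass: supplying one name per possible column lets pandas
    # pad the shorter rows with NaN instead of skipping the longer ones.
    return pd.read_csv(
        buffer,
        header=None,
        sep=sep_regex,
        engine="python",  # regex separators need the python engine
        names=list(range(max_fields)),
    )


df = read_ragged(io.StringIO(SAMPLE))
print(df)
```

With this approach the missing cells come back as NaN rather than None, and numeric columns that contain gaps are stored as floats. For a real file, the same two passes could be done by opening the path twice instead of rewinding a buffer.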