How do I read a large CSV file with pandas?

Question

I am trying to read a large CSV file (approx. 6 GB) in pandas and I am getting a memory error:

MemoryError                               Traceback (most recent call last)
<ipython-input-58-67a72687871b> in <module>()
----> 1 data=pd.read_csv('aphro.csv',sep=';')

...

MemoryError:

Any help on this?

Recommended answer

The error shows that the machine does not have enough memory to read the entire CSV into a DataFrame at one time. Assuming you do not need the entire dataset in memory all at once, one way to avoid the problem is to process the CSV in chunks (by specifying the chunksize parameter):

import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)  # placeholder for your own per-chunk processing

The chunksize parameter specifies the number of rows per chunk. (The last chunk may contain fewer than chunksize rows, of course.)
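To make the chunk behaviour concrete, here is a minimal sketch that can be run as-is. It is only an illustration: the tiny in-memory CSV, the column names id and value, and the running-sum step are all assumptions standing in for the real file and for whatever process(chunk) needs to do. The separator is set to ';' because the question's read_csv call used it.

```python
import io

import pandas as pd

# Hypothetical stand-in for the large file: 10 data rows, ';'-separated.
csv_text = "id;value\n" + "\n".join(f"{i};{i * 2}" for i in range(10))

chunk_lengths = []
total = 0

# With chunksize set, read_csv yields DataFrames of at most that many rows
# instead of returning a single DataFrame.
for chunk in pd.read_csv(io.StringIO(csv_text), sep=";", chunksize=4):
    chunk_lengths.append(len(chunk))
    # Keep only the running result so each chunk can be discarded afterwards.
    total += int(chunk["value"].sum())

print(chunk_lengths)  # [4, 4, 2]
print(total)          # 90
```

Only one chunk is held in memory at a time, so peak memory depends on chunksize rather than on the size of the file. This works as long as the per-chunk results (a sum, a count, a filtered subset) are themselves small.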
