Problem Description
I am trying to read a large CSV file (approx. 6 GB) in pandas and I am getting a memory error:
MemoryError Traceback (most recent call last)
<ipython-input-58-67a72687871b> in <module>()
----> 1 data=pd.read_csv('aphro.csv',sep=';')
...
MemoryError:
Any help on this?
Recommended Answer
The error shows that the machine does not have enough memory to read the entire CSV into a DataFrame at one time. Assuming you do not need the entire dataset in memory all at once, one way to avoid the problem is to process the CSV in chunks (by specifying the chunksize parameter):
import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv('aphro.csv', sep=';', chunksize=chunksize):
    process(chunk)  # replace with your own per-chunk processing
The chunksize parameter specifies the number of rows per chunk. (The last chunk may contain fewer than chunksize rows, of course.)
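For example, here is a minimal sketch of filtering each chunk and concatenating the surviving rows into a single DataFrame (the column name 'value' is hypothetical, used only for illustration):

import pandas as pd

chunksize = 10 ** 6
filtered_parts = []
for chunk in pd.read_csv('aphro.csv', sep=';', chunksize=chunksize):
    # Keep only the rows of interest from each chunk;
    # 'value' is a hypothetical column name for illustration.
    filtered_parts.append(chunk[chunk['value'] > 0])

# The concatenated result fits in memory as long as the filtered
# data is much smaller than the original 6 GB file.
result = pd.concat(filtered_parts, ignore_index=True)

This works because only the current chunk plus the accumulated filtered rows are held in memory at any one time.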