Reading large text files with Pandas

Question

I have been trying to read a few large text files (around 1.4 GB to 2 GB each) with Pandas, using the read_csv function, to no avail. Below are the versions I am using:

Python 2.7.6
Anaconda 1.9.2 (64-bit) (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)]
IPython 1.1.0
Pandas 0.13.1

I tried the following:

import pandas as pd
df = pd.read_csv('data.txt')

and it crashed IPython with the message: Kernel died, restarting.

Then I tried using an iterator:

tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)

Again, I got the "Kernel died, restarting" error.

Any ideas? Or is there another way to read big text files?

Thanks!

Recommended answer

A solution to a similar question was posted some time after this question was asked. Basically, it suggests reading the file in chunks, as follows:

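import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk, not bytes
for chunk in pd.read_csv(filename, chunksize=chunksize):  # filename: path to your file
    process(chunk)  # placeholder for your own per-chunk logic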
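
To make the loop above more concrete, here is a minimal sketch of one thing `process` might do: reduce each chunk to a small partial result and combine the partial results at the end. The column names (`user`, `amount`), the sample data, and the body of `process` are assumptions made up for illustration; an in-memory buffer stands in for the large file so the snippet runs as-is.

```python
import io

import pandas as pd

# Small in-memory stand-in for a large file; a real file path works the same way.
csv_data = io.StringIO(
    "user,amount\n"
    "alice,10\n"
    "bob,5\n"
    "alice,7\n"
    "carol,3\n"
    "bob,1\n"
)


def process(chunk):
    """Hypothetical per-chunk step: sum `amount` per `user` within one chunk."""
    return chunk.groupby("user")["amount"].sum()


partial_sums = []
# chunksize counts rows, so each chunk here is a DataFrame of at most 2 rows.
for chunk in pd.read_csv(csv_data, chunksize=2):
    partial_sums.append(process(chunk))

# Combine the per-chunk results; the full file is never held in memory at once.
totals = pd.concat(partial_sums).groupby(level=0).sum()
print(totals.to_dict())
```

The point of this shape is that only one chunk plus the (much smaller) partial results are in memory at any time. If every chunk were simply appended to a list and concatenated, the full dataset would end up in memory again.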

You should set the chunksize parameter, which is the number of rows read per chunk, according to your machine's capabilities (that is, make sure it can hold and process one chunk in memory).
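
One rough way to pick a chunksize is to read a small sample with `nrows`, measure its memory footprint, and scale to a budget. This is only a sketch under assumptions: the 100 MB budget, the sample size, and the generated data are arbitrary, and peak memory during parsing can be higher than the size of the resulting DataFrame, so treat the result as a starting point.

```python
import io

import pandas as pd

# In-memory stand-in for a large file (hypothetical columns `id` and `value`).
csv_data = io.StringIO(
    "id,value\n" + "".join(f"{i},{i * 0.5}\n" for i in range(1000))
)

# Read only the first 100 rows and measure how much memory they occupy.
sample = pd.read_csv(csv_data, nrows=100)
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)

# Assumed budget: allow roughly 100 MB of DataFrame per chunk.
memory_budget_bytes = 100 * 1024 ** 2
chunksize = max(1, int(memory_budget_bytes / bytes_per_row))
print(f"~{bytes_per_row:.0f} bytes/row, chunksize={chunksize}")
```

The resulting value can then be passed as `chunksize` to `pd.read_csv` and lowered if the process still runs out of memory.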
