Reading lines from a file with Python


Question

I have a file with almost 100,000 lines. I want to run a cleaning process on it (lowercasing, removing stopwords, etc.), but it takes a long time.

For example, for 10,000 lines the script needs 15 minutes. For the whole file I expected about 150 minutes, but it actually takes 5 hours.

To open and read the file I use:

fileinput = open('tweets.txt', 'r')

lines = fileinput.read().lower()  # lowercases the text, but loads the whole file into memory

for line in fileinput:
    lines = line.lower()

Question: Is there a way to read the first 10,000 lines, clean them, then read the next block of lines, and so on?

Recommended answer

I would strongly suggest operating line by line instead of reading the entire file at once (in other words, don't use .read()).

with open('tweets.txt', 'r') as fileinput:
    for line in fileinput:
        line = line.lower()
        # ... do something with line ...
        # (for example, write the line to a new file, or print it)

This automatically takes advantage of Python's built-in buffering.
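If you specifically want to work on blocks of 10,000 lines, as the question asks, one possible approach is to pull lines from the open file with itertools.islice. The following is only a minimal sketch: read_in_blocks and clean_line are hypothetical helper names that do not appear in the original post, and the stopword handling is an assumed stand-in for whatever cleaning the script really does.

```python
from itertools import islice


def read_in_blocks(path, block_size=10000):
    """Yield lists of at most block_size lines from the file at path.

    Hypothetical helper: only one block is held in memory at a time.
    """
    with open(path, 'r') as fileinput:
        while True:
            # islice consumes the next block_size lines from the file iterator
            block = list(islice(fileinput, block_size))
            if not block:
                break  # end of file reached
            yield block


def clean_line(line, stopwords=frozenset()):
    """Assumed cleaning step: lowercase the line and drop stopwords."""
    words = line.lower().split()
    return ' '.join(w for w in words if w not in stopwords)
```

A loop such as `for block in read_in_blocks('tweets.txt'):` then receives one list of lines per iteration, and each line can be passed to clean_line. Passing the stopwords as a set keeps each membership check cheap, although whether that matters for the running time described in the question is not something the original post establishes.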
