Parsing a tab-separated file in Python

Question

I'm trying to parse a tab-separated file in Python where a number placed k tabs from the beginning of a row should go into the k-th array.

Is there a built-in function to do this, or a better way than reading line by line and doing all the obvious processing a naive solution would perform?
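For reference, the naive line-by-line approach the question wants to avoid can be sketched with str.split alone. This is a minimal sketch. It assumes every non-empty field is a number and that an empty field means "no value at this tab offset". The function name split_into_columns and the sample lines are hypothetical.

```python
def split_into_columns(lines):
    """Put the value found k tabs from the start of a row into columns[k].

    Hypothetical helper: assumes non-empty fields are numeric and empty
    fields mean "nothing at this tab offset".
    """
    columns = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        for k, field in enumerate(fields):
            # Grow the list of column arrays on demand.
            while len(columns) <= k:
                columns.append([])
            if field:  # skip empty fields produced by leading/consecutive tabs
                columns[k].append(float(field))
    return columns


sample = ["1\t2\n", "\t\t3\n", "4\n"]
print(split_into_columns(sample))  # [[1.0, 4.0], [2.0], [3.0]]
```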

Answer

You can use the csv module to parse tab-separated value files easily.

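import csv

# newline="" is what the csv module docs recommend when opening files for csv.reader.
with open("tab-separated-values", newline="") as tsv:
    # dialect="excel-tab" behaves like delimiter="\t"; either works here.
    for line in csv.reader(tsv, dialect="excel-tab"):
        ...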

On each iteration, line is the list of values (as strings) in the current row.
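To see what each line looks like, you can feed csv.reader an in-memory file. This sketch uses io.StringIO in place of a real file so the example runs on its own. The sample data is hypothetical. Leading tabs come back as empty strings, so a value k tabs from the start of a row lands at index k of the list.

```python
import csv
import io

data = "1\t2\n\t\t3\n4\n"  # hypothetical sample: three rows of different lengths

# io.StringIO stands in for open("tab-separated-values", newline="").
with io.StringIO(data) as tsv:
    rows = list(csv.reader(tsv, delimiter="\t"))

for row in rows:
    print(row)
# ['1', '2']
# ['', '', '3']
# ['4']
```

Note that all fields are returned as strings. Converting them to numbers is left to the caller.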

If you want to read by column rather than by row, you can transpose the rows with the zip() builtin:

with open("tab-separated-values", newline="") as tsv:
    # zip(*rows) transposes the rows, yielding one tuple per column.
    for column in zip(*csv.reader(tsv, dialect="excel-tab")):
        ...
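One caveat matters for the layout in the question: zip() stops at the shortest row, so a column that only some rows reach gets truncated. Below is a minimal sketch that uses itertools.zip_longest instead. It rests on two assumptions about the data, neither of which is part of the original answer. First, empty fields (and padding) are skipped. Second, non-empty fields are parsed as float. The sample data is hypothetical.

```python
import csv
import io
from itertools import zip_longest

data = "1\t2\n\t\t3\n4\n"  # hypothetical sample with ragged rows

with io.StringIO(data) as tsv:
    rows = list(csv.reader(tsv, delimiter="\t"))

# zip() truncates to the shortest row (here, the one-field row "4").
print(list(zip(*rows)))  # [('1', '', '4')]

# zip_longest() pads short rows with fillvalue, so every column survives;
# column k then collects every value that sat k tabs from the row start.
columns = [
    [float(v) for v in column if v]  # drop empty fields and padding
    for column in zip_longest(*rows, fillvalue="")
]
print(columns)  # [[1.0, 4.0], [2.0], [3.0]]
```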
