Problem description
I have a huge dataset and I am trying to read it line by line. For now, I am reading the dataset using pandas:
import pandas as pd
df = pd.read_csv("mydata.csv", sep=",", nrows=1)
This call lets me read only the first line, but how can I read the second, the third, and so on? (I would like to use pandas.)
To be clearer: I need to read one line at a time because the dataset is 20 GB and I cannot keep all of it in memory.
Recommended answer
Looking in the pandas documentation, there is a parameter for the read_csv function:
skiprows
If a list is assigned to this parameter, read_csv skips the lines whose indices appear in the list:
skiprows = [0,1]
This skips the first and second lines of the file (indices are 0-based, so line 0 is the header row if the file has one). Thus a combination of nrows and skiprows allows each line in the dataset to be read separately.
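As a rough sketch of how the two parameters can be combined, something like the following might work. The helper names read_data_row and iter_data_rows are made up for illustration, and the code assumes the first line of the file is a header. Because every read_csv call with skiprows scans the file again from the start, the second helper shows read_csv's chunksize parameter as a possible single-pass alternative; that is a swapped-in technique, not part of the answer above.

```python
import pandas as pd


def read_data_row(path, i, sep=","):
    """Return the i-th data row (0-based, header excluded) as a one-row DataFrame.

    Hypothetical helper: assumes file line 0 is a header row.
    """
    # Skip file lines 1..i (the data rows before the target) and keep line 0,
    # so the column names are still taken from the header.
    return pd.read_csv(path, sep=sep, skiprows=range(1, i + 1), nrows=1)


def iter_data_rows(path, sep=","):
    """Yield one-row DataFrames in a single pass over the file.

    Hypothetical helper: with chunksize=1, read_csv returns an iterator
    of DataFrames instead of loading the whole file.
    """
    yield from pd.read_csv(path, sep=sep, chunksize=1)
```

For example, read_data_row("mydata.csv", 2) would return the third data row, while looping over iter_data_rows("mydata.csv") visits every row once. On a 20 GB file a larger chunksize is likely to be much faster than one row per chunk.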