Question
I have a data frame with 13,000 rows and 3 columns:
('time', 'rawScore', 'label')
I want to read the subsets one by one:
[[1..360], [360..712], ..., [12640..13000]]
I also tried using a list, but it's not working:
import pandas as pd
import math
import datetime

result = "data.csv"
dataSet = pd.read_csv(result)
TP = 0
count = 0
x = 0
df = pd.DataFrame(dataSet, columns=['rawScore', 'label'])

for i, row in df.iterrows():
    data = row.to_dict()
    ScoreX = data['rawScore']
    labelX = data['label']

for i in range(1, 13000, 360):
    x = x + 1
    for j in range(i, 360 * x, 1):
        if (ScoreX > 0.3) and (labelX == 0):
            count = count + 1
print("count=", count)
Recommended answer
You can also use the nrows or skiprows parameters of read_csv to break the file up into chunks. I would recommend against using iterrows, since it is typically very slow. If you do the chunking while reading the values in and save the chunks separately, you can skip the iterrows step entirely. This applies to the file-reading stage, if you want to split the data into chunks (which seems to be an intermediate step in what you're trying to do).
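As a rough sketch of the file-reading idea: the helper read_block below is a hypothetical name, and the in-memory CSV is made-up stand-in data with the same three columns as the question, so that the example runs without data.csv. It reads one block with skiprows and nrows, and then, as an alternative that goes beyond those two parameters, lets read_csv do the splitting itself through its chunksize parameter. The blocks are assumed to be consecutive and non-overlapping, 360 rows each.

```python
import io
import pandas as pd

def read_block(source, start, size):
    """Read `size` data rows starting at 0-based data row `start`.

    File line 0 is the header, so skipping file lines 1..start keeps the
    header and drops the first `start` data rows.
    """
    return pd.read_csv(source, skiprows=range(1, start + 1), nrows=size)

# Made-up stand-in for data.csv (assumption: columns time, rawScore, label).
csv_text = "time,rawScore,label\n" + "\n".join(
    f"{i},{(i % 10) / 10},{i % 2}" for i in range(1000)
)

# Second block only: data rows 360..719.
block = read_block(io.StringIO(csv_text), start=360, size=360)

# Alternative: let pandas yield 360-row chunks and count per chunk,
# using a vectorized mask instead of iterrows.
counts = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=360):
    mask = (chunk["rawScore"] > 0.3) & (chunk["label"] == 0)
    counts.append(int(mask.sum()))

print("rows in block:", len(block))
print("count per chunk:", counts)
```

With a real file you would pass the path ("data.csv") instead of the StringIO object. Note that read_block re-scans the file from the top on every call, so the chunksize loop is probably the better fit when you need every block.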
Another way is to subset with a generator, by checking which of these sets each row index belongs to: [[1..360], [360..712], ..., [12640..13000]]
So write a function that starts a chunk at every index divisible by 360 and, if a row's index falls within that chunk's range, selects it into that particular subset.
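One possible reading of that idea, as a minimal sketch: iter_blocks is a hypothetical helper, the data frame is made-up stand-in data, and the blocks are assumed to be non-overlapping and to start at positions 0, 360, 720, and so on (the ranges listed in the question overlap slightly, so adjust the boundaries if that was intended).

```python
from itertools import islice
import pandas as pd

def iter_blocks(df, size=360):
    """Yield consecutive row blocks; each starts at a position divisible by `size`."""
    for start in range(0, len(df), size):
        yield df.iloc[start:start + size]

# Made-up stand-in for the 13,000-row data frame in the question.
n_rows = 13000
df = pd.DataFrame({
    "time": range(n_rows),
    "rawScore": [(i % 10) / 10 for i in range(n_rows)],
    "label": [i % 2 for i in range(n_rows)],
})

# Count rows with rawScore > 0.3 and label == 0 in every block.
per_block = [
    int(((block["rawScore"] > 0.3) & (block["label"] == 0)).sum())
    for block in iter_blocks(df)
]

# Because iter_blocks is a generator, a single block can be pulled out
# without materializing the others: here the third one (rows 720..1079).
third = next(islice(iter_blocks(df), 2, None))

print("blocks:", len(per_block), "total count:", sum(per_block))
```

The last block is simply shorter than the rest (40 rows here), since 13,000 is not a multiple of 360.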
I wrote these approaches down as alternative ideas you might want to play around with, since in some cases you may only need one subset for your calculation rather than all of the chunks.