Problem description
I have a dataframe like the one below:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],
'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]
})
df['fake_flag'] = ''
I process it with the loop shown below. The code works and produces the expected output, but I can't use this approach on the real dataset because it has more than a million records.
t1 = df['PEEP']
for i in t1.index:
    if i >= 2:
        print("current value is ", t1[i])
        print("preceding 1st (n-1) ", t1[i-1])
        print("preceding 2nd (n-2) ", t1[i-2])
        if (t1[i-1] == t1[i-2] or t1[i-2] >= t1[i-1]):
            # rule 1: t1[i-2] is the max of the two preceding values (when they are equal, either one will do)
            r1_output = t1[i-2]
            print("rule 1 output is ", r1_output)
            if t1[i] >= r1_output + 3:
                print("found a value for rule 2", t1[i])
                print("check for next value is same as current value", t1[i+1])
                if (t1[i] == t1[i+1]):
                    print("fake flag is being set")
                    # .loc avoids chained assignment (df['fake_flag'][i] = ...)
                    df.loc[i, 'fake_flag'] = 'fake_vac'
Since the real data has more than a million records, I can't use the loop there. I am still learning Python; could you help me understand how to vectorize this code?
You can refer to this related post to understand the logic. Since the logic is already correct, I created this post mainly to get help vectorizing and speeding up my code.
I expect my output to look like this:
subject_id = 1
subject_id = 2
Is there an efficient and elegant way to speed this up so that it can handle a dataset with a million records?
Recommended answer
I'm not sure what the story behind this is, but you can certainly vectorize the three if conditions independently and then combine them:
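# t1 = df['PEEP'], as in the question
con1 = t1.shift(2).ge(t1.shift(1))   # t1[i-2] >= t1[i-1] (this already covers the equality case)
con2 = t1.ge(t1.shift(2).add(3))     # t1[i] >= t1[i-2] + 3
con3 = t1.eq(t1.shift(-1))           # t1[i] == t1[i+1]
df['fake_flag']=np.where(con1 & con2 & con3,'fake VAC','')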
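To see why the shifted comparisons reproduce the loop, it can help to line the shifted columns up on a short series. The following is a minimal sketch on made-up values; the series `s` and the column names `current`, `prev_1`, `prev_2` and `next_1` are only illustrative and do not come from the question.

```python
import pandas as pd

# Hypothetical short series, used only to illustrate the alignment
s = pd.Series([7, 5, 10, 10, 11])

aligned = pd.DataFrame({
    'current': s,           # t1[i]
    'prev_1': s.shift(1),   # t1[i-1]
    'prev_2': s.shift(2),   # t1[i-2]
    'next_1': s.shift(-1),  # t1[i+1]
})

# Comparisons against NaN evaluate to False, so the first two rows
# (no t1[i-2]) and the last row (no t1[i+1]) can never be flagged.
flag = (
    aligned['prev_2'].ge(aligned['prev_1'])
    & aligned['current'].ge(aligned['prev_2'] + 3)
    & aligned['current'].eq(aligned['next_1'])
)

print(aligned.assign(flag=flag))
```

Only the row at position 2 satisfies all three conditions here: the two preceding values are 7 and 5, the current value 10 is at least 7 + 3, and the next value is also 10.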
Edit (group by subject_id)
con = lambda x: (x.shift(2).ge(x.shift(1))) & (x.ge(x.shift(2).add(3))) & (x.eq(x.shift(-1)))
df['fake_flag'] = df.groupby('subject_id')['PEEP'].transform(con).map({True:'fake VAC',False:''})
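One way to gain confidence in the grouped version is to compare it with a plain loop run separately for each subject. The sketch below rebuilds the question's data and assumes the flags should never look across a subject boundary, which is what the groupby does (the question's original loop runs over the whole column). The helper names `flag_loop` and `flag_vectorized` are hypothetical, and it uses `groupby(...).shift(...)` instead of the lambda, which should give the same result.

```python
import numpy as np
import pandas as pd

peep = [7, 5, 10, 10, 11, 11, 14, 14, 17, 17, 21, 21, 23, 23, 25, 25, 22, 20, 26, 26,
        5, 7, 8, 8, 9, 9, 13, 13, 15, 15, 12, 12, 15, 15, 19, 19, 19, 22, 22, 15]
df = pd.DataFrame({
    'subject_id': np.repeat([1, 2], 20),
    'day': np.tile(np.arange(1, 21), 2),
    'PEEP': peep,
})

def flag_loop(values):
    """Reference: the three nested ifs, applied to one subject's values."""
    out = [False] * len(values)
    for i in range(2, len(values) - 1):
        if (values[i - 2] >= values[i - 1]
                and values[i] >= values[i - 2] + 3
                and values[i] == values[i + 1]):
            out[i] = True
    return out

def flag_vectorized(frame):
    """Same three conditions, using shifts computed within each subject."""
    by_subject = frame.groupby('subject_id')['PEEP']
    prev_1 = by_subject.shift(1)   # PEEP[i-1] within the subject
    prev_2 = by_subject.shift(2)   # PEEP[i-2] within the subject
    next_1 = by_subject.shift(-1)  # PEEP[i+1] within the subject
    current = frame['PEEP']
    return prev_2.ge(prev_1) & current.ge(prev_2 + 3) & current.eq(next_1)

flags = flag_vectorized(df)
# Rows are already ordered by subject_id, so the per-subject results line up
expected = np.concatenate(
    [flag_loop(group.to_numpy()) for _, group in df.groupby('subject_id')['PEEP']]
)

df['fake_flag'] = np.where(flags, 'fake VAC', '')
print(df[flags])
print("matches per-subject loop:", bool((flags.to_numpy() == expected).all()))
```

The shifts are computed once per column rather than once per row, so the cost should grow roughly linearly with the number of records, although actual timings on a million rows would need to be measured.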