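I have a data set made up of home neighborhoods (id_h), blocks (blk_h, blk_w), Flow, Med_C, and cumulative worker flows (CumFlow). The data are sorted by the distance between blk_h and blk_w (descending) and grouped by id_h. I need to subset the data to extract, for each home neighborhood, the row where CumFlow FIRST equals or exceeds Med_C. I have tried various dplyr functions but could not get it to work. Here is an example: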
df <- data.frame(
id_h=c("A","A","A","A","B","B","B"),
blk_h=c("A1","A1","A2","A2","B1","B2","B2"),
blk_w=c("W1","W2","W3","W3","W1","W2","W2"),
dist=c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),
Flow=c(3,6,3,7,5,4,2),
CumFlow=c(3,9,12,19,5,9,11),
Med_C=c(10,10,10,10,6,6,6)
)
df
I need this to return a table like this:
id_h blk_h blk_w dist Flow CumFlow Med_C
A A2 W3 7.0 3 12 10
B B2 W2 6.5 4 9 6
Here are some of the attempts I made to achieve this:
Attempt #1
library(dplyr)
df.g <- group_by(df, id_h)
df.g2 <- filter(df.g, CumFlow == which.min(CumFlow >= Med_C))
Attempt #2
library(data.table)
setDT(df)[, .SD[which.min(CumCount >= Med_C)], by = id_h]
Attempt #3
library(dplyr)
test <- df %>% group_by(id_h) %>% filter(min(CumFlow) >= Med_C)
I think I am misunderstanding how to use the which.min function. Any advice is greatly appreciated.
Best answer
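On the which.min point raised in the question: applied to a logical vector, which.min returns the position of the first FALSE, not the first TRUE, which may be why the attempts above did not behave as expected. A minimal base R sketch, using only group A's values copied from the example data (the name crossed is made up for illustration):

```r
# Values for group A, copied from the example data
CumFlow <- c(3, 9, 12, 19)
Med_C   <- c(10, 10, 10, 10)

# TRUE where the cumulative flow has reached the threshold
crossed <- CumFlow >= Med_C   # FALSE FALSE TRUE TRUE

which.min(crossed)  # 1: position of the first FALSE
which.max(crossed)  # 3: position of the first TRUE
```

Note that which.max also returns 1 when no element is TRUE, so it is probably worth guarding it with any(crossed) if a group might never reach its threshold.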
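Two filter calls solve this problem. Working within each id_h via group_by, the first filter returns a data.frame with all rows where CumFlow is greater than or equal to Med_C. The second filter returns, within each id_h, the row with the lowest CumFlow. This works only because the data are sorted, so the smallest qualifying CumFlow is also the first one reached. To make this more robust, you could add a call to arrange before filtering.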
library(dplyr)
df <- data.frame(
id_h = c("A","A","A","A","B","B","B"),
blk_h = c("A1","A1","A2","A2","B1","B2","B2"),
blk_w = c("W1","W2","W3","W3","W1","W2","W2"),
dist = c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),
Flow = c(3,6,3,7,5,4,2),
CumFlow = c(3,9,12,19,5,9,11),
Med_C = c(10,10,10,10,6,6,6)
)
df
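# Within each id_h, keep the rows at or above the threshold,
# then keep the row with the smallest CumFlow among them
df %>%
  group_by(id_h) %>%
  filter(CumFlow >= Med_C) %>%
  filter(CumFlow == min(CumFlow))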
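The more robust variant mentioned above could look roughly like the sketch below. It assumes, as in the example data, that CumFlow accumulates in increasing order of dist within each id_h; if your data accumulate in the other direction, the sort would need desc(dist) instead. It also swaps the second filter for slice(1), which is a different technique from the answer's: it keeps exactly one row per group even if two rows tie on CumFlow. The name result is only for illustration.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  id_h    = c("A","A","A","A","B","B","B"),
  blk_h   = c("A1","A1","A2","A2","B1","B2","B2"),
  blk_w   = c("W1","W2","W3","W3","W1","W2","W2"),
  dist    = c(4.3,5.6,7.0,8.7,5.2,6.5,6.8),
  Flow    = c(3,6,3,7,5,4,2),
  CumFlow = c(3,9,12,19,5,9,11),
  Med_C   = c(10,10,10,10,6,6,6)
)

result <- df %>%
  group_by(id_h) %>%
  # Assumption: sort order is increasing dist within each id_h
  arrange(dist, .by_group = TRUE) %>%
  # Keep rows at or above the threshold
  filter(CumFlow >= Med_C) %>%
  # Take the first such row in each group
  slice(1) %>%
  ungroup()

result
```

On the example data this should give the same two rows as the requested table (CumFlow 12 for A and 9 for B). Groups that never reach Med_C simply drop out of the result.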
On "r - filter within groups where x first exceeds y", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38837909/