Problem description
library(dplyr)

pcd <- data.frame(tripNo = c(618, 618, 610, 610, 610, 619),
                  procDate = as.Date(c('2016-03-02', '2016-03-03', '2016-03-02', '2016-03-03', '2016-03-02', '2016-03-03')),
                  delay = c(7.45, 12.90, 11.88, 6.66, 12.50, 9.41))
I want to flag inconsistencies in trips processed on two different days, where the delay on the second day is shorter than the last delay recorded on the previous day. I currently do it this way:
pcd %>%
arrange(tripNo, procDate, delay) %>%
group_by(tripNo) %>%
mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
Alert = ifelse(delayErr, '!', '')) %>%
select(tripNo, procDate, delay, delayErr, Alert)
tripNo procDate delay delayErr Alert
(dbl) (date) (dbl) (lgl) (chr)
1 610 2016-03-02 11.88 FALSE
2 610 2016-03-02 12.50 FALSE
3 610 2016-03-03 6.66 TRUE !
4 618 2016-03-02 7.45 FALSE
5 618 2016-03-03 12.90 FALSE
6 619 2016-03-03 9.41 FALSE
So this works OK. My question is about my first attempt, in which I tried to use substr:
pcd %>% arrange(tripNo, procDate, delay) %>%
group_by(tripNo) %>%
mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
Alert = substr(' !', delayErr + 1, delayErr + 1) ) %>% # <<< This is the only change
select(tripNo, procDate, delay, delayErr, Alert)
tripNo procDate delay delayErr Alert
(dbl) (date) (dbl) (lgl) (chr)
1 610 2016-03-02 11.88 FALSE
2 610 2016-03-02 12.50 FALSE
3 610 2016-03-03 6.66 TRUE
4 618 2016-03-02 7.45 FALSE
5 618 2016-03-03 12.90 FALSE
6 619 2016-03-03 9.41 FALSE
With this code, the Alert does not show as I expected. Could someone explain to me why the second dplyr query doesn't work?
Thanks!
Recommended answer
There is already a version of substr that is vectorized over the start and stop positions, i.e. substring:
pcd %>%
arrange(tripNo, procDate, delay) %>%
group_by(tripNo) %>%
mutate(delayErr = (row_number() != 1) & (delay < lag(delay)),
Alert = substring(' !', delayErr +1, delayErr +1)) %>%
select(tripNo, procDate, delay, delayErr, Alert)
# tripNo procDate delay delayErr Alert
# (dbl) (date) (dbl) (lgl) (chr)
#1 610 2016-03-02 11.88 FALSE
#2 610 2016-03-02 12.50 FALSE
#3 610 2016-03-03 6.66 TRUE !
#4 618 2016-03-02 7.45 FALSE
#5 618 2016-03-03 12.90 FALSE
#6 619 2016-03-03 9.41 FALSE
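As for why the substr attempt fails: substr returns a vector of the same length as its first argument, so with the single string ' !' it uses only the first start/stop position and returns a single character, which mutate then recycles across the whole group. Since the first row of every group has delayErr = FALSE, every row gets a blank. The following is a minimal base-R sketch of that difference, using the delayErr values of trip 610 from the example; the variable names with_substr, with_substring, and with_rep are only illustrative.

```r
# delayErr values for one group (trip 610 in the example above)
delayErr <- c(FALSE, FALSE, TRUE)

# substr() returns a result as long as its first argument (length 1 here),
# so only the first start/stop position is used.
with_substr <- substr(" !", delayErr + 1, delayErr + 1)

# substring() returns a result as long as the longest argument,
# so each element of delayErr gets its own character.
with_substring <- substring(" !", delayErr + 1, delayErr + 1)

# Repeating the string to the length of delayErr makes substr() behave the same way.
with_rep <- substr(rep(" !", length(delayErr)), delayErr + 1, delayErr + 1)

print(with_substr)
print(with_substring)
print(with_rep)
```

Inside mutate, the length-1 result of substr is what produces a blank Alert on every row, including the one where delayErr is TRUE.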