Problem description
I have a feature vector like this:
rest_id qtr cooking cleaning eating jumping
1 123 1 FALSE TRUE FALSE FALSE
2 123 2 FALSE TRUE FALSE FALSE
3 123 3 FALSE TRUE FALSE FALSE
4 123 4 FALSE TRUE FALSE FALSE
5 435 1 FALSE TRUE FALSE FALSE
6 435 2 FALSE TRUE FALSE FALSE
7 435 3 FALSE TRUE FALSE FALSE
8 435 4 FALSE TRUE FALSE FALSE
9 437 1 FALSE TRUE FALSE FALSE
10 437 2 FALSE TRUE FALSE FALSE
11 437 3 FALSE TRUE FALSE TRUE
12 437 4 FALSE TRUE FALSE FALSE
13 439 2 FALSE TRUE TRUE FALSE
And a target vector like this:
rest_id qtr target
1 123 1 TRUE
2 123 2 FALSE
3 123 3 FALSE
4 123 4 TRUE
5 123 5 TRUE
6 435 1 TRUE
7 435 2 TRUE
8 435 3 TRUE
9 435 4 FALSE
10 435 5 FALSE
11 437 1 TRUE
12 437 2 TRUE
13 437 3 TRUE
14 437 4 FALSE
15 439 3 FALSE
I want to join these two together such that:
Feature Q1 -> Target Q1 and Q2
Feature Q2 -> Target Q2 and Q3
Feature Q3 -> Target Q3 and Q4
Feature Q4 -> Target Q4 and Q5
For example, if the feature observation is in quarter 1, we check quarters 1 and 2 of the target vector for that rest_id and quarter: if both are TRUE the target becomes TRUE, if both are FALSE the target becomes FALSE, and if one is TRUE and the other is FALSE the target becomes TRUE.
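The rule described here amounts to a logical OR over the current and the following quarter. As a minimal sketch in base R, assuming the targets for one rest_id are already sorted by quarter (the variable names below are illustrative only, not part of the original data):

```r
# Targets for rest_id 123, quarters 1 to 5, copied from the target table
target_123 <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# Pair each quarter 1..4 with the quarter that follows it
current   <- target_123[1:4]
following <- target_123[2:5]

# TRUE if either quarter is TRUE, FALSE only if both are FALSE
combined <- current | following
print(combined)
# TRUE FALSE TRUE TRUE
```

This matches the target column of the four rest_id 123 rows in the expected output. It does not yet handle quarters that are missing from the target table.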
The expected output looks like this:
rest_id qtr cooking cleaning eating jumping target
123 1 FALSE TRUE FALSE FALSE TRUE
123 2 FALSE TRUE FALSE FALSE FALSE
123 3 FALSE TRUE FALSE FALSE TRUE
123 4 FALSE TRUE FALSE FALSE TRUE
435 1 FALSE TRUE FALSE FALSE TRUE
435 2 FALSE TRUE FALSE FALSE TRUE
435 3 FALSE TRUE FALSE FALSE TRUE
435 4 FALSE TRUE FALSE FALSE FALSE
437 1 FALSE TRUE FALSE FALSE TRUE
437 2 FALSE TRUE FALSE FALSE TRUE
437 3 FALSE TRUE FALSE FALSE TRUE
437 4 FALSE TRUE FALSE FALSE FALSE
I can't do this with just a regular join in R because of the complicated logic I mentioned. What is the easiest way to do this?
Thanks!
There are some cases where the target doesn't exist for a quarter. I added an example where the rest_id is 437. If, for example, the feature vector instance is Q4, we check Q4 and Q5. Q5 doesn't exist, so we just use Q4. If neither exists, the target should be NA.
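One way to express the missing-quarter rule is a small helper that treats NA as "quarter not present". This is only a sketch: combine_with_missing is a hypothetical name, and it assumes missing quarters have already been filled in as NA.

```r
# Hypothetical helper: combine a quarter's target with the next quarter's target.
# NA stands for a quarter that does not exist in the target table.
combine_with_missing <- function(current, following) {
  both_missing <- is.na(current) & is.na(following)
  # `x %in% TRUE` is FALSE for NA, so a missing quarter never forces the result
  result <- (current %in% TRUE) | (following %in% TRUE)
  # If neither quarter exists, the combined target is NA
  result[both_missing] <- NA
  result
}

# rest_id 437, Q4: Q4 is FALSE and Q5 does not exist, so only Q4 is used
q4_437 <- combine_with_missing(FALSE, NA)

# Neither quarter exists
none <- combine_with_missing(NA, NA)
```

Here q4_437 is FALSE and none is NA, which follows the rule stated above.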
Recommended answer
I think this is what you want:
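library(dplyr)
library(tidyr)  # complete() comes from tidyr

dat %>%
  # add NA rows for missing rest_id/qtr combinations so lead() sees the true next quarter
  complete(qtr, rest_id) %>%
  group_by(rest_id) %>%
  # TRUE if this quarter or the next one is TRUE; NA only if both are missing
  mutate(target = as.logical(pmax(target, lead(target), na.rm = TRUE))) %>%
  # keep only the rest_id/qtr rows that appear in the feature table
  right_join(dat2, by = c("rest_id", "qtr")) %>%
  relocate(target, .after = last_col()) %>%
  arrange(rest_id)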
# A tibble: 13 x 7
# Groups: rest_id [4]
qtr rest_id cooking cleaning eating jumping target
<int> <int> <lgl> <lgl> <lgl> <lgl> <lgl>
1 1 123 FALSE TRUE FALSE FALSE TRUE
2 2 123 FALSE TRUE FALSE FALSE FALSE
3 3 123 FALSE TRUE FALSE FALSE TRUE
4 4 123 FALSE TRUE FALSE FALSE TRUE
5 1 435 FALSE TRUE FALSE FALSE TRUE
6 2 435 FALSE TRUE FALSE FALSE TRUE
7 3 435 FALSE TRUE FALSE FALSE TRUE
8 4 435 FALSE TRUE FALSE FALSE FALSE
9 1 437 FALSE TRUE FALSE FALSE TRUE
10 2 437 FALSE TRUE FALSE FALSE TRUE
11 3 437 FALSE TRUE FALSE TRUE TRUE
12 4 437 FALSE TRUE FALSE FALSE FALSE
13 2 439 FALSE TRUE TRUE FALSE FALSE
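To see why the pmax() step gives the requested result, it can help to run it on a single group by hand. The sketch below uses base R only and assumes the rest_id 437 targets after complete() has added an NA for the missing Q5; c(x[-1], NA) is used as a stand-in for dplyr's lead(x).

```r
# rest_id 437 after complete(): quarters 1..5, with Q5 missing
x <- c(TRUE, TRUE, TRUE, FALSE, NA)

# Shift by one position, padding with NA at the end (same idea as lead(x))
nxt <- c(x[-1], NA)

# Element-wise maximum, ignoring NA where the other value is present;
# a position stays NA only when both values are NA
combined <- as.logical(pmax(x, nxt, na.rm = TRUE))
print(combined)
# TRUE TRUE TRUE FALSE NA
```

The first four values are the targets shown for rest_id 437 in the output above. The fifth is dropped by the right join because the feature table has no Q5 row.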
Data:
dat <- structure(list(rest_id = c(123L, 123L, 123L, 123L, 123L, 435L,
435L, 435L, 435L, 435L, 437L, 437L, 437L, 437L, 439L), qtr = c(1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 3L), target = c(TRUE,
FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE,
TRUE, TRUE, FALSE, FALSE)), class = "data.frame", row.names = c(NA,
-15L))
dat2 <- structure(list(rest_id = c(123L, 123L, 123L, 123L, 435L, 435L,
435L, 435L, 437L, 437L, 437L, 437L, 439L), qtr = c(1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,2L), cooking = c(FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE
), cleaning = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), eating = c(FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE), jumping = c(FALSE,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
FALSE, FALSE)), class = "data.frame", row.names = c(NA, -13L))