Problem description
I have some data which looks like:
grp date id Y
<chr> <dttm> <chr> <dbl>
1 group1 2020-09-01 00:00:00 04003 17039.
2 group1 2020-09-01 00:00:00 04006 13233.
3 group1 2020-09-01 00:00:00 04011_AM 7918.
4 group1 2020-09-01 00:00:00 0401301_AD 22586.
5 group1 2020-09-01 00:00:00 0401303 20527.
6 group1 2020-09-01 00:00:00 0401305 29422.
7 group2 2020-09-01 00:00:00 22017_AM 7088.
8 group2 2020-09-01 00:00:00 22021_AM 8134.
9 group2 2020-09-01 00:00:00 22039_AM 15842.
10 group2 2020-09-01 00:00:00 22048 16142.
The data contain several groups. I also have a function:
# Min-max scaling: maps the smallest value of m to 0 and the largest to 1
normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}
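As a quick sanity check (a sketch, not from the original post), min-max scaling maps the smallest value to 0 and the largest to 1. If every value is identical, max(m) - min(m) is 0 and the result is NaN; normaliseDataSafe below is a hypothetical helper that returns 0 in that case instead.

```r
normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

# Hypothetical variant: returns 0 for a constant vector instead of NaN (0/0)
normaliseDataSafe <- function(m) {
  rng <- max(m) - min(m)
  if (rng == 0) return(rep(0, length(m)))
  (m - min(m)) / rng
}

normaliseData(c(2, 4, 6, 10))
# [1] 0.00 0.25 0.50 1.00
normaliseData(c(5, 5, 5))
# [1] NaN NaN NaN
normaliseDataSafe(c(5, 5, 5))
# [1] 0 0 0
```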
I want to normalise the data pair by pair, using the min and max of Y over the two groups in each pair, while holding group1 fixed. That is, group1 is paired with each of the other groups in turn, giving the following combinations:
- group1 & group2
- group1 & group3
- group1 & group4
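Written as a formula, the normalisation for the pair made of group1 and group k is normaliseData applied to the pooled Y values of both groups. The notation (G_k for the set of rows in group k) is introduced here for illustration:

```latex
\tilde{Y}_i^{(k)} = \frac{Y_i - \min_{j \in G_1 \cup G_k} Y_j}
                         {\max_{j \in G_1 \cup G_k} Y_j - \min_{j \in G_1 \cup G_k} Y_j},
\qquad i \in G_1 \cup G_k,\quad k \in \{2, 3, 4\}
```

Because the pooled min and max depend on the partner group, the same group1 row can receive a different normalised value for each k.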
Data:
data <- structure(list(grp = c("group1", "group1", "group1", "group1",
"group1", "group1", "group2", "group2", "group2", "group2", "group2",
"group2", "group3", "group3", "group3", "group3", "group3", "group3",
"group4", "group4", "group4", "group4", "group4", "group4"),
date = structure(c(1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400
), tzone = "UTC", class = c("POSIXct", "POSIXt")), id = c("04003",
"04006", "04011_AM", "0401301_AD", "0401303", "0401305",
"22017_AM", "22021_AM", "22039_AM", "22048", "22053_AM",
"22054_AM", "28002", "28004", "2800501", "2800502", "2800503",
"2800504", "31010_AM", "31015_AM", "31016", "31019_AM", "31023",
"31029_AM"), Y = c(17039.329, 13232.982, 7917.693, 22585.676,
20527.113, 29422.471, 7087.536, 8134.265, 15842.035, 16142.111,
11493.981, 6556.387, 22086.768, 11325.882, 53449.067, 83662.101,
78508.089, 66107.125, 5095.169, 5590.531, 17796.439, 6028.701,
39271.698, 3642.281)), row.names = c(NA, -24L), groups = structure(list(
grp = c("group1", "group2", "group3", "group4"), .rows = structure(list(
1:6, 7:12, 13:18, 19:24), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
I would like to apply the following:
library(dplyr)

# data is a grouped_df (grouped by grp), so ungroup() first;
# otherwise min/max are computed within each group, not across the pair.

# Min / max from group1 and group2
data %>%
  ungroup() %>%
  filter(grp == "group1" | grp == "group2") %>%
  mutate(normedOut = normaliseData(Y))

# Min / max from group1 and group3
data %>%
  ungroup() %>%
  filter(grp == "group1" | grp == "group3") %>%
  mutate(normedOut = normaliseData(Y))

# Min / max from group1 and group4
data %>%
  ungroup() %>%
  filter(grp == "group1" | grp == "group4") %>%
  mutate(normedOut = normaliseData(Y))
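Why ungroup() matters here: the posted data is a grouped_df grouped by grp, and filter() keeps that grouping, so a plain mutate() scales each group on its own min and max rather than on the pair. The toy example below uses made-up values to show the difference (a sketch, assuming dplyr is installed).

```r
library(dplyr)

normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

# Made-up toy data, grouped by grp like the posted data
toy <- tibble(
  grp = c("group1", "group1", "group2", "group2"),
  Y   = c(10, 20, 30, 50)
) %>%
  group_by(grp)

# Still grouped: each group is scaled on its own min/max
per_group <- toy %>%
  filter(grp == "group1" | grp == "group2") %>%
  mutate(normedOut = normaliseData(Y))
per_group$normedOut
# [1] 0 1 0 1

# Ungrouped: group1 and group2 share one min/max (10 and 50)
per_pair <- toy %>%
  filter(grp == "group1" | grp == "group2") %>%
  ungroup() %>%
  mutate(normedOut = normaliseData(Y))
per_pair$normedOut
# [1] 0.00 0.25 0.50 1.00
```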
Recommended answer
Here is one option with purrr, based on what I understand from your question. We create a vector, groups, containing the groups to loop over for the three pairs, holding group1 fixed. For each of these groups we apply your filter and mutate sequence, creating a column named after that group which holds the normalized data. map_dfr() then stacks the results row-wise, so the output is a data frame with 3 new columns, each holding the normalized Y for group1 paired with one other group. Because the results are stacked, group1's rows appear once per pair, and NA fills the cells where a row is not part of that column's pair (e.g. group2 rows in the group3 column).
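library(dplyr)
library(purrr)

# The partner groups; group1 is the fixed member of every pair
groups <- c("group2", "group3", "group4")

groups %>%
  purrr::map_dfr(~ data %>%
    ungroup() %>%  # data is grouped by grp; ungroup so min/max span the pair
    filter(grp == "group1" | grp == .x) %>%
    # name the new column after the partner group, e.g. "group2"
    mutate(!!.x := normaliseData(Y)))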
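If you would rather not have the group1 rows repeated once per pair, here is an alternative layout sketched in base R (not the answerer's code): it keeps one row per observation and adds one column per pair, named after the partner group as in the answer. Group1 rows get a value in every new column; each other group's rows get a value only in their own column, with NA elsewhere. The toy data are made up for illustration.

```r
normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

# Made-up toy data in the same shape as the question (grp, Y)
df <- data.frame(
  grp = c("group1", "group1", "group2", "group2", "group3", "group3"),
  Y   = c(10, 20, 30, 50, 0, 40),
  stringsAsFactors = FALSE
)

fixed  <- "group1"
others <- setdiff(unique(df$grp), fixed)

for (g in others) {
  in_pair <- df$grp %in% c(fixed, g)  # rows of group1 plus the partner group
  df[[g]] <- NA_real_                 # rows outside the pair stay NA
  df[[g]][in_pair] <- normaliseData(df$Y[in_pair])
}

df
```

Here the group2 column is scaled on min 10 and max 50, and the group3 column on min 0 and max 40, so the two group1 rows receive different values in each column.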