Problem description
I'd like to create a new data frame whose columns are subsets of the same variable, split by a different variable. For example, I'd like to build a new data frame from variable 'b' whose columns are split by a subset of the values of a different variable ('year').
set.seed(88)
df <- data.frame(year = rep(1996:1998,3), a = runif(9), b = runif(9), e = runif(9))
df
year a b e
1 1996 0.41050128 0.97679183 0.7477684
2 1997 0.10273570 0.54925568 0.7627982
3 1998 0.74104481 0.74416429 0.2114261
4 1996 0.48007870 0.55296210 0.7377032
5 1997 0.99051343 0.18097104 0.8404930
6 1998 0.99954223 0.02063662 0.9153588
7 1996 0.03247379 0.33055434 0.9182541
8 1997 0.76020784 0.10246882 0.7055694
9 1998 0.67713100 0.59292207 0.4093590
Desired output for variable 'b' for the years 1996 and 1998:
V1 V2
1 0.9767918 0.74416429
2 0.5529621 0.02063662
3 0.3305543 0.59292207
I could probably do this with a loop, but I'm wondering whether there is a dplyr method (or any other simple way to accomplish this).
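For comparison, here is a minimal sketch of the loop-based approach the question alludes to. It assumes the years of interest are known up front and that each of them has the same number of rows; the names `years` and `out` are illustrative only.

```r
# Reproduce the example data from the question
set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

years <- c(1996, 1998)                # years to keep, in the desired column order
out <- vector("list", length(years))
for (i in seq_along(years)) {
  # collect the 'b' values that belong to one year
  out[[i]] <- df$b[df$year == years[i]]
}
names(out) <- paste0("V", seq_along(years))
# bind the vectors as columns V1, V2 (assumes they have equal length;
# as.data.frame() would silently recycle shorter vectors that divide evenly)
out <- as.data.frame(out)
out
```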
Recommended answer
We subset the dataset to the rows where 'year' is 1996 or 1998, select the 'year' and 'b' columns, and unstack to get the expected output.
unstack(subset(df, year %in% c(1996, 1998), select = c('year', 'b')), b ~ year)
# X1996 X1998
#1 0.9767918 0.74416429
#2 0.5529621 0.02063662
#3 0.3305543 0.59292207
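unstack() with a formula such as b ~ year groups the left-hand variable by the right-hand one and binds the pieces as data frame columns, which is why the numeric year values come out as the syntactic names X1996 and X1998. A minimal sketch, assuming you want the V1/V2 names from the desired output, is to rename the result afterwards. If the selected groups have different sizes, unstack() returns a plain named list instead of a data frame, so checking the class before renaming is a cheap safeguard.

```r
set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

res <- unstack(subset(df, year %in% c(1996, 1998), select = c('year', 'b')), b ~ year)
# unstack() only returns a data frame when all groups have the same length
stopifnot(is.data.frame(res))
names(res) <- paste0("V", seq_along(res))   # X1996, X1998 -> V1, V2
res

# With unequal group sizes the result is a named list, not a data frame
uneven <- unstack(data.frame(b = c(0.1, 0.2, 0.3), year = c(1996, 1996, 1998)), b ~ year)
is.data.frame(uneven)
```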
Or, using the tidyverse, we select the columns of interest, filter the rows based on the 'year' column, create a sequence column within each 'year' group, spread to 'wide' format, and then select out the unwanted helper column.
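library(tidyverse)
df %>%
  select(year, b) %>%                    # keep only the columns we need
  filter(year %in% c(1996, 1998)) %>%    # keep only the years of interest
  # relabel the years as V1/V2 (in order of appearance) and group by them
  group_by(year = factor(year, levels = unique(year), labels = c('V1', 'V2'))) %>%
  mutate(n = row_number()) %>%           # row index within each year
  spread(year, b) %>%                    # one column per year
  select(-n)                             # drop the helper index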
# A tibble: 3 x 2
# V1 V2
# <dbl> <dbl>
#1 0.977 0.744
#2 0.553 0.0206
#3 0.331 0.593
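spread() still works but has been superseded by pivot_wider() since tidyr 1.0.0. A minimal sketch of the same pipeline with pivot_wider(), assuming a tidyr version that provides it: the year values become column names via names_prefix (the "Y" prefix is an arbitrary choice here), and the helper column n is dropped at the end.

```r
library(dplyr)
library(tidyr)

set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

wide <- df %>%
  select(year, b) %>%
  filter(year %in% c(1996, 1998)) %>%
  group_by(year) %>%
  mutate(n = row_number()) %>%   # row index within each year
  ungroup() %>%
  # one column per year; n identifies which values share a row
  pivot_wider(names_from = year, values_from = b, names_prefix = "Y") %>%
  select(-n)
wide
```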
Since there are only two years, we could also use summarise:
df %>%
  summarise(V1 = list(b[year == 1996]), V2 = list(b[year == 1998])) %>%
  unnest(cols = c(V1, V2))   # tidyr >= 1.0.0 expects the list columns to be named in `cols`
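The summarise() version names each year by hand, which is fine for two years. If the set of years varies, one option is a small base-R helper; split_wide() is a hypothetical name used only for this sketch, not part of any package. It assumes every selected year has the same number of rows and stops otherwise, since the columns of a data frame must have equal length.

```r
# Hypothetical helper: one column V1, V2, ... per value of `by` listed in `keep`
split_wide <- function(data, value, by, keep) {
  sub <- data[data[[by]] %in% keep, ]
  # factor() with levels = keep fixes the column order to match `keep`
  pieces <- split(sub[[value]], factor(sub[[by]], levels = keep))
  if (length(unique(lengths(pieces))) != 1L) {
    stop("selected groups have different numbers of rows")
  }
  out <- data.frame(pieces)
  names(out) <- paste0("V", seq_along(pieces))
  out
}

set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))
split_wide(df, "b", "year", c(1996, 1998))
split_wide(df, "b", "year", c(1996, 1997, 1998))
```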