Creating a new data frame from multiple subsets of the same variable


Problem description

I'd like to create a new data frame whose columns are subsets of the same variable, split by the values of a different variable. For example, I'd like to take variable 'b' and produce one column for each selected value of 'year'.

set.seed(88)
df <- data.frame(year = rep(1996:1998,3), a = runif(9), b = runif(9), e = runif(9))
df

  year          a          b         e
1 1996 0.41050128 0.97679183 0.7477684
2 1997 0.10273570 0.54925568 0.7627982
3 1998 0.74104481 0.74416429 0.2114261
4 1996 0.48007870 0.55296210 0.7377032
5 1997 0.99051343 0.18097104 0.8404930
6 1998 0.99954223 0.02063662 0.9153588
7 1996 0.03247379 0.33055434 0.9182541
8 1997 0.76020784 0.10246882 0.7055694
9 1998 0.67713100 0.59292207 0.4093590

The desired output for variable 'b' for the years 1996 and 1998 is:

         V1         V2
1 0.9767918 0.74416429
2 0.5529621 0.02063662
3 0.3305543 0.59292207

I could probably do this with a loop, but I'm wondering whether there is a dplyr method (or any other simple way) to accomplish this.

Recommended answer

We subset the dataset to the rows where 'year' is 1996 or 1998, select the 'year' and 'b' columns, and unstack to get the expected output:

unstack(subset(df, year %in% c(1996, 1998), select = c('year', 'b')), b ~ year)
#     X1996      X1998
#1 0.9767918 0.74416429
#2 0.5529621 0.02063662
#3 0.3305543 0.59292207

Or, using tidyverse, we select the columns of interest, filter the rows based on the 'year' column, create a sequence column within each 'year' group, spread to 'wide' format, and then drop the helper column:

library(tidyverse)
df %>%
   select(year, b) %>%
   filter(year %in% c(1996, 1998)) %>%
   group_by(year = factor(year, levels = unique(year), labels = c('V1', 'V2'))) %>%
   mutate(n = row_number()) %>%
   spread(year, b) %>%
   select(-n)
# A tibble: 3 x 2
#     V1     V2
#   <dbl>  <dbl>
#1 0.977 0.744
#2 0.553 0.0206
#3 0.331 0.593
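
In current tidyr releases, spread() is superseded by pivot_wider(). The same reshaping can be sketched as follows, assuming tidyr >= 1.0.0 and the df from the question; the "X" prefix and the helper column name n are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

res <- df %>%
  filter(year %in% c(1996, 1998)) %>%
  select(year, b) %>%
  group_by(year) %>%
  mutate(n = row_number()) %>%   # position within each year, used to align rows
  ungroup() %>%
  pivot_wider(names_from = year, values_from = b, names_prefix = "X") %>%
  select(-n)                     # drop the helper column

res
```

The helper column n gives pivot_wider() a unique row identifier; without it, the three values per year would collapse into a single list-valued cell.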

Since there are only two years, we can also use summarise:

df %>%
   summarise(V1 = list(b[year == 1996]), V2 = list(b[year == 1998])) %>%
   unnest(cols = c(V1, V2))  # tidyr >= 1.0.0 expects the list-columns to be named via cols
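
The two-year version hard-codes one column per year. For an arbitrary set of years, a base R sketch using split() might look like the following. It assumes every selected year has the same number of rows; the variable names years, sub and out are hypothetical.

```r
# Same example data as in the question
set.seed(88)
df <- data.frame(year = rep(1996:1998, 3), a = runif(9), b = runif(9), e = runif(9))

years <- c(1996, 1998)                    # years to keep; extend as needed
sub <- df[df$year %in% years, ]           # rows for the selected years only

# split() returns one vector of 'b' per year, ordered by sorted year value;
# as.data.frame() binds them as columns (this fails if group sizes differ)
out <- as.data.frame(split(sub$b, sub$year))
names(out) <- paste0("V", seq_along(years))

out
```

Because split() orders the groups by sorted year, the columns V1, V2, ... follow ascending year order rather than the order in which the years are listed.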
