Splitting a data frame into multiple data frames by column selection



Problem description

These are my data frames:

# data
set.seed(1234321)

# Original data frame (i.e. a questionnaire survey data)
answer <- c("Yes", "No")
likert_scale <- c("strongly disagree", "disagree", "undecided", "agree", "strongly agree")
d1 <- c(rnorm(10)*10)
d2 <- sample(x = c(letters), size = 10, replace = TRUE)
d3 <- sample(x = likert_scale, size = 10, replace = TRUE)
d4 <- sample(x = likert_scale, size = 10, replace = TRUE)
d5 <- sample(x = likert_scale, size = 10, replace = TRUE)
d6 <- sample(x = answer, size = 10, replace = TRUE)
d7 <- sample(x = answer, size = 10, replace = TRUE)
original_df <- data.frame(d1, d2, d3, d4, d5, d6, d7)

# Questionnaire codebook data frame
quest_section <- c("generic", "likert scale", "specific approval")
starting_column <- c(1, 3, 6)
ending_column <- c(2, 5, 7)
df_codebook <- data.frame(quest_section, starting_column, ending_column)

I would like to split the original data frame into separate ones on the basis of the quest_section variable in df_codebook, using starting_column and ending_column as indices to select columns in original_df.

This is the function I tried to write in order to split original_df:

# splitting dataframe function
split_df <- function(my_df, my_codebook) {
        df_names <- df_codebook[,1] %>%
                map(set_names)
        for (i in 1:length(df_codebook[,1])) {
                df_names$`[i]` <- original_df %>%
                        dplyr::select(df_codebook[[2]][i]:df_codebook[[3]][i])
        }
}

# apply function to two dataframes
my_df_list <- split_df(my_df = original_df, my_codebook = df_codebook)

The result was a NULL object instead of the following list:

> my_df_list
$generic
           d1 d2
1   12.369081  z
2   15.616230  x
3   18.396185  f
4    3.173245  q
5   10.715115  j
6  -11.459955  p
7    2.488894  j
8    1.158625  n
9   26.200816  a
10  12.624048  b

$`likert scale`
                  d3                d4                d5
1           disagree    strongly agree    strongly agree
2          undecided         undecided strongly disagree
3     strongly agree         undecided strongly disagree
4              agree         undecided         undecided
5  strongly disagree             agree         undecided
6           disagree strongly disagree         undecided
7           disagree             agree          disagree
8           disagree strongly disagree         undecided
9          undecided strongly disagree          disagree
10 strongly disagree          disagree    strongly agree

$`specific approval`
    d6  d7
1   No  No
2   No  No
3  Yes  No
4  Yes Yes
5  Yes Yes
6  Yes Yes
7  Yes  No
8   No Yes
9   No  No
10  No Yes

I am interested in any kind of solution, whether it uses a tidyverse and purrr approach or a functional one.

Recommended answer

You can use Map to create a sequence from each starting_column to the corresponding ending_column and use that sequence to extract the relevant columns from original_df. setNames then assigns the section names to the resulting list.

setNames(Map(function(x, y) original_df[, x:y],
             df_codebook$starting_column, df_codebook$ending_column),
         df_codebook$quest_section)

This returns:

#$generic
#           d1 d2
#1   12.369081  z
#2   15.616230  x
#3   18.396185  f
#4    3.173245  q
#5   10.715115  j
#6  -11.459955  p
#7    2.488894  j
#8    1.158625  n
#9   26.200816  a
#10  12.624048  b

#$`likert scale`
#                  d3                d4                d5
#1           disagree    strongly agree    strongly agree
#2          undecided         undecided strongly disagree
#3     strongly agree         undecided strongly disagree
#4              agree         undecided         undecided
#5  strongly disagree             agree         undecided
#6           disagree strongly disagree         undecided
#7           disagree             agree          disagree
#8           disagree strongly disagree         undecided
#9          undecided strongly disagree          disagree
#10 strongly disagree          disagree    strongly agree

#$`specific approval`
#    d6  d7
#1   No  No
#2   No  No
#3  Yes  No
#4  Yes Yes
#5  Yes Yes
#6  Yes Yes
#7  Yes  No
#8   No Yes
#9   No  No
#10  No Yes
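
As a side note on why the attempt in the question gave NULL: the last expression evaluated in that function is the for loop, and a for loop in R evaluates to NULL, so nothing useful is returned. The loop also assigns to an element literally named `[i]` and reads the global objects rather than the function arguments. Below is a minimal sketch of how the Map call above could be wrapped in a reusable function. The toy data (`toy_df`, `toy_codebook`) and the use of `drop = FALSE` are assumptions added for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): split `my_df` into a
# named list of data frames using the column ranges stored in `my_codebook`.
split_df <- function(my_df, my_codebook) {
  pieces <- Map(
    # drop = FALSE keeps a data frame even when a section spans one column
    function(from, to) my_df[, from:to, drop = FALSE],
    my_codebook$starting_column,
    my_codebook$ending_column
  )
  # Name each piece after its section; the named list is the return value
  setNames(pieces, as.character(my_codebook$quest_section))
}

# Toy data, assumed purely for illustration
toy_df <- data.frame(a = 1:3, b = letters[1:3], c = c(TRUE, FALSE, TRUE))
toy_codebook <- data.frame(
  quest_section   = c("first two", "last one"),
  starting_column = c(1, 3),
  ending_column   = c(2, 3)
)

toy_list <- split_df(toy_df, toy_codebook)
names(toy_list)
```

Called as `split_df(my_df = original_df, my_codebook = df_codebook)`, this should give the same named list as the Map call above. Without `drop = FALSE`, a section covering a single column would come back as a plain vector instead of a data frame.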

