tidyr wide to long with two repeated measures

Question

I have some data that is not tidy. It has two nested repeated measures (Q1/Q2 nested within Constructs). I'd like to move it from wide to long format.

##    id time Q1..Ask Q2..Ask Q1..Tell Q2..Tell Q1..Respond Q2..Respond
## 1   1  pre       1       1        1        1           0           0
## 2   2  pre       0       1        1        0           0           1
## 3   3  pre       0       0        1        0           0           0
## 4   4  pre       1       1        0        1           1           0
## 5   5  pre       0       0        0        0           0           0
## 6   1 post       0       0        1        1           0           1
## 7   2 post       0       0        1        1           0           0
## 8   3 post       0       0        0        1           0           0
## 9   4 post       1       0        1        1           0           0
## 10  5 post       0       1        0        1           1           1

Here question 1 and question 2 (Q1 & Q2) are two different questions aimed at the same construct. So Q1..Ask Q2..Ask are scores for question 1 and 2 targeted at an Ask construct. How can I make the Q1/Q2 into a column (Question) and the latter part of the column headers into a Construct column, with a Score column using tidyr?

# MWE

if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, tidyr)

set.seed(10)
dat <- tibble(  # data_frame() is deprecated in favor of tibble()
    id = c(1:5, 1:5),
    time = rep(c("pre", "post"), each = 5),
    Q1..Ask = sample(0:1, 10, TRUE),
    Q2..Ask = sample(0:1, 10, TRUE),
    Q1..Tell = sample(0:1, 10, TRUE),
    Q2..Tell = sample(0:1, 10, TRUE),
    Q1..Respond = sample(0:1, 10, TRUE),
    Q2..Respond = sample(0:1, 10, TRUE)
)

# Code that makes it long format without tidyr

# Pair each Q1 column (3, 5, 7) with its Q2 column (4, 6, 8) and stack them
Map(function(x, y) {

    tibble(
        ID = rep(dat[["id"]], 2),
        Time = rep(dat[["time"]], 2),
        Question = rep(c("Q1", "Q2"), each = 10),
        Construct = rep(gsub("Q[12]\\.+", "", colnames(dat)[x]), 20),
        Score = c(dat[[x]], dat[[y]])
    )

}, c(3, 5, 7), c(4, 6, 8)) %>%
    bind_rows()  # rbind_all() has been removed from dplyr

# Desired output

##    ID Time Question Construct Score
## 1   1  pre       Q1       Ask     1
## 2   2  pre       Q1       Ask     0
## 3   3  pre       Q1       Ask     0
## 4   4  pre       Q1       Ask     1
## 5   5  pre       Q1       Ask     0
## 6   1 post       Q1       Ask     0
## 7   2 post       Q1       Ask     0
## 8   3 post       Q1       Ask     0
## 9   4 post       Q1       Ask     1
## 10  5 post       Q1       Ask     0
## 11  1  pre       Q2       Ask     1
## 12  2  pre       Q2       Ask     1
## 13  3  pre       Q2       Ask     0
## 14  4  pre       Q2       Ask     1
## 15  5  pre       Q2       Ask     0
## 16  1 post       Q2       Ask     0
## 17  2 post       Q2       Ask     0
## 18  3 post       Q2       Ask     0
## 19  4 post       Q2       Ask     0
## 20  5 post       Q2       Ask     1
## 21  1  pre       Q1      Tell     1
## 22  2  pre       Q1      Tell     1
## 23  3  pre       Q1      Tell     1
## 24  4  pre       Q1      Tell     0
## 25  5  pre       Q1      Tell     0
## 26  1 post       Q1      Tell     1
## 27  2 post       Q1      Tell     1
## 28  3 post       Q1      Tell     0
## 29  4 post       Q1      Tell     1
## 30  5 post       Q1      Tell     0
## 31  1  pre       Q2      Tell     1
## 32  2  pre       Q2      Tell     0
## 33  3  pre       Q2      Tell     0
## 34  4  pre       Q2      Tell     1
## 35  5  pre       Q2      Tell     0
## 36  1 post       Q2      Tell     1
## 37  2 post       Q2      Tell     1
## 38  3 post       Q2      Tell     1
## 39  4 post       Q2      Tell     1
## 40  5 post       Q2      Tell     1
## 41  1  pre       Q1   Respond     0
## 42  2  pre       Q1   Respond     0
## 43  3  pre       Q1   Respond     0
## 44  4  pre       Q1   Respond     1
## 45  5  pre       Q1   Respond     0
## 46  1 post       Q1   Respond     0
## 47  2 post       Q1   Respond     0
## 48  3 post       Q1   Respond     0
## 49  4 post       Q1   Respond     0
## 50  5 post       Q1   Respond     1
## 51  1  pre       Q2   Respond     0
## 52  2  pre       Q2   Respond     1
## 53  3  pre       Q2   Respond     0
## 54  4  pre       Q2   Respond     0
## 55  5  pre       Q2   Respond     0
## 56  1 post       Q2   Respond     1
## 57  2 post       Q2   Respond     0
## 58  3 post       Q2   Respond     0
## 59  4 post       Q2   Respond     0
## 60  5 post       Q2   Respond     1

Recommended answer

Try

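library(tidyr)

# Stack all score columns, then split each name such as "Q1..Ask" at the
# literal ".." (dots escaped so they are not treated as wildcards)
gather(dat, Var, Score, -id, -time) %>%
    extract(Var, c('Question', 'Construct'),
            '([^.]+)\\.\\.([^.]+)')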
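
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), which can reshape and split the column names in one call. The following is a minimal sketch of that alternative, not part of the original answer. It assumes tidyr >= 1.0.0 and uses a small made-up data frame, `toy`, with the same `Q<n>..<Construct>` naming scheme as `dat`; the values in `toy` are illustrative only.

```r
library(tidyr)

# Hypothetical stand-in for `dat`: same naming scheme, made-up values
toy <- data.frame(
    id = c(1, 2),
    time = c("pre", "post"),
    Q1..Ask = c(1, 0),
    Q2..Ask = c(0, 1),
    Q1..Tell = c(1, 1),
    Q2..Tell = c(0, 0)
)

# Each capture group in names_pattern fills one of the names_to columns
long <- pivot_longer(
    toy,
    cols = -c(id, time),
    names_to = c("Question", "Construct"),
    names_pattern = "([^.]+)\\.\\.([^.]+)",
    values_to = "Score"
)

print(long)
```

The same call should work on `dat` by swapping it in for `toy`. Rows come out grouped by the original row rather than by construct, so a sort (for example with dplyr::arrange) may be needed to match the ordering of the desired output, and the `id` and `time` columns keep their lowercase names.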
