This question already has answers here:

Reshaping multiple sets of measurement columns (wide format) into single columns (long format) (7 answers)

Closed 3 years ago.




I want to tidy a dataset by reshaping it from this:

age gender  education       previous_comp_exp   tutorial_time   qID.1    time_taken.1   qID.2    time_taken.2
18  Male    Undergraduate   casual gamer        62.17926        sor9     39.61206       sor8     19.4892
24  Male    Undergraduate   casual gamer        85.01288        sor9     50.92343       sor8     16.15616

into this:
age gender  education       previous_comp_exp   tutorial_time   qID      time_taken
18  Male    Undergraduate   casual gamer        62.17926        sor9     39.61206
18  Male    Undergraduate   casual gamer        62.17926        sor8     19.4892
24  Male    Undergraduate   casual gamer        85.01288        sor9     50.92343
24  Male    Undergraduate   casual gamer        85.01288        sor8     16.15616

I have tried gather(), but I can only get it to work with one column, and I keep getting a warning.



Any ideas?

Best answer

Using melt from data.table (see ?patterns):

library(data.table)

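# setDT() converts df to a data.table by reference;
# each pattern selects one group of columns to stack into a single column
melt(setDT(df), measure.vars = patterns("^qID", "^time_taken"),
     value.name = c("qID", "time_taken"))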

Result:
   age gender     education previous_comp_exp tutorial_time variable  qID time_taken
1:  18   Male Undergraduate      casual_gamer      62.17926        1 sor9   39.61206
2:  24   Male Undergraduate      casual_gamer      85.01288        1 sor9   50.92343
3:  18   Male Undergraduate      casual_gamer      62.17926        2 sor8   19.48920
4:  24   Male Undergraduate      casual_gamer      85.01288        2 sor8   16.15616

Or with tidyr:
library(dplyr)
library(tidyr)

df %>%
  gather(variable, value, qID.1:time_taken.2) %>%
  mutate(variable = sub("\\.\\d$", "", variable)) %>%
  group_by(variable) %>%
  mutate(ID = row_number()) %>%
  spread(variable, value, convert = TRUE) %>%
  select(-ID)

Result:
# A tibble: 4 x 7
    age gender     education previous_comp_exp tutorial_time   qID time_taken
  <int> <fctr>        <fctr>            <fctr>         <dbl> <chr>      <dbl>
1    18   Male Undergraduate      casual_gamer      62.17926  sor9   39.61206
2    18   Male Undergraduate      casual_gamer      62.17926  sor8   19.48920
3    24   Male Undergraduate      casual_gamer      85.01288  sor9   50.92343
4    24   Male Undergraduate      casual_gamer      85.01288  sor8   16.15616
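
Separately from the gather()/spread() pipeline above, a shorter variant may be possible with pivot_longer(), assuming tidyr 1.0.0 or later is installed. This is only a sketch: the data frame is rebuilt with character qID columns so the snippet runs on its own, and the column name question is a hypothetical label for the numeric suffix, not something from the original post.

```r
library(tidyr)

# Rebuilt copy of the example data (qID columns as character, an assumption)
df <- data.frame(
  age = c(18L, 24L),
  gender = "Male",
  education = "Undergraduate",
  previous_comp_exp = "casual_gamer",
  tutorial_time = c(62.17926, 85.01288),
  qID.1 = "sor9",
  time_taken.1 = c(39.61206, 50.92343),
  qID.2 = "sor8",
  time_taken.2 = c(19.4892, 16.15616),
  stringsAsFactors = FALSE
)

# ".value" keeps the part before the dot as the output column name;
# the part after the dot goes into the (hypothetical) "question" column
long <- pivot_longer(
  df,
  cols = matches("^(qID|time_taken)\\.\\d+$"),
  names_to = c(".value", "question"),
  names_sep = "\\."
)

print(long)
```

Because each output column only receives values from columns of one type, time_taken should stay numeric here without a separate conversion step.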

Note:

In the gather()/spread() pipeline, convert = TRUE converts time_taken back to numeric, because it was coerced to character when it was gathered into the same value column as the qID columns.
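
The coercion mentioned in the note can be seen in isolation with a minimal sketch. The objects mixed, stacked and restored are illustrative names, and the one-row data frame is made up for the demonstration rather than taken from the question.

```r
library(tidyr)

# One text column and one numeric column, as in the qID / time_taken pairs
mixed <- data.frame(qID.1 = "sor9", time_taken.1 = 39.61206,
                    stringsAsFactors = FALSE)

# gather() stacks both columns into a single value column,
# so the numbers have to become character to share it with the text
stacked <- gather(mixed, variable, value)
print(class(stacked$value))

# Converting the time_taken entries back recovers a numeric vector,
# which is the job convert = TRUE does inside spread()
restored <- type.convert(stacked$value[stacked$variable == "time_taken.1"],
                         as.is = TRUE)
print(class(restored))
```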

Data:
df = structure(list(age = c(18L, 24L), gender = structure(c(1L, 1L
), .Label = "Male", class = "factor"), education = structure(c(1L,
1L), .Label = "Undergraduate", class = "factor"), previous_comp_exp = structure(c(1L,
1L), .Label = "casual_gamer", class = "factor"), tutorial_time = c(62.17926,
85.01288), qID.1 = structure(c(1L, 1L), .Label = "sor9", class = "factor"),
    time_taken.1 = c(39.61206, 50.92343), qID.2 = structure(c(1L,
    1L), .Label = "sor8", class = "factor"), time_taken.2 = c(19.4892,
    16.15616)), .Names = c("age", "gender", "education", "previous_comp_exp",
"tutorial_time", "qID.1", "time_taken.1", "qID.2", "time_taken.2"
), class = "data.frame", row.names = c(NA, -2L))

Source: a similar question on Stack Overflow, "r - Tidying dataset by gathering multiple columns?": https://stackoverflow.com/questions/47272371/
