Problem description
I have a .csv file with demographic data for my participants. The data are coded and downloaded from my study database (REDCap) so that each race has its own separate column. That is, each participant has a value in each of these columns (1 if endorsed, 0 if not endorsed).
It looks like this:
SubjID  Sex  Age  White  AA  Asian  Other
001     F    62   0      1   0      0
002     M    66   1      0   0      0
I have to use a roundabout way to get my demographic summary stats. There's gotta be a simpler way to do this. My question is: how can I combine these columns into one column so that there is only one value for race for each participant? (i.e. recoding so that 1 = White, 2 = AA, etc., with only the endorsed category pulled for each participant and added to this column?)
This is what I'd like it to look like:
SubjID  Sex  Age  Race
001     F    62   2
002     M    66   1
Recommended answer
This is more or less similar to our approach with similar data from REDCap. We use pivot_longer for the dummy variables. The final Race variable could also be made a factor. Please let me know if this is what you had in mind.
Edit: Added names_ptypes to pivot_longer to indicate that the Race variable is a factor (instead of using mutate).
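library(tidyverse)

# Example data in the REDCap export layout: one 0/1 column per race
df <- data.frame(
  SubjID = c("001", "002"),
  Sex = c("F", "M"),
  Age = c(62, 66),
  White = c(0, 1),
  AA = c(1, 0),
  Asian = c(0, 0),
  Other = c(0, 0)
)

df %>%
  # Stack the four race columns into Race/Value pairs.
  # Note: on tidyr 1.1.0 or later, names_ptypes may reject this conversion;
  # names_transform = list(Race = as.factor) is the likely replacement.
  pivot_longer(
    cols = c("White", "AA", "Asian", "Other"),
    names_to = "Race",
    names_ptypes = list(Race = factor()),
    values_to = "Value"
  ) %>%
  # Keep only the endorsed category for each participant
  filter(Value == 1) %>%
  select(-Value)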
Result:
# A tibble: 2 x 4
  SubjID Sex     Age Race
  <fct>  <fct> <dbl> <fct>
1 001    F        62 AA
2 002    M        66 White
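Since the question asked for numeric codes (1 = White, 2 = AA, etc.), here is a minimal sketch of one way to get them. It is an illustration, not part of the original answer. The names race_cols, recoded and RaceCode are made up for this example, and the code assumes each participant endorses exactly one category. It sets the factor levels in mutate rather than through names_ptypes, so the integer code follows the column order.

```r
library(dplyr)
library(tidyr)

# Same example data as above
df <- data.frame(
  SubjID = c("001", "002"),
  Sex = c("F", "M"),
  Age = c(62, 66),
  White = c(0, 1),
  AA = c(1, 0),
  Asian = c(0, 0),
  Other = c(0, 0)
)

# Hypothetical helper vector: its order defines the codes (1 = White, 2 = AA, ...)
race_cols <- c("White", "AA", "Asian", "Other")

recoded <- df %>%
  pivot_longer(cols = all_of(race_cols), names_to = "Race", values_to = "Value") %>%
  filter(Value == 1) %>%   # keep only the endorsed category
  select(-Value) %>%
  mutate(
    Race = factor(Race, levels = race_cols),  # fix the level order explicitly
    RaceCode = as.integer(Race)               # position of the level = numeric code
  )

print(recoded)
```

With this data, participant 001 should get RaceCode 2 (AA) and participant 002 should get RaceCode 1 (White). A participant who endorses more than one category would end up with several rows, so that case would need a separate rule (for example, a "Multiple" category).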