Problem description
I'm trying to reshape my data using dcast. I'm working with samples where each sample has 10-30 sample units. I don't want my data to be aggregated.
My data is in this format:
ID total
sample_1 1
sample_1 0
sample_1 2
sample_1 1
sample_1 0
sample_1 0
sample_1 2
sample_1 1
sample_1 0
sample_1 2
sample_1 1
sample_1 4
sample_2 2
sample_2 1
sample_2 2
sample_2 0
sample_2 0
sample_2 0
sample_2 1
sample_2 2
sample_2 1
sample_2 4
sample_2 5
sample_2 2
sample_2 1
sample_3 0
sample_3 0
sample_3 1
sample_3 2
sample_3 1
sample_3 0
sample_3 2
sample_3 1
sample_3 4
sample_3 5
sample_3 1
sample_3 1
sample_3 0
sample_3 0
sample_3 1
And I want it to look like this:
sample_1 sample_2 sample_3
1        2        0
0        1        0
2        2        1
1        0        2
0        0        1
0        0        0
2        1        2
1        2        1
0        1        4
2        4        5
1        5        1
4        2        1
         1        0
                  0
                  1
Where my sample IDs turn into different columns.
I tried several ways, but R keeps aggregating it.
Recommended answer
You can do this with dcast(), but you have to add row numbers for each ID. Without a row number, every ID maps to a single cell that holds several values, so dcast() has to aggregate them.
The data.table package is another package besides reshape2 that implements dcast(). data.table has a handy rowid() function to generate unique row ids within each group. With that, we get:
library(data.table)
dcast(setDT(DF), rowid(ID) ~ ID, value.var = "total")
# ID sample_1 sample_2 sample_3
# 1: 1 1 2 0
# 2: 2 0 1 0
# 3: 3 2 2 1
# 4: 4 1 0 2
# 5: 5 0 0 1
# 6: 6 0 0 0
# 7: 7 2 1 2
# 8: 8 1 2 1
# 9: 9 0 1 4
#10: 10 2 4 5
#11: 11 1 5 1
#12: 12 4 2 1
#13: 13 NA 1 0
#14: 14 NA NA 0
#15: 15 NA NA 1
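To make the role of the row number more visible, here is a minimal sketch that stores it in an explicit column before casting. The table toy and the column name rn are hypothetical and are not part of the original data. Once each (rn, ID) pair is unique, dcast() has nothing left to aggregate, and shorter groups are padded with NA.

```r
library(data.table)

# Small hypothetical data set with unequal group sizes (not the original DF)
toy <- data.table(
  ID    = c("a", "a", "a", "b", "b"),
  total = c(1L, 0L, 2L, 5L, 3L)
)

# Add an explicit within-group row number; rowid() counts 1, 2, ... per ID
toy[, rn := rowid(ID)]

# Every (rn, ID) pair is now unique, so no aggregation function is needed
wide <- dcast(toy, rn ~ ID, value.var = "total")
print(wide)
```

Keeping rn as a real column can also be handy if the original position of each observation within its sample is needed later.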
However, I recommend continuing any data processing in long format and using grouping. That's much easier than working on individual columns. For instance,
# count observations by group
DF[, .N, by = ID]
# ID N
#1: sample_1 12
#2: sample_2 13
#3: sample_3 15
# compute mean by group
DF[, mean(total), by = ID]
# ID V1
#1: sample_1 1.166667
#2: sample_2 1.615385
#3: sample_3 1.266667
# get min and max by group
DF[, .(min = min(total), max = max(total)), by = ID]
# ID min max
#1: sample_1 0 4
#2: sample_2 0 5
#3: sample_3 0 5
# the same using range()
DF[, as.list(range(total)), by = ID]
# ID V1 V2
#1: sample_1 0 4
#2: sample_2 0 5
#3: sample_3 0 5
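If the data has already been cast to wide format, it can be brought back to long format with data.table's melt() before grouping. The following is only a sketch under assumptions: the table wide and its column names rn, a and b are hypothetical stand-ins for a dcast() result with NA padding.

```r
library(data.table)

# Hypothetical wide table with NA padding, shaped like a dcast() result
wide <- data.table(
  rn = 1:3,
  a  = c(1L, 0L, 2L),
  b  = c(5L, 3L, NA)
)

# melt() stacks the sample columns; na.rm = TRUE drops the NA padding rows
long <- melt(wide, id.vars = "rn", variable.name = "ID",
             value.name = "total", na.rm = TRUE)

# Grouped summaries then work as in the examples above
counts <- long[, .N, by = ID]
print(counts)
```

Dropping the padding with na.rm = TRUE means the per-group counts reflect the real number of observations rather than the length of the longest sample.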
Data
DF <- structure(list(ID = c("sample_1", "sample_1", "sample_1", "sample_1",
"sample_1", "sample_1", "sample_1", "sample_1", "sample_1", "sample_1",
"sample_1", "sample_1", "sample_2", "sample_2", "sample_2", "sample_2",
"sample_2", "sample_2", "sample_2", "sample_2", "sample_2", "sample_2",
"sample_2", "sample_2", "sample_2", "sample_3", "sample_3", "sample_3",
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3",
"sample_3", "sample_3", "sample_3", "sample_3", "sample_3", "sample_3"
), total = c(1L, 0L, 2L, 1L, 0L, 0L, 2L, 1L, 0L, 2L, 1L, 4L,
2L, 1L, 2L, 0L, 0L, 0L, 1L, 2L, 1L, 4L, 5L, 2L, 1L, 0L, 0L, 1L,
2L, 1L, 0L, 2L, 1L, 4L, 5L, 1L, 1L, 0L, 0L, 1L)), .Names = c("ID",
"total"), row.names = c(NA, -40L), class = "data.frame")