Given the following data:

 transect <- c("B","N","C","D","H","J","E","L","I","I")
 sampler <- c(rep("J",5),rep("W",5))
 species <- c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW")
 weight <- c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00)
 wingspan <- c(13.9, 52.0, 57.0, 13.7, 11.0,52.5, 10.7, 11.1, 52.3, 55.1)
 week <- c(1,2,3,4,5,6,7,8,9,9)
 # Warning to R newbs: Really bad idea to use this code
 ex <- as.data.frame(cbind(transect,sampler,species,weight,wingspan,week))
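
The warning in the comment presumably refers to cbind: binding character and numeric vectors together produces a character matrix first, so weight, wingspan and week stop being numeric. A minimal sketch of a safer construction, calling data.frame directly on the same vectors (the name ex2 is only illustrative and not part of the original question):

```r
transect <- c("B","N","C","D","H","J","E","L","I","I")
sampler  <- c(rep("J",5), rep("W",5))
species  <- c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW")
weight   <- c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00)
wingspan <- c(13.9,52.0,57.0,13.7,11.0,52.5,10.7,11.1,52.3,55.1)
week     <- c(1,2,3,4,5,6,7,8,9,9)

# data.frame() keeps each column's own type; cbind() would coerce
# everything to character before the data frame is built.
ex2 <- data.frame(transect, sampler, species, weight, wingspan, week,
                  stringsAsFactors = FALSE)
str(ex2)
```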

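What I want to do is transpose the species together with their associated weight and wingspan values. The expected result is shown below. My data set has about 500,000 rows and about 200 different species, so the result will be a very large data frame.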
      transect sampler week ROBweight HAWweight PIGweight ROBwingspan HAWwingspan PIGwingspan
1         B       J    1       2.8       0.0       0.0        13.9         0.0         0.0
2         N       J    2       0.0      52.0       0.0         0.0        52.0         0.0
3         C       J    3       0.0      56.0       0.0         0.0        57.0         0.0
4         D       J    4       2.8       0.0       0.0        13.7         0.0         0.0
5         H       J    5       0.0       0.0      16.0         0.0         0.0        11.0
6         J       W    6       0.0      55.0       0.0         0.0        52.5         0.0
7         E       W    7       0.0       0.0      16.2         0.0         0.0        10.7
8         L       W    8       0.0       0.0      18.3         0.0         0.0        11.1
9         I       W    9       0.0      52.5       0.0         0.0        52.3         0.0
10        I       W    9       0.0      57.0       0.0         0.0        55.1         0.0

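Best Answer

The main problem is that you currently have no unique "id" variable, which causes trouble for the usual suspects, reshape and dcast.

Here is one solution. I have used getanID from the "splitstackshape" package, but it is easy to create your own unique ID variable in many different ways.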

library(splitstackshape)
library(reshape2)
idvars <- c("transect", "sampler", "week")
ex <- getanID(ex, id.vars=idvars)
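
As one example of creating the ID variable yourself, here is a minimal base R sketch using ave. It assumes ex was built with data.frame (so the measurements are numeric) rather than with the cbind call from the question; the column name .id simply mirrors what getanID adds.

```r
ex <- data.frame(
  transect = c("B","N","C","D","H","J","E","L","I","I"),
  sampler  = c(rep("J",5), rep("W",5)),
  species  = c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW"),
  weight   = c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00),
  wingspan = c(13.9,52.0,57.0,13.7,11.0,52.5,10.7,11.1,52.3,55.1),
  week     = c(1,2,3,4,5,6,7,8,9,9)
)

# Number the rows within each transect/sampler/week combination (1, 2, ...),
# so that repeated combinations become distinguishable.
ex$.id <- ave(seq_len(nrow(ex)), ex$transect, ex$sampler, ex$week,
              FUN = seq_along)
ex
```

Only the last row repeats an earlier transect/sampler/week combination (I, W, 9), so it is the only row that gets an ID other than 1.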

From here, you have two options:

reshape from base R:
reshape(ex, direction = "wide",
        idvar=c("transect", "sampler", "week", ".id"),
        timevar="species")
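
reshape has no fill argument, so cells with no matching species come back as NA, while the expected output in the question shows 0. A minimal sketch of replacing them afterwards, again assuming ex was built with data.frame and that the ID was created in base R (the names wide and measure_cols are illustrative):

```r
ex <- data.frame(
  transect = c("B","N","C","D","H","J","E","L","I","I"),
  sampler  = c(rep("J",5), rep("W",5)),
  species  = c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW"),
  weight   = c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00),
  wingspan = c(13.9,52.0,57.0,13.7,11.0,52.5,10.7,11.1,52.3,55.1),
  week     = c(1,2,3,4,5,6,7,8,9,9)
)
ex$.id <- ave(seq_len(nrow(ex)), ex$transect, ex$sampler, ex$week,
              FUN = seq_along)

idcols <- c("transect", "sampler", "week", ".id")
wide <- reshape(ex, direction = "wide", idvar = idcols, timevar = "species")

# Only touch the measurement columns (weight.ROB, wingspan.ROB, ...),
# leaving the identifier columns as they are.
measure_cols <- setdiff(names(wide), idcols)
wide[measure_cols][is.na(wide[measure_cols])] <- 0
wide
```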

melt and dcast from "reshape2":
First, melt your data into a "long" form.
exL <- melt(ex, id.vars=c(idvars, ".id", "species"))

Then, cast your data into a wide form.
dcast(exL, transect + sampler + week + .id ~ species + variable)
#    transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
# 1         B       J    1   1         NA           NA         NA           NA        2.8         13.9
# 2         C       J    3   1       56.0         57.0         NA           NA         NA           NA
# 3         D       J    4   1         NA           NA         NA           NA        2.8         13.7
# 4         E       W    7   1         NA           NA       16.2         10.7         NA           NA
# 5         H       J    5   1         NA           NA       16.0         11.0         NA           NA
# 6         I       W    9   1       52.5         52.3         NA           NA         NA           NA
# 7         I       W    9   2       57.0         55.1         NA           NA         NA           NA
# 8         J       W    6   1       55.0         52.5         NA           NA         NA           NA
# 9         L       W    8   1         NA           NA       18.3         11.1         NA           NA
# 10        N       J    2   1       52.0         52.0         NA           NA         NA           NA

Better option: "data.table"

Alternatively (and perhaps preferably), you can use the "data.table" package (at least version 1.8.11) as follows:
library(data.table)
library(reshape2) ## Also required here
packageVersion("data.table")
# [1] ‘1.8.11’
DT <- data.table(ex)
DT[, .id := sequence(.N), by = c("transect", "sampler", "week")]
DTL <- melt(DT, measure.vars=c("weight", "wingspan"))
dcast.data.table(DTL, transect + sampler + week + .id ~ species + variable)
#     transect sampler week .id HAW_weight HAW_wingspan PIG_weight PIG_wingspan ROB_weight ROB_wingspan
#  1:        B       J    1   1         NA           NA         NA           NA        2.8         13.9
#  2:        C       J    3   1       56.0         57.0         NA           NA         NA           NA
#  3:        D       J    4   1         NA           NA         NA           NA        2.8         13.7
#  4:        E       W    7   1         NA           NA       16.2         10.7         NA           NA
#  5:        H       J    5   1         NA           NA       16.0         11.0         NA           NA
#  6:        I       W    9   1       52.5         52.3         NA           NA         NA           NA
#  7:        I       W    9   2       57.0         55.1         NA           NA         NA           NA
#  8:        J       W    6   1       55.0         52.5         NA           NA         NA           NA
#  9:        L       W    8   1         NA           NA       18.3         11.1         NA           NA
# 10:        N       J    2   1       52.0         52.0         NA           NA         NA           NA
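
The code above targets data.table 1.8.11. As a hedged sketch for more recent releases (1.9.6 or later, as far as I know), data.table ships its own melt and dcast, so reshape2 does not need to be loaded, and dcast accepts several value.var columns at once, which makes the melt step optional. The column names then take the form weight_HAW rather than HAW_weight. The name wideDT is illustrative.

```r
library(data.table)

DT <- data.table(
  transect = c("B","N","C","D","H","J","E","L","I","I"),
  sampler  = c(rep("J",5), rep("W",5)),
  species  = c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW"),
  weight   = c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00),
  wingspan = c(13.9,52.0,57.0,13.7,11.0,52.5,10.7,11.1,52.3,55.1),
  week     = c(1,2,3,4,5,6,7,8,9,9)
)

# Row counter within each transect/sampler/week group, added by reference.
DT[, .id := seq_len(.N), by = .(transect, sampler, week)]

# Cast both measurement columns in one call, without melting first.
wideDT <- dcast(DT, transect + sampler + week + .id ~ species,
                value.var = c("weight", "wingspan"))
wideDT
```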

Add fill = 0 to either dcast call to replace the NA values with 0.
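
For instance, the "reshape2" route with fill = 0 might look like the sketch below. It assumes ex was built with data.frame and that the ID was created in base R instead of with getanID; the names exL and exW are illustrative.

```r
library(reshape2)

ex <- data.frame(
  transect = c("B","N","C","D","H","J","E","L","I","I"),
  sampler  = c(rep("J",5), rep("W",5)),
  species  = c("ROB","HAW","HAW","ROB","PIG","HAW","PIG","PIG","HAW","HAW"),
  weight   = c(2.80,52.00,56.00,2.80,16.00,55.00,16.20,18.30,52.50,57.00),
  wingspan = c(13.9,52.0,57.0,13.7,11.0,52.5,10.7,11.1,52.3,55.1),
  week     = c(1,2,3,4,5,6,7,8,9,9)
)
ex$.id <- ave(seq_len(nrow(ex)), ex$transect, ex$sampler, ex$week,
              FUN = seq_along)

# Long form: one row per transect/sampler/week/.id/species/measurement.
exL <- melt(ex, id.vars = c("transect", "sampler", "week", ".id", "species"))

# Wide form; fill = 0 puts 0 where a species has no observation.
exW <- dcast(exL, transect + sampler + week + .id ~ species + variable,
             fill = 0)
exW
```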

On the topic "r - Partially transposing a data frame in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19384766/
