Problem description
I currently have wide data which looks similar to this:
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
etc.
And I wish to turn it into a long data frame which looks similar to this:
cid dyad f op ed junk id
1 2 0 2 5 0.876 1
1 2 0 4 7 0.876 2
1 5 0 2 4 0.765 1
1 5 1 4 3 0.765 2
I have tried using the gather() function as well as the reshape() function, but cannot figure out how to create multiple value columns instead of collapsing all of the columns into a single long column.
Thanks for any help.
Recommended answer
You can use the base reshape() function to (roughly) simultaneously melt over multiple sets of variables, by using the varying parameter and setting direction to "long".
For example, here you are supplying a list of three "sets" (vectors) of variable names to the varying argument:
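# Example wide data from the question
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header=TRUE)
# Each vector in `varying` is one set of wide columns to stack;
# `v.names` names the long column that each set becomes
reshape(dat, direction="long",
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
        v.names=c("f","op","ed"))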
You get a result like this:
cid dyad junk time f op ed id
1.1 1 2 0.876 1 0 2 5 1
2.1 1 5 0.765 1 0 2 4 2
1.2 1 2 0.876 2 0 4 7 1
2.2 1 5 0.765 2 1 4 3 2
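Because every wide column here is a stem followed by a number (f1, op2, ed1, ...), it should also be possible to pass varying as one flat vector and let reshape() work out the sets itself via sep = "". With that setting, reshape()'s default split rule breaks each name where a letter is followed by a digit, so "op1" becomes stem "op" and time 1. This is a sketch of that alternative, assuming the names keep this stem-plus-digit pattern; the variable name `long_guess` is only illustrative.

```r
# Example wide data from the question
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header=TRUE)

# Flat vector of wide columns; with sep = "" reshape() infers the sets
# (f, op, ed) and the times (1, 2) from the column names
long_guess <- reshape(dat, direction="long",
                      varying=c("f1","f2","op1","op2","ed1","ed2"),
                      sep="")
long_guess
```

If the inference works as expected, this gives the same columns and values as the explicit list version, without having to spell out v.names.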
Notice that, in addition to the three sets being collapsed, two variables get created: an $id variable, which tracks the row number in the original table (dat), and a $time variable, which corresponds to the order of the original variables that were collapsed. The row names are now also compound (1.1, 2.1, 1.2, 2.2); here each is just the row's $id and $time values, respectively, joined by a period.
Without knowing exactly what you're trying to track, it's hard to say whether $id or $time is what you want to use as the row identifier, but they're both there.
It might also be useful to experiment with the timevar and idvar parameters (you can set timevar to NULL, for example).
# Same call, but name the generated time column "id1" and the id column "id2"
reshape(dat, direction="long",
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
        v.names=c("f","op","ed"),
        timevar="id1", idvar="id2")
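A quick way to see what the renaming does is to inspect the column names of the result. This self-contained sketch rebuilds dat from the question; the variable name `long2` is only illustrative. The generated columns should appear in the same positions as time and id in the default output, just under the new names.

```r
# Example wide data from the question
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header=TRUE)

long2 <- reshape(dat, direction="long",
                 varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
                 v.names=c("f","op","ed"),
                 timevar="id1", idvar="id2")

# expected: cid dyad junk id1 f op ed id2
names(long2)
```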