Question
I'm new to R and to being able to reorganize data this easily. I've hunted around for a solution but can't find exactly what I'd like to do. reshape2's melt/cast doesn't quite seem to work, and I haven't mastered plyr well enough to factor it in here.
Basically I have a data.frame with the structure outlined below, with a category column in which each element is a variable-length list of categories (shown in compact form because the real number of columns is much larger, and I actually have multiple category_lists that I'd like to keep separate):
>mydf
ID category_list xval yval
1 ID1 cat1, cat2, cat3 xnum1 ynum1
2 ID2 cat2, cat3 xnum2 ynum2
3 ID3 cat1 xnum3 ynum3
I want to do manipulations with the categories as factors (and with the associated values, i.e. columns 3/4), so I think I need something like this in the end, where the IDs and the x/y/other column values are duplicated according to the length of the category list:
ID category xval yval
1 ID1 cat1 xnum1 ynum1
2 ID1 cat2 xnum1 ynum1
3 ID1 cat3 xnum1 ynum1
4 ID2 cat2 xnum2 ynum2
5 ID2 cat3 xnum2 ynum2
6 ID3 cat1 xnum3 ynum3
If there's another way to factor/facet on category_list directly, that would be simpler, but I haven't come across methods that support this; e.g. the following throws an error:
>ggplot(mydf, aes(x=xval, y=yval)) + geom_point() + facet_grid(~category_list)
Thanks!
Recommended answer
The answer will depend on the format of category_list. If it is in fact a list for each row, something like
mydf <- data.frame(ID = paste0('ID',1:3),
category_list = I(list(c('cat1','cat2','cat3'), c('cat2','cat3'), c('cat1'))),
xval = 1:3, yval = 1:3)
or
library(data.table)
mydf <- as.data.frame(data.table(ID = paste0('ID',1:3),
category_list = list(c('cat1','cat2','cat3'), c('cat2','cat3'), c('cat1')),
xval = 1:3, yval = 1:3) )
Then you can use plyr and merge to create your long-form data:
library(plyr)  # provides ddply(), .() and summarize()
newdf <- merge(mydf, ddply(mydf, .(ID), summarize, cat_list = unlist(category_list)), by = 'ID')
ID category_list xval yval cat_list
1 ID1 cat1, cat2, cat3 1 1 cat1
2 ID1 cat1, cat2, cat3 1 1 cat2
3 ID1 cat1, cat2, cat3 1 1 cat3
4 ID2 cat2, cat3 2 2 cat2
5 ID2 cat2, cat3 2 2 cat3
6 ID3 cat1 3 3 cat1
Or a non-plyr approach that does not require merge:
do.call(rbind,lapply(split(mydf, mydf$ID), transform, cat_list = unlist(category_list)))
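Since the answer depends on the format of category_list, here is a minimal base R sketch for the other likely case, where each element is a single comma-separated string rather than a list. The names mydf_chr, cats and longdf are made up for illustration, and the string format is an assumption based on how the data is printed in the question. If category_list is already a list column, the strsplit() step can be skipped and cats taken directly from the column.

```r
# Hypothetical example data: category_list stored as comma-separated strings
mydf_chr <- data.frame(ID = paste0("ID", 1:3),
                       category_list = c("cat1, cat2, cat3", "cat2, cat3", "cat1"),
                       xval = 1:3, yval = 1:3,
                       stringsAsFactors = FALSE)

# Split each string on commas (and any following whitespace);
# the result is a list with one character vector per row
cats <- strsplit(mydf_chr$category_list, ",\\s*")

# Repeat each row once per category, keeping only the non-list columns
n <- lengths(cats)
longdf <- mydf_chr[rep(seq_len(nrow(mydf_chr)), n), c("ID", "xval", "yval")]

# Attach the flattened categories as a factor and reset the row names
longdf$category <- factor(unlist(cats))
rownames(longdf) <- NULL
```

The resulting longdf has one row per ID/category pair, so category can be used as an ordinary factor, for instance in facet_grid(~category).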