"Unpacking" a factor list from a data.frame


Question

I'm new to R and to being able to re-organize data easily. I have hunted around for a solution but can't find exactly what I'd like to do. reshape2's melt/cast doesn't quite seem to work, and I haven't mastered plyr well enough to factor it in here.

Basically, I have a data.frame with the structure outlined below, with a category column in which each element is a variable-length list of categories. (The example is more compact than my real data: the number of columns is much larger, and I actually have multiple category_lists that I'd like to keep separate.)

>mydf
       ID      category_list    xval    yval
1     ID1   cat1, cat2, cat3   xnum1   ynum1
2     ID2         cat2, cat3   xnum2   ynum2
3     ID3               cat1   xnum3   ynum3

I want to manipulate the categories as factors (along with the associated values, i.e. columns 3 and 4), so I think I ultimately need something like this, where the IDs and the x/y/other column values are duplicated according to the length of the category list:

       ID           category    xval    yval
1     ID1               cat1   xnum1   ynum1
2     ID1               cat2   xnum1   ynum1
3     ID1               cat3   xnum1   ynum1
4     ID2               cat2   xnum2   ynum2
5     ID2               cat3   xnum2   ynum2
6     ID3               cat1   xnum3   ynum3

If there's a way to factor/facet on the category_list directly, that would be simpler, but I haven't come across methods that support this; e.g. the following throws an error:

> ggplot(mydf, aes(x = xval, y = yval)) + geom_point() + facet_grid(~category_list)

Thanks!

Recommended answer

The answer will depend on the format of category_list. If it is in fact a list for each row, like

mydf <- data.frame(ID = paste0('ID',1:3),
 category_list = I(list(c('cat1','cat2','cat3'),  c('cat2','cat3'), c('cat1'))),
 xval = 1:3, yval = 1:3)

or

library(data.table)
mydf <- as.data.frame(data.table(ID = paste0('ID',1:3),
 category_list = list(c('cat1','cat2','cat3'),  c('cat2','cat3'), c('cat1')),
 xval = 1:3, yval = 1:3) )

then you can use plyr and merge to create your long-form data:

 library(plyr)
 newdf <- merge(mydf, ddply(mydf, .(ID), summarize, cat_list = unlist(category_list)), by = 'ID')


   ID    category_list xval yval cat_list
1 ID1 cat1, cat2, cat3    1    1     cat1
2 ID1 cat1, cat2, cat3    1    1     cat2
3 ID1 cat1, cat2, cat3    1    1     cat3
4 ID2       cat2, cat3    2    2     cat2
5 ID2       cat2, cat3    2    2     cat3
6 ID3             cat1    3    3     cat1
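
Once the data is in this long form, cat_list is an ordinary column and can be treated as a factor. The following is only a minimal sketch of that idea: it rebuilds the table printed above by hand (so it does not depend on plyr), and the names counts, means and p are illustrative, not part of the original answer. The ggplot2 part is guarded because it assumes that package is installed.

```r
# Long-form data, rebuilt by hand to match the table printed above
# (the category_list column is left out for brevity).
newdf <- data.frame(
  ID       = c("ID1", "ID1", "ID1", "ID2", "ID2", "ID3"),
  xval     = c(1, 1, 1, 2, 2, 3),
  yval     = c(1, 1, 1, 2, 2, 3),
  cat_list = c("cat1", "cat2", "cat3", "cat2", "cat3", "cat1"),
  stringsAsFactors = FALSE
)

# Treat the unpacked categories as a factor.
newdf$cat_list <- factor(newdf$cat_list)

# Number of rows per category.
counts <- table(newdf$cat_list)

# Mean of xval within each category.
means <- aggregate(xval ~ cat_list, data = newdf, FUN = mean)

print(counts)
print(means)

# Faceting on the unpacked column (assumes ggplot2 is installed;
# skipped otherwise). p is built here but not drawn.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(newdf, ggplot2::aes(x = xval, y = yval)) +
    ggplot2::geom_point() +
    ggplot2::facet_grid(~cat_list)
}
```

Faceting works here because cat_list now holds one value per row, unlike the list column in the question.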

Alternatively, the long form can be built without merge:

 do.call(rbind,lapply(split(mydf, mydf$ID), transform, cat_list = unlist(category_list)))
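
Both approaches above assume that category_list is a true list column. If it is instead stored as a single comma-separated string per row (the printed mydf in the question does not make this clear), one possible base R sketch is to split the strings first and then repeat each row once per category. The column layout and the names cats, idx and longdf are assumptions made for illustration.

```r
# Assumption: category_list holds one comma-separated string per row.
mydf <- data.frame(
  ID            = paste0("ID", 1:3),
  category_list = c("cat1, cat2, cat3", "cat2, cat3", "cat1"),
  xval          = 1:3,
  yval          = 1:3,
  stringsAsFactors = FALSE
)

# Split each string on a comma followed by optional whitespace,
# giving a list with one character vector per row.
cats <- strsplit(mydf$category_list, ",\\s*")

# Repeat each row index once for every category that row carries.
idx <- rep(seq_len(nrow(mydf)), lengths(cats))

# Duplicate the other columns and attach the flattened categories as a factor.
longdf <- cbind(
  mydf[idx, c("ID", "xval", "yval")],
  category = factor(unlist(cats))
)
rownames(longdf) <- NULL  # drop the 1, 1.1, 1.2, ... row names left by indexing

print(longdf)
```

The same rep()/lengths() step also works on a real list column, in which case the strsplit() line can be skipped and cats set to mydf$category_list.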
