"Unpacking" a factor list from a data.frame

Problem description

I'm new to R / having the option to easily re-organize data, and have hunted around for a solution but can't find exactly what I'd like to do. Reshape2's melt/cast doesn't quite seem to work and I haven't mastered plyr well enough to factor it in here.

Basically I have a data.frame with a structure outlined below, with a category column in which each element is a variable-length list of categories (more compact because the # columns is much larger, and I actually have multiple category_lists that I'd like to keep separate):

>mydf
       ID      category_list    xval    yval
1     ID1   cat1, cat2, cat3   xnum1   ynum1
2     ID2         cat2, cat3   xnum2   ynum2
3     ID3               cat1   xnum3   ynum3

I want to do manipulations with the categories as factors (and the values associated, i.e. columns 3/4), so I think I need something like this in the end, where IDs and x/y/other column values are duplicated according to the length of the category list:

       ID           category    xval    yval
1     ID1               cat1   xnum1   ynum1
2     ID1               cat2   xnum1   ynum1
3     ID1               cat3   xnum1   ynum1
4     ID2               cat2   xnum2   ynum2
5     ID2               cat3   xnum2   ynum2
6     ID3               cat1   xnum3   ynum3

If there's a way to factor/facet on the category_list directly, that would be a simpler solution, but I haven't come across methods that support this. For example, the following throws an error:

>ggplot(mydf, aes(x=xval, y=yval)) + geom_point() + facet_grid(~category_list)



Thanks!

Recommended answer

The answer will depend on the format of category_list. If it is in fact a list for each row, the data can be built in either of these two ways:

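# Option 1: base R. Wrapping the list in I() makes data.frame() keep it as one list column.
mydf <- data.frame(ID = paste0('ID',1:3),
 category_list = I(list(c('cat1','cat2','cat3'),  c('cat2','cat3'), c('cat1'))),
 xval = 1:3, yval = 1:3)

# Option 2: build the same table with data.table, then convert back to a data.frame.
library(data.table)
mydf <- as.data.frame(data.table(ID = paste0('ID',1:3),
 category_list = list(c('cat1','cat2','cat3'),  c('cat2','cat3'), c('cat1')),
 xval = 1:3, yval = 1:3) )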
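The answer only covers the case where category_list is already a list. If each row instead holds a single comma-separated string (an assumption about the asker's data, suggested by the printed table in the question), a minimal sketch for converting it to a list column with strsplit might look like this. The name mydf_chr is hypothetical.

```r
# Hypothetical input: category_list stored as one comma-separated string per row
mydf_chr <- data.frame(ID = paste0("ID", 1:3),
                       category_list = c("cat1, cat2, cat3", "cat2, cat3", "cat1"),
                       xval = 1:3, yval = 1:3,
                       stringsAsFactors = FALSE)

# Split on commas (dropping surrounding whitespace); strsplit returns a list
# with one character vector per row, which replaces the string column
mydf_chr$category_list <- strsplit(mydf_chr$category_list, "\\s*,\\s*")
```

After this step the column has the same per-row list shape as in the constructions above.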

Then you can use plyr and merge to create your long-form data:

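 library(plyr)
 # For each ID, unlist its categories into one row per category, then join back onto mydf
 newdf <- merge(mydf, ddply(mydf, .(ID), summarize, cat_list = unlist(category_list)), by = 'ID')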


   ID    category_list xval yval cat_list
1 ID1 cat1, cat2, cat3    1    1     cat1
2 ID1 cat1, cat2, cat3    1    1     cat2
3 ID1 cat1, cat2, cat3    1    1     cat3
4 ID2       cat2, cat3    2    2     cat2
5 ID2       cat2, cat3    2    2     cat3
6 ID3             cat1    3    3     cat1

Or, without needing merge:

 do.call(rbind,lapply(split(mydf, mydf$ID), transform, cat_list = unlist(category_list)))
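As a further alternative that is not part of the original answer, the same long form can be sketched in base R alone by repeating each row index once per category and unlisting the list column. The names wide and long are hypothetical, and lengths() assumes R 3.2.0 or later.

```r
# Rebuild the example data with a plain list column (hypothetical name: wide)
wide <- data.frame(ID = paste0("ID", 1:3), xval = 1:3, yval = 1:3,
                   stringsAsFactors = FALSE)
wide$category_list <- list(c("cat1", "cat2", "cat3"), c("cat2", "cat3"), "cat1")

# Number of categories in each row
n <- lengths(wide$category_list)

# Repeat every row once per category, keeping only the non-list columns
long <- wide[rep(seq_len(nrow(wide)), times = n), c("ID", "xval", "yval")]

# Flatten the list column; the order matches the repeated rows
long$category <- factor(unlist(wide$category_list))
rownames(long) <- NULL
```

Because category ends up as an ordinary factor column, long should be usable wherever a grouping or faceting variable is expected, such as facet_grid(~category).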
