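I want to reshape a data frame from long to wide format, but in doing so I lose some data that I would like to keep.
Take the following example: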

library(reshape2)  # provides dcast()

df <- data.frame(Par1 = unlist(strsplit("AABBCCC","")),
                 Par2 = unlist(strsplit("DDEEFFF","")),
                 ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh",",")),
                 Type = unlist(strsplit("pre,post,pre,post,pre,post,post",",")),
                 Val = c(10,20,30,40,50,60,70))

   #     Par1 Par2 ParD Type Val
   #   1    A    D  foo  pre  10
   #   2    A    D  bar post  20
   #   3    B    E  baz  pre  30
   #   4    B    E  qux post  40
   #   5    C    F  bla  pre  50
   #   6    C    F  xyz post  60
   #   7    C    F  meh post  70

dfw <- dcast(df,
             formula = Par1 + Par2 ~ Type,
             value.var = "Val",
             fun.aggregate = mean)

 #     Par1 Par2 post pre
 #   1    A    D   20  10
 #   2    B    E   40  30
 #   3    C    F   65  50

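This is almost what I need, but I would also like to have
  • a field that keeps the data from the ParD column (for example, merged into a single string),
  • the number of observations used in each aggregation.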

That is, I want the resulting data.frame to look like this:
        #     Par1 Par2 post pre Num.pre Num.post ParD
        #   1    A    D   20  10      1      1    foo_bar
        #   2    B    E   40  30      1      1    baz_qux
        #   3    C    F   65  50      1      2    bla_xyz_meh
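For the first requirement on its own (ParD merged into one string per Par1/Par2 pair), one option is aggregate() with paste() as the aggregation function. This is a minimal base-R sketch rather than a reshape2 solution; the object name pard is illustrative, and the example data is rebuilt with stringsAsFactors = FALSE so the snippet runs on its own.

```r
# Same example data as in the question, kept as character columns
df <- data.frame(Par1 = unlist(strsplit("AABBCCC", "")),
                 Par2 = unlist(strsplit("DDEEFFF", "")),
                 ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh", ",")),
                 Type = unlist(strsplit("pre,post,pre,post,pre,post,post", ",")),
                 Val  = c(10, 20, 30, 40, 50, 60, 70),
                 stringsAsFactors = FALSE)

# Collapse every ParD value of a (Par1, Par2) group into "a_b_c";
# the extra argument collapse = "_" is forwarded by aggregate() to paste()
pard <- aggregate(ParD ~ Par1 + Par2, data = df, FUN = paste, collapse = "_")
pard
```

The result has one row per Par1/Par2 pair, so it could then be joined to the dcast() output with merge(dfw, pard), which matches on the shared Par1 and Par2 columns.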
    

Any ideas would be appreciated. For example, I tried to solve the second task by passing fun.aggregate = function(x) c(Val = mean(x), Num = length(x)) to dcast, but this results in an error.
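As far as I understand it, dcast() puts exactly one value into each cell of the wide table, so fun.aggregate has to return a single value; a function returning c(Val = ..., Num = ...) yields two values per cell, which is presumably what causes the error. A common workaround is to aggregate twice, once with mean and once with length, and combine the two results. The sketch below shows that two-pass idea with base R's tapply() in place of dcast() (a substitution made so it runs without reshape2); key, means, counts and wide are illustrative names.

```r
df <- data.frame(Par1 = unlist(strsplit("AABBCCC", "")),
                 Par2 = unlist(strsplit("DDEEFFF", "")),
                 ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh", ",")),
                 Type = unlist(strsplit("pre,post,pre,post,pre,post,post", ",")),
                 Val  = c(10, 20, 30, 40, 50, 60, 70),
                 stringsAsFactors = FALSE)

# One label per output row, e.g. "A_D"
key <- paste(df$Par1, df$Par2, sep = "_")

# Pass 1: mean of Val in each (key, Type) cell, exactly one number per cell
means  <- tapply(df$Val, list(key, df$Type), mean)
# Pass 2: number of observations in each cell (the analogue of fun.aggregate = length)
counts <- tapply(df$Val, list(key, df$Type), length)

# Combine both passes column by column into one wide data frame
wide <- data.frame(key      = rownames(means),
                   post     = means[, "post"],
                   pre      = means[, "pre"],
                   Num.pre  = counts[, "pre"],
                   Num.post = counts[, "post"],
                   row.names = NULL)
wide
```

With reshape2 itself, the same idea would be two dcast() calls on the same formula, one with fun.aggregate = mean and one with fun.aggregate = length, joined afterwards.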

Best answer

Solved in two steps with ddply (I'm not entirely happy with it, but it gives the result):

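library(plyr)  # provides ddply()

# Per (Par1, Par2) group: merge the ParD values and count pre/post rows
dat <- ddply(df, .(Par1, Par2), function(x) {
  data.frame(ParD     = paste(as.character(x$ParD), collapse = "_"),
             Num.pre  = sum(x$Type == "pre"),
             Num.post = sum(x$Type == "post"))
})

# Join with the dcast() result on the shared key columns Par1 and Par2
merge(dfw, dat)
  Par1 Par2 post pre        ParD Num.pre Num.post
1    A    D   20  10     foo_bar       1        1
2    B    E   40  30     baz_qux       1        1
3    C    F   65  50 bla_xyz_meh       1        2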
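Since the two-step route feels unsatisfying, here is a single-pass alternative sketched in base R with split(), lapply() and do.call(rbind, ...). This is a swapped-in technique, not part of the answer: every statistic, including the ParD string, is computed inside one function per group, so no merge is needed. It assumes every group contains both pre and post rows; otherwise mean() of an empty vector returns NaN for that column.

```r
df <- data.frame(Par1 = unlist(strsplit("AABBCCC", "")),
                 Par2 = unlist(strsplit("DDEEFFF", "")),
                 ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh", ",")),
                 Type = unlist(strsplit("pre,post,pre,post,pre,post,post", ",")),
                 Val  = c(10, 20, 30, 40, 50, 60, 70),
                 stringsAsFactors = FALSE)

# One data frame per (Par1, Par2) pair; drop = TRUE discards
# combinations that never occur in the data
groups <- split(df, list(df$Par1, df$Par2), drop = TRUE)

# Summarise each group into a single wide row
rows <- lapply(groups, function(x) {
  data.frame(Par1     = x$Par1[1],
             Par2     = x$Par2[1],
             post     = mean(x$Val[x$Type == "post"]),
             pre      = mean(x$Val[x$Type == "pre"]),
             Num.pre  = sum(x$Type == "pre"),
             Num.post = sum(x$Type == "post"),
             ParD     = paste(x$ParD, collapse = "_"),
             stringsAsFactors = FALSE)
})

# Stack the one-row data frames and reset the row names
res <- do.call(rbind, rows)
rownames(res) <- NULL
res
```

The columns come out in the order requested in the question (post, pre, Num.pre, Num.post, ParD), because the per-group data.frame() call lists them in that order.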
    

Regarding "r - complex reshaping", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15182888/

    10-12 23:26