This article covers R and dplyr: converting a list of lists of different lengths in a data frame into a long-format data frame.

Problem description

I'm seeking a dplyr-ish solution to the following task. I have a dataframe that contains a variable that is a list of lists, each of which has a dimnames attribute. The lists are of different lengths. Here's the output of str(df):

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   3 obs. of  2 variables:
 $ Step : int  1 2 3
 $ Value:List of 3
  ..$ : num [1:2, 1:2] 0.232 0.261 0.932 0.875
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr  "4" "5"
  .. .. ..$ : chr  "0.2" "0.094"
  ..$ : num [1:2, 1:5] 0.197 0.197 0.64 0.643 0.958 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr  "4" "5"
  .. .. ..$ : chr  "0.2" "0.094" "0.044" "0.021" ...
  ..$ : num [1:2, 1] 0.268 0.262
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr  "4" "5"
  .. .. ..$ : chr "0.2"

I've included dput code below to recreate this dataframe.

I want a dataframe in the following format:

Step    Value   a     b
 1      0.232   4   0.200
 1      0.261   5   0.200
 1      0.932   4   0.094
 1      0.875   5   0.094
 1       NA     4   0.044
 1       NA     5   0.044
 1       NA     4   0.021
 1       NA     5   0.021
 1       NA     4   0.010
 1       NA     5   0.010
 2      0.197   4   0.200
 2      0.197   5   0.200
 2      0.640   4   0.094
 2      0.643   5   0.094
 2      0.958   4   0.044
 2      1.032   5   0.044
 2      0.943   4   0.021
 2      1.119   5   0.021
 2      0.943   4   0.010
 2      1.119   5   0.010
 3      0.268   4   0.200
 3      0.262   5   0.200
 3       NA     4   0.094
 3       NA     5   0.094
 3       NA     4   0.044
 3       NA     5   0.044
 3       NA     4   0.021
 3       NA     5   0.021
 3       NA     4   0.010
 3       NA     5   0.010

where the variable a holds the row names from each list's dimnames and b holds the column names.

I've tried a for loop to separate out each list by step, but

1) I've not been successful in padding out the list with NAs (length(x) <- y doesn't work).

2) I've reviewed advanced R data types but haven't been successful in extracting the dimnames into vectors to use as dataframe columns (attr(df$Value, "dimnames") yields NULL.)

Once I have lists of the same length I can construct the new dataframe vectors step by step in the for loop and then rbind. Or is there a way to use the dimname attribute to directly construct a wide dataframe using both row and column dimnames as dataframe column names? I can then gather to make a long dataframe.

There are several subquestions here, and I'm sure there's a more elegant solution than the one I've mapped out. Thanks for looking.

Here's the dput code to create the dataframe:

df <- structure(list(Step = c(1L, 2L, 3L), Value = list(structure(c(0.232,
0.261, 0.932, 0.875), .Dim = c(2L,
2L), .Dimnames = list(c("4", "5"), c("0.2", "0.094"
))), structure(c(0.197, 0.197, 0.640,
0.643, 0.958, 1.032, 0.943,
1.119, 0.943, 1.119), .Dim = c(2L,
5L), .Dimnames = list(c("4", "5"), c("0.2", "0.094",
"0.044", "0.021", "0.01"))), structure(c(0.268,
0.262), .Dim = c(2L, 1L), .Dimnames = list(c("4",
"5"), "0.2")))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L), .Names = c("Step", "Value"))
Solution
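
A preliminary note on subquestion 2, offered as background rather than as part of the original answer: attr(df$Value, "dimnames") yields NULL because the dimnames attribute belongs to each matrix stored inside Value, not to the list column itself. A minimal base R sketch, where `Value` is a hypothetical stand-in for df$Value holding only the first matrix from the question:

```r
# One element of the list column, rebuilt from the question's dput() output.
m <- matrix(c(0.232, 0.261, 0.932, 0.875), nrow = 2,
            dimnames = list(c("4", "5"), c("0.2", "0.094")))
Value <- list(m)  # hypothetical stand-in for df$Value

# The list itself carries no dimnames, so this is NULL.
outer_dimnames <- attr(Value, "dimnames")

# The attribute lives on each element, so extract it element by element.
row_names <- lapply(Value, rownames)
col_names <- lapply(Value, colnames)
```

This is why both approaches below apply their steps to each element with map() instead of to the column as a whole.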

Approach one:

First, we convert the matrices to data.frames, then we add the row names as a separate column called a, and gather each one into long form. Unnesting then gives one big data.frame. Adding in the NA values is easy with complete:

library(tidyverse) # using dplyr, tidyr and purrr

df %>%
  mutate(Value = map(Value, as.data.frame),
         Value = map(Value, rownames_to_column, 'a'),
         Value = map(Value, ~gather(., b, value, -a))) %>%
  unnest(Value) %>%
  complete(Step, a, b)
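
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a sketch of the same pipeline written that way, not the original answer's code. It assumes the dplyr, tidyr, purrr and tibble packages are installed, and it rebuilds the question's data by hand so that it runs on its own; the name `long` is only for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Same data as the dput() output in the question, rebuilt by hand.
df <- tibble(
  Step = 1:3,
  Value = list(
    matrix(c(0.232, 0.261, 0.932, 0.875), nrow = 2,
           dimnames = list(c("4", "5"), c("0.2", "0.094"))),
    matrix(c(0.197, 0.197, 0.640, 0.643, 0.958,
             1.032, 0.943, 1.119, 0.943, 1.119), nrow = 2,
           dimnames = list(c("4", "5"),
                           c("0.2", "0.094", "0.044", "0.021", "0.01"))),
    matrix(c(0.268, 0.262), nrow = 2,
           dimnames = list(c("4", "5"), "0.2"))
  )
)

long <- df %>%
  mutate(Value = map(Value, function(m) {
    m %>%
      as.data.frame() %>%           # matrix -> data.frame, keeps dimnames
      rownames_to_column("a") %>%   # row names become column a
      pivot_longer(cols = -a, names_to = "b", values_to = "value")
  })) %>%
  unnest(cols = Value) %>%          # one row per (Step, a, b) that has a value
  complete(Step, a, b)              # add the missing combinations as NA
```

Note that a and b come out as character columns, so complete() orders b as text ("0.01", "0.021", ...) rather than in the order shown in the question.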

Approach two:

Manually define the data.frame, then do the same:

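df %>%
  mutate(Value = map(Value,
                     # c(.) flattens each matrix column by column, so the row
                     # names cycle fastest and each column name repeats nrow(.) times
                     ~tibble(val = c(.),
                             a = rep(rownames(.), times = ncol(.)),
                             b = rep(colnames(.), each = nrow(.))))) %>%
  unnest(Value) %>%
  complete(Step, a, b)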
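
One thing to check when building the labels by hand: c() on a matrix flattens it column by column, so the row names must cycle fastest (repeated ncol times as a whole) while each column name is repeated nrow times in a row. A minimal base R sketch using the first matrix from the question; `long_m` and `labels_match` are names chosen here for illustration:

```r
m <- matrix(c(0.232, 0.261, 0.932, 0.875), nrow = 2,
            dimnames = list(c("4", "5"), c("0.2", "0.094")))

long_m <- data.frame(
  val = c(m),                             # column-major: col 1, then col 2
  a = rep(rownames(m), times = ncol(m)),  # "4" "5" "4" "5"
  b = rep(colnames(m), each = nrow(m)),   # "0.2" "0.2" "0.094" "0.094"
  stringsAsFactors = FALSE
)

# Each (a, b) label pair should index back to the same value in the matrix.
labels_match <- all(long_m$val == m[cbind(long_m$a, long_m$b)])
```

If the `each` and `times` roles are swapped, the vectors still have the right length, so no error is raised, but values end up paired with the wrong labels.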

Result:

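Both give a long data frame with 30 rows (3 steps × 2 values of a × 5 values of b), with NA wherever a step has no value for that combination of a and b.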


09-06 05:52