

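Following the top-voted answer to the question "How to make a great R reproducible example?", I am sharing my dataset as the output of dput(query1). Copy and paste the following code block into the R console (assigning the result to query1) and you immediately have something to work with in R: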

       structure(list(plu = structure(list(year = structure(list(id = 1:3,
    station = 100:102, pluMean = c(0.509068994778059, 1.92866478959912,
    1.09517453602154), pluMax = c(0.0146962179957886, 0.802984389130343,
    2.48170762478472)), .Names = c("id", "station", "pluMean",
"pluMax"), row.names = c(NA, -3L), class = "data.frame"), month = structure(list(
    id = 1:3, station = 100:102, pluMean = c(0.66493845927034,
    -1.3559338786041, 0.195600637750077), pluMax = c(0.503424623872161,
    0.234402501255681, -0.440264545434053)), .Names = c("id",
"station", "pluMean", "pluMax"), row.names = c(NA, -3L), class = "data.frame"),
    week = structure(list(id = 1:3, station = 100:102, pluMean = c(-0.608295829330578,
    -1.10256919591373, 1.74984007126193), pluMax = c(0.969668266601551,
    0.924426323739882, 3.47460867665884)), .Names = c("id", "station",
    "pluMean", "pluMax"), row.names = c(NA, -3L), class = "data.frame")), .Names = c("year",
"month", "week")), tsa = structure(list(year = structure(list(
    id = 1:3, station = 100:102, tsaMean = c(-1.49060721773042,
    -0.684735418997484, 0.0586655881113975), tsaMax = c(0.25739838787582,
    0.957634817758648, 1.37198023881125)), .Names = c("id", "station",
"tsaMean", "tsaMax"), row.names = c(NA, -3L), class = "data.frame"),
    month = structure(list(id = 1:3, station = 100:102, tsaMean = c(-0.684668662999479,
    -1.28087846387974, -0.600175481941456), tsaMax = c(0.962916941685075,
    0.530773351897188, -0.217143593955998)), .Names = c("id",
    "station", "tsaMean", "tsaMax"), row.names = c(NA, -3L), class = "data.frame"),
    week = structure(list(id = 1:3, station = 100:102, tsaMean = c(0.376481732842365,
    0.370435880636005, -0.105354927593471), tsaMax = c(1.93833635147645,
    0.81176751708868, 0.744932493064975)), .Names = c("id", "station",
    "tsaMean", "tsaMax"), row.names = c(NA, -3L), class = "data.frame")), .Names = c("year",
"month", "week"))), .Names = c("plu", "tsa"))

    > str(query1)
List of 2
 $ plu:List of 3
  ..$ year :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ pluMean: num [1:3] 0.509 1.929 1.095
  .. ..$ pluMax : num [1:3] 0.0147 0.803 2.4817
  ..$ month:'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ pluMean: num [1:3] 0.665 -1.356 0.196
  .. ..$ pluMax : num [1:3] 0.503 0.234 -0.44
  ..$ week :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ pluMean: num [1:3] -0.608 -1.103 1.75
  .. ..$ pluMax : num [1:3] 0.97 0.924 3.475
 $ tsa:List of 3
  ..$ year :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ tsaMean: num [1:3] -1.4906 -0.6847 0.0587
  .. ..$ tsaMax : num [1:3] 0.257 0.958 1.372
  ..$ month:'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ tsaMean: num [1:3] -0.685 -1.281 -0.6
  .. ..$ tsaMax : num [1:3] 0.963 0.531 -0.217
  ..$ week :'data.frame':   3 obs. of  4 variables:
  .. ..$ id     : int [1:3] 1 2 3
  .. ..$ station: int [1:3] 100 101 102
  .. ..$ tsaMean: num [1:3] 0.376 0.37 -0.105
  .. ..$ tsaMax : num [1:3] 1.938 0.812 0.745



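I would like to programmatically full_join, by id and station, the identically named timeInterval data frames (year, month, week). This means I should end up with a new list (query1Changed) containing 3 data frames (year, month, week), each with 6 columns (id, station, pluMean, pluMax, tsaMean, tsaMax) and 3 observations. Schematically, I need to arrange the data as follows: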

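  • join df query1$plu$year with df query1$tsa$year
  • join df query1$plu$month with df query1$tsa$month
  • join df query1$plu$week with df query1$tsa$week

Or, expressed in another notation:

  • join df query1[[1]][[1]] with df query1[[2]][[1]]
  • join df query1[[1]][[2]] with df query1[[2]][[2]]
  • join df query1[[1]][[3]] with df query1[[2]][[3]]

And expressed programmatically (n being the total number of elements in the big list):

  • join df query1[[i]][[1]] with df query1[[i+1]][[1]] ... with df query1[[n]][[1]]
  • join df query1[[i]][[2]] with df query1[[i+1]][[2]] ... with df query1[[n]][[2]]
  • join df query1[[i]][[3]] with df query1[[i+1]][[3]] ... with df query1[[n]][[3]]

I need to achieve this programmatically because in my real project I may encounter other big lists with more than 2 parameter elements, and with more than 4 variable columns in each of their timeInterval data frames.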



    > query1Changed <- do.call(function(...) mapply(bind_cols, ..., SIMPLIFY=F), args = query1)

    > str(query1Changed)
    List of 3
     $ year :'data.frame':  3 obs. of  8 variables:
      ..$ id      : int [1:3] 1 2 3
      ..$ station : int [1:3] 100 101 102
      ..$ pluMean : num [1:3] 0.509 1.929 1.095
      ..$ pluMax  : num [1:3] 0.0147 0.803 2.4817
      ..$ id1     : int [1:3] 1 2 3
      ..$ station1: int [1:3] 100 101 102
      ..$ tsaMean : num [1:3] -1.4906 -0.6847 0.0587
      ..$ tsaMax  : num [1:3] 0.257 0.958 1.372
     $ month:'data.frame':  3 obs. of  8 variables:
      ..$ id      : int [1:3] 1 2 3
      ..$ station : int [1:3] 100 101 102
      ..$ pluMean : num [1:3] 0.665 -1.356 0.196
      ..$ pluMax  : num [1:3] 0.503 0.234 -0.44
      ..$ id1     : int [1:3] 1 2 3
      ..$ station1: int [1:3] 100 101 102
      ..$ tsaMean : num [1:3] -0.685 -1.281 -0.6
      ..$ tsaMax  : num [1:3] 0.963 0.531 -0.217
     $ week :'data.frame':  3 obs. of  8 variables:
      ..$ id      : int [1:3] 1 2 3
      ..$ station : int [1:3] 100 101 102
      ..$ pluMean : num [1:3] -0.608 -1.103 1.75
      ..$ pluMax  : num [1:3] 0.97 0.924 3.475
      ..$ id1     : int [1:3] 1 2 3
      ..$ station1: int [1:3] 100 101 102
      ..$ tsaMean : num [1:3] 0.376 0.37 -0.105
      ..$ tsaMax  : num [1:3] 1.938 0.812 0.745


Next, I tried to do the same with dplyr's full_join, but without success. Running the following code:
    > query1Changed <- do.call(function(...) mapply(full_join(..., by = c("station", "id")), ..., SIMPLIFY=F), args = query1)

    Error in UseMethod("full_join") :
      no applicable method for 'full_join' applied to an object of class "list"
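
The error most likely arises because `full_join(..., by = ...)` is evaluated immediately, with the `plu` and `tsa` lists themselves as its arguments, rather than a function being handed to `mapply`. A minimal sketch of a corrected call for the two-parameter case, assuming dplyr is installed and using a small made-up list `q` (hypothetical, same shape as `query1`), wraps the join in an anonymous function:

```r
library(dplyr)

# Hypothetical stand-in with the same shape as query1:
# parameter -> time interval -> data frame
make_df <- function(prefix) {
  df <- data.frame(id = 1:3, station = 100:102,
                   m = c(0.1, 0.2, 0.3), x = c(1.1, 1.2, 1.3))
  names(df)[3:4] <- paste0(prefix, c("Mean", "Max"))
  df
}
q <- list(
  plu = list(year = make_df("plu"), month = make_df("plu"), week = make_df("plu")),
  tsa = list(year = make_df("tsa"), month = make_df("tsa"), week = make_df("tsa"))
)

# Hand mapply a function; it is called once per pair of data frames
qChanged <- mapply(
  function(x, y) full_join(x, y, by = c("id", "station")),
  q$plu, q$tsa,
  SIMPLIFY = FALSE
)

str(qChanged, max.level = 1)
```

This only pairs two parameters; a list with more parameter elements needs a fold over the outer list.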





    - Merging a data frame from a list of data frames [duplicate]
    - Simultaneously merge multiple data.frames in a list
    - Joining list of data.frames from map() call
    - Combining elements of list of lists by index
    - Joining a List of Data Frames with purrr::reduce()




    > join_each = function(x, y) map2(x, y, full_join)
    > join_each(query1$plu, query1$tsa)
    Joining, by = c("id", "station")
    Joining, by = c("id", "station")
    Joining, by = c("id", "station")
      id station  pluMean     pluMax     tsaMean    tsaMax
    1  1     100 0.509069 0.01469622 -1.49060722 0.2573984
    2  2     101 1.928665 0.80298439 -0.68473542 0.9576348
    3  3     102 1.095175 2.48170762  0.05866559 1.3719802
      id station    pluMean     pluMax    tsaMean     tsaMax
    1  1     100  0.6649385  0.5034246 -0.6846687  0.9629169
    2  2     101 -1.3559339  0.2344025 -1.2808785  0.5307734
    3  3     102  0.1956006 -0.4402645 -0.6001755 -0.2171436
      id station    pluMean    pluMax    tsaMean    tsaMax
    1  1     100 -0.6082958 0.9696683  0.3764817 1.9383364
    2  2     101 -1.1025692 0.9244263  0.3704359 0.8117675
    3  3     102  1.7498401 3.4746087 -0.1053549 0.7449325

    > reduce(query1, join_each)
    Joining, by = c("id", "station")
    Joining, by = c("id", "station")
    Joining, by = c("id", "station")
      id station  pluMean     pluMax     tsaMean    tsaMax
    1  1     100 0.509069 0.01469622 -1.49060722 0.2573984
    2  2     101 1.928665 0.80298439 -0.68473542 0.9576348
    3  3     102 1.095175 2.48170762  0.05866559 1.3719802
      id station    pluMean     pluMax    tsaMean     tsaMax
    1  1     100  0.6649385  0.5034246 -0.6846687  0.9629169
    2  2     101 -1.3559339  0.2344025 -1.2808785  0.5307734
    3  3     102  0.1956006 -0.4402645 -0.6001755 -0.2171436
      id station    pluMean    pluMax    tsaMean    tsaMax
    1  1     100 -0.6082958 0.9696683  0.3764817 1.9383364
    2  2     101 -1.1025692 0.9244263  0.3704359 0.8117675
    3  3     102  1.7498401 3.4746087 -0.1053549 0.7449325

This computes join_each(query1[[1]], query1[[2]]) %>% join_each(query1[[3]]) ... %>% join_each(query1[[n]]), so it also works when the list holds more than two parameters.
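
The fold that reduce() performs can also be spelled out as an explicit loop. The following is only an illustrative sketch: `params` is a hypothetical stand-in for `query1` with three made-up parameters (`hum` is invented for the example), and this `join_each` variant names the join keys explicitly, which also avoids the "Joining, by = ..." messages.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for query1: three parameters, two time intervals each
make_df <- function(prefix) {
  df <- data.frame(id = 1:3, station = 100:102, value = c(1, 2, 3))
  names(df)[3] <- paste0(prefix, "Mean")
  df
}
params <- map(
  c(plu = "plu", tsa = "tsa", hum = "hum"),
  function(p) list(year = make_df(p), month = make_df(p))
)

# Same idea as join_each above, with the join keys spelled out
join_each <- function(x, y) {
  map2(x, y, function(a, b) full_join(a, b, by = c("id", "station")))
}

# Explicit fold: start from the first parameter, join in the rest one by one
acc <- params[[1]]
for (nxt in params[-1]) {
  acc <- join_each(acc, nxt)
}

# reduce() performs the same fold in a single call
same <- identical(acc, reduce(params, join_each))
print(same)
str(acc, max.level = 1)
```

Because map2() pairs elements by position, every parameter is assumed to hold its time-interval data frames in the same order.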

Update: the following one-liner does the same thing: reduce(query1, map2, full_join). However, it is less readable.

Source: the Stack Overflow question "dplyr: how-to programmatically full_join dataframes contained in a list of lists?": https://stackoverflow.com/questions/45963678/
