Problem description

I am using sparklyr to manipulate some data. Given the following tibble a:

library(tibble)

a <- tibble(id = rep(c(1, 10), each = 10),
            attribute1 = rep(c("This", "That", "These", "Those", "The", "Other", "Test", "End", "Start", "Beginning"), 2),
            value = rep(seq(10, 100, by = 10), 2),
            average = rep(c(50, 100), each = 10),
            upper_bound = rep(c(80, 130), each = 10),
            lower_bound = rep(c(20, 70), each = 10))

I would like to use tidyr's gather() to reshape the data, like this:

library(tidyr)

b <- a %>%
  gather(key = type_data, value = value_data, -c(id:attribute1))

However, gather() is not available for Spark DataFrames in sparklyr. I have seen people use sdf_pivot() to mimic gather (e.g. in the question "How to use sdf_pivot() in sparklyr and concatenate strings?"), but I can't see how to apply it in this case.

Does anyone have an idea?

Cheers!

Recommended answer

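Here's a function that mimics gather in sparklyr. It gathers the given columns into key/value pairs and keeps every other column intact.

For each column in gather_cols, it selects that column together with all the other columns. It then adds a key column holding the column's name and renames the column itself to value. Finally, it stacks the pieces with sdf_bind_rows().

Because all gathered columns end up in a single value column, they should have compatible types. The function can be extended if required, for example to choose the names of the key and value columns.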

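# Function: gather the columns named in gather_cols into key/value pairs
library(sparklyr)
library(dplyr)

sdf_gather <- function(tbl, gather_cols){

  # Columns carried through unchanged
  other_cols <- colnames(tbl)[!colnames(tbl) %in% gather_cols]

  # One Spark DataFrame per gathered column: other columns + key + value
  lapply(gather_cols, function(col_nm){
    tbl %>% 
      select(all_of(c(other_cols, col_nm))) %>% 
      mutate(key = !!col_nm) %>%              # column name as a string literal
      rename(value = all_of(col_nm))
  }) %>% 
    # Stack the pieces row-wise (columns are matched by name)
    sdf_bind_rows() %>% 
    select(all_of(c(other_cols, "key", "value")))
}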

# Example
spark_df %>% 
  select(col_1, col_2, col_3, col_4) %>% 
  sdf_gather(c('col_3', 'col_4'))
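To tie the answer back to the question, here is a sketch that runs sdf_gather() on the question's tibble a in a local Spark session. It then compares the result with tidyr::gather() computed in R.

It assumes a working local Spark installation (for example one installed with sparklyr::spark_install()). The Spark table name "a", the use of setdiff() to pick the gathered columns, and the comparison step are choices made for this example, not part of the original answer. The data and the function are repeated so the block runs on its own.

```r
# Minimal sketch, assuming a local Spark installation is available
# (e.g. installed with sparklyr::spark_install()).
library(sparklyr)
library(dplyr)
library(tidyr)

# The question's data
a <- tibble(id = rep(c(1, 10), each = 10),
            attribute1 = rep(c("This", "That", "These", "Those", "The",
                               "Other", "Test", "End", "Start", "Beginning"), 2),
            value = rep(seq(10, 100, by = 10), 2),
            average = rep(c(50, 100), each = 10),
            upper_bound = rep(c(80, 130), each = 10),
            lower_bound = rep(c(20, 70), each = 10))

# sdf_gather() as in the answer
sdf_gather <- function(tbl, gather_cols) {
  other_cols <- colnames(tbl)[!colnames(tbl) %in% gather_cols]
  lapply(gather_cols, function(col_nm) {
    tbl %>%
      select(all_of(c(other_cols, col_nm))) %>%
      mutate(key = !!col_nm) %>%
      rename(value = all_of(col_nm))
  }) %>%
    sdf_bind_rows() %>%
    select(all_of(c(other_cols, "key", "value")))
}

sc <- spark_connect(master = "local")
a_tbl <- copy_to(sc, a, "a", overwrite = TRUE)  # "a" is an arbitrary table name

# Gather everything except id and attribute1, then use the question's names
b_tbl <- a_tbl %>%
  sdf_gather(setdiff(colnames(a_tbl), c("id", "attribute1"))) %>%
  rename(type_data = key, value_data = value)

b_spark <- collect(b_tbl)

# Reference result computed locally with tidyr::gather()
b_local <- a %>%
  gather(key = type_data, value = value_data, -c(id:attribute1))

# Sort both before comparing, since Spark does not guarantee row order
same_result <- isTRUE(all.equal(
  as.data.frame(arrange(b_spark, id, attribute1, type_data)),
  as.data.frame(arrange(b_local, id, attribute1, type_data)),
  check.attributes = FALSE
))
same_result

spark_disconnect(sc)
```

The sort before the comparison matters. The rows of a union such as sdf_bind_rows() come back in an order that depends on Spark's partitions, not on the order of gather_cols.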
