Problem description
I am using sparklyr to manipulate some data. Given the following tibble a:
library(dplyr)
library(tidyr)

a <- tibble(id = rep(c(1, 10), each = 10),
            attribute1 = rep(c("This", "That", "These", "Those", "The", "Other",
                               "Test", "End", "Start", "Beginning"), 2),
            value = rep(seq(10, 100, by = 10), 2),
            average = rep(c(50, 100), each = 10),
            upper_bound = rep(c(80, 130), each = 10),
            lower_bound = rep(c(20, 70), each = 10))
I would like to use gather to reshape the data, like this:
b <- a %>%
  gather(key = type_data, value = value_data, -c(id:attribute1))
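For reference, run locally this gather() call stacks every column except id and attribute1 (value, average, upper_bound and lower_bound) into a type_data/value_data pair, so the 20 rows of a become 80. A minimal check, assuming the dplyr and tidyr packages are installed:

```r
library(dplyr)
library(tidyr)

a <- tibble(id = rep(c(1, 10), each = 10),
            attribute1 = rep(c("This", "That", "These", "Those", "The", "Other",
                               "Test", "End", "Start", "Beginning"), 2),
            value = rep(seq(10, 100, by = 10), 2),
            average = rep(c(50, 100), each = 10),
            upper_bound = rep(c(80, 130), each = 10),
            lower_bound = rep(c(20, 70), each = 10))

b <- a %>%
  gather(key = type_data, value = value_data, -c(id:attribute1))

dim(b)               # 80 rows, 4 columns
names(b)             # id, attribute1, type_data, value_data
count(b, type_data)  # 20 rows for each of the 4 gathered columns
```

This long shape is what a Spark replacement for gather needs to reproduce.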
However, gather is not available in sparklyr. I have seen people use sdf_pivot to mimic gather (e.g. "How to use sdf_pivot() in sparklyr and concatenate strings?"), but I can't see how to apply it in this case.
Does anyone have any ideas?
Cheers!
Recommended answer
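Here's a function that mimics gather in sparklyr. It gathers the given columns while keeping every other column intact, and it can easily be extended if required. For each column in gather_cols it selects the remaining columns plus that column, stores the column's name in key, renames the column itself to value, and finally stacks the pieces with sdf_bind_rows(). Because all gathered columns end up in a single value column, they should have compatible types.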
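# Function: gather the columns in gather_cols into key/value pairs,
# keeping all other columns unchanged
library(sparklyr)
library(dplyr)

sdf_gather <- function(tbl, gather_cols) {
  # Columns carried along unchanged
  other_cols <- setdiff(colnames(tbl), gather_cols)
  # One Spark DataFrame per gathered column: the other columns, a key
  # holding the column name, and that column's values renamed to value
  lapply(gather_cols, function(col_nm) {
    tbl %>%
      select(all_of(c(other_cols, col_nm))) %>%
      mutate(key = !!col_nm) %>%
      rename(value = all_of(col_nm))
  }) %>%
    # Stack the pieces (sdf_bind_rows matches columns by name)
    sdf_bind_rows() %>%
    select(all_of(c(other_cols, "key", "value")))
}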
# Example
spark_df %>%
select(col_1, col_2, col_3, col_4) %>%
sdf_gather(c('col_3', 'col_4'))
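As a sanity check without a Spark connection, the same per-column select / mutate / rename / bind steps can be run on an ordinary local data frame, with dplyr::bind_rows() standing in for sdf_bind_rows(), and compared against tidyr::gather() on the question's data a. This is only a sketch; local_gather() is a hypothetical helper name, not part of sparklyr, dplyr or tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical local counterpart of sdf_gather(): identical steps, but the
# pieces are stacked with dplyr::bind_rows() so no Spark session is needed
local_gather <- function(df, gather_cols) {
  other_cols <- setdiff(colnames(df), gather_cols)
  lapply(gather_cols, function(col_nm) {
    df %>%
      select(all_of(c(other_cols, col_nm))) %>%
      mutate(key = !!col_nm) %>%
      rename(value = all_of(col_nm))
  }) %>%
    bind_rows() %>%
    select(all_of(c(other_cols, "key", "value")))
}

a <- tibble(id = rep(c(1, 10), each = 10),
            attribute1 = rep(c("This", "That", "These", "Those", "The", "Other",
                               "Test", "End", "Start", "Beginning"), 2),
            value = rep(seq(10, 100, by = 10), 2),
            average = rep(c(50, 100), each = 10),
            upper_bound = rep(c(80, 130), each = 10),
            lower_bound = rep(c(20, 70), each = 10))

stacked <- local_gather(a, c("value", "average", "upper_bound", "lower_bound"))

# Reference result from tidyr, using the same key/value column names
reference <- gather(a, key = "key", value = "value", -c(id:attribute1))

# TRUE if both approaches give the same rows once sorted the same way
isTRUE(all.equal(
  as.data.frame(arrange(stacked, key, id, attribute1)),
  as.data.frame(arrange(reference, key, id, attribute1))
))
```

On Spark, the corresponding call would be sdf_gather(copy_to(sc, a), c("value", "average", "upper_bound", "lower_bound")), assuming sc is an open connection from spark_connect().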