Using stringr inside sparklyr's mutate

Question

I am trying to use sparklyr to process a parquet file.

The structure of the table:

type:str | type:str  | type:str
key      | requestid | operation

I am running the code:

txt %>%
     select(key, requestid, operation) %>%
     mutate(object = stringr::str_split(key, '/', simplify=TRUE) %>% dplyr::last() )

where txt is a valid Spark data frame. I get:

Error in stri_split_regex(string, pattern, n = n, simplify = simplify, : object 'key' not found
Traceback:

1. txt2 %>% select(key, requestid, operation) %>% mutate(object = stringr::str_split(key, 
 .     "/", simplify = TRUE) %>% dplyr::last())
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. mutate(., object = stringr::str_split(key, "/", simplify = TRUE) %>% 
 .     dplyr::last())
10. mutate.tbl_lazy(., object = stringr::str_split(key, "/", simplify = TRUE) %>% 
  .     dplyr::last())
11. partial_eval_dots(dots, vars = op_vars(.data))
12. lapply(dots, function(x) {
  .     new_quosure(partial_eval(get_expr(x), vars = vars, env = get_env(x)), 
  .         get_env(x))
  . })
13. FUN(X[[i]], ...)
14. new_quosure(partial_eval(get_expr(x), vars = vars, env = get_env(x)), 
  .     get_env(x))
15. partial_eval(get_expr(x), vars = vars, env = get_env(x))
16. partial_eval_call(call, vars, env)
17. lapply(call[-1], partial_eval, vars = vars, env = env)
18. FUN(X[[i]], ...)
19. partial_eval_call(call, vars, env)
20. eval_bare(call, env)
21. stringr::str_split(key, "/", simplify = TRUE)
22. stri_split_regex(string, pattern, n = n, simplify = simplify, 
  .     opts_regex = opts(pattern))

Any ideas?

Recommended answer

This question has more or less been addressed already.

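Unfortunately, I don't think stringr is directly compatible with sparklyr. A mutate() on a Spark table is translated to Spark SQL rather than run in R, and, as the traceback shows, the stringr::str_split() call is instead evaluated locally, where no object named key exists. In general, though, what you are trying to do can be solved in a few ways.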

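  1. With substring commands. You can find the position of the delimiter (in this case "_") and take the part of the string before or after it. For example: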
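library(sparklyr)
library(dplyr)

# `con` is assumed to be an existing Spark connection (e.g. from spark_connect())
temp <- data.frame(
  var1 = c("a_b", "a_b"),
  var2 = c(1, 2)
)
sdf_copy_to(con, temp, "temp", overwrite = TRUE)

a <- sdf_sql(con, "select * from temp")

# sql() hands each expression to Spark SQL unchanged
b <- a %>%
  dplyr::mutate(
    var1_part1 = sql("substr(var1, 1, position('_', var1) - 1)"),
    var1_part2 = sql("substr(var1, position('_', var1) + 1, length(var1))")
  )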
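  2. With arrays. You can use 'split' to break the column into an array of elements and then separate each element into its own column. For example: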
a <- sdf_sql(con, "select * from temp")
# split() has no dplyr translation, so it is passed through to Spark SQL's split()
b <- a %>%
  dplyr::mutate(var1_array = split(var1, '_')) %>%
  sdf_separate_column("var1_array", into = c("var1_part1", "var1_part2"))
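
Applied to the table in the question, the same idea might look like the sketch below. It is not part of the original answer: it assumes a local Spark installation reachable through spark_connect(master = "local"), uses made-up sample rows in place of the parquet file, and swaps in Spark SQL's substring_index() function, which with a count of -1 returns everything after the last delimiter. That matches what the question tried to do with str_split() and last().

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; replace with your own connection.
sc <- spark_connect(master = "local")

# Hypothetical sample rows standing in for the parquet data in the question.
sample_rows <- data.frame(
  key = c("a/b/c.txt", "d/e/f.parquet"),
  requestid = c("r1", "r2"),
  operation = c("GET", "PUT"),
  stringsAsFactors = FALSE
)
txt <- sdf_copy_to(sc, sample_rows, "txt", overwrite = TRUE)

# substring_index(str, delim, -1) keeps the text after the last delimiter.
# sql() passes the expression to Spark SQL as written, so nothing is
# evaluated in the local R session.
result <- txt %>%
  select(key, requestid, operation) %>%
  mutate(object = sql("substring_index(key, '/', -1)"))

# Bring the small result back into R to inspect it, then close the connection.
local_result <- collect(result)
print(local_result)

spark_disconnect(sc)
```

Because the whole expression runs inside Spark, this avoids collecting the data just to split a string, which matters when the parquet file is large.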
