Problem description
I am pretty new to Spark and would like to perform an operation on a column of a DataFrame so as to replace all the commas (,) in the column with periods (.).
Assume there is a DataFrame x with a column x4:
x4
1,3435
1,6566
-0,34435
I would like the output to be:
x4
1.3435
1.6566
-0.34435
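Stripped of Spark, the transformation asked for is a per-value string substitution. The plain-Scala sketch below (the object CommaToDot and its methods viaRegex and viaChar are hypothetical names chosen for illustration) applies it to the sample values above in two ways: with String.replaceAll, which uses Java regular expressions just as Spark's regexp_replace does, and with a literal character replace that involves no regex at all.

```scala
// Plain-Scala illustration (no Spark): the per-value substitution the question asks for.
// CommaToDot, viaRegex and viaChar are hypothetical names for this sketch.
object CommaToDot {
  // Sample values copied from column x4 in the question.
  val input: Seq[String] = Seq("1,3435", "1,6566", "-0,34435")

  // Regex-based replacement via java.util.regex.
  // The Scala literal "\\," is the regex \, which matches a literal comma;
  // a bare "," would match the same thing, since a comma is not a regex metacharacter.
  def viaRegex(s: String): String = s.replaceAll("\\,", ".")

  // Literal character replacement; no regex is involved.
  def viaChar(s: String): String = s.replace(',', '.')

  def main(args: Array[String]): Unit =
    input.foreach(s => println(s"$s -> ${viaRegex(s)}"))
}
```

In the replacement string only `$` and `\` are special to Java's regex engine, so the period needs no escaping there.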
The code I am using is:
import org.apache.spark.sql.Column
def replace = regexp_replace((x.x4,1,6566:String,1.6566:String)x.x4)
But I get the following error:
import org.apache.spark.sql.Column
<console>:1: error: ')' expected but '.' found.
def replace = regexp_replace((train_df.x37,0,160430299:String,0.160430299:String)train_df.x37)
Any help with the syntax, the logic, or any other suitable approach would be much appreciated.
Recommended answer
Here's a reproducible example, assuming x4 is a string column.
import org.apache.spark.sql.functions.regexp_replace
val df = spark.createDataFrame(Seq(
(1, "1,3435"),
(2, "1,6566"),
(3, "-0,34435"))).toDF("Id", "x4")
The syntax is regexp_replace(str, pattern, replacement), where pattern is a Java regular expression. Applied to this DataFrame, it becomes:
df.withColumn("x4New", regexp_replace(df("x4"), "\\,", ".")).show
+---+--------+--------+
| Id|      x4|   x4New|
+---+--------+--------+
|  1|  1,3435|  1.3435|
|  2|  1,6566|  1.6566|
|  3|-0,34435|-0.34435|
+---+--------+--------+
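The answer writes the result to a new column, x4New. If the goal, as in the question, is to change the values in x4 itself, one option is to reuse the column name in withColumn, which replaces an existing column of the same name instead of adding one. The sketch below assumes a local SparkSession and uses hypothetical names (ReplaceInPlace, commaToDotRegex, commaToDotTranslate). It also shows Spark's translate function, which swaps characters one-for-one without a regex, as an alternative for this single-character case. The pattern "," needs no escaping; the answer's "\\," matches the same character.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace, translate}

// Hypothetical helper object for this sketch.
object ReplaceInPlace {
  // Reusing the name "x4" makes withColumn replace the existing column.
  def commaToDotRegex(df: DataFrame): DataFrame =
    df.withColumn("x4", regexp_replace(col("x4"), ",", "."))

  // translate maps each ',' to '.' character by character; no regex involved.
  def commaToDotTranslate(df: DataFrame): DataFrame =
    df.withColumn("x4", translate(col("x4"), ",", "."))

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("regexp-replace-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the answer.
    val df = Seq((1, "1,3435"), (2, "1,6566"), (3, "-0,34435")).toDF("Id", "x4")

    commaToDotRegex(df).show()
    commaToDotTranslate(df).show()
    spark.stop()
  }
}
```

For a fixed one-character swap, translate avoids regex escaping concerns altogether; regexp_replace remains the more general tool when the pattern is more than a single character.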