I was recently trying to answer a question when I realised I didn't know how to use a back-reference in a regexp with Spark DataFrames.

For instance, with sed I can do:

```
> echo 'a1
b22
333' | sed "s/\([0-9][0-9]*\)/;\1/"
a;1
b;22
;333
```

But with Spark DataFrames I can't:

```
val df = List("a1","b22","333").toDF("str")
df.show
+---+
|str|
+---+
| a1|
|b22|
|333|
+---+

val res = df
  .withColumn("repBackRef",regexp_replace('str,"(\\d+)$",";\\1"))
res.show
+---+----------+
|str|repBackRef|
+---+----------+
| a1|       a;1|
|b22|       b;1|
|333|        ;1|
+---+----------+
```

Just to make it clear: I don't want the result in this particular case; I would like a solution that is as generic as a back reference in, for instance, sed.

Note also that regexp_extract is no substitute, since it behaves badly when there is no match:

```
val res2 = df
  .withColumn("repExtract",regexp_extract('str,"^([A-z])+?(\\d+)$",2))
res2.show
```

so you are forced to use one extracted column per pattern, as I did in the answer mentioned above.

Thanks!

Solution

You need to use the $ + numeric_ID backreference syntax:

```
.withColumn("repBackRef",regexp_replace('str,"(\\d+)$",";$1"))
                                                         ^^
```
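For reference, below is a minimal, self-contained sketch of the accepted fix, assuming a local `SparkSession`; the object name `BackRefDemo`, the `local[*]` master, and the extra `out` column are illustrative choices of mine, not part of the original post. It shows that the replacement string of Spark's `regexp_replace` uses Java-style group references `$1`, `$2`, ... (sed's `\1` is treated as a literal `1`), and that every match is replaced, so no `g` flag is needed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

// Minimal sketch; object name, master and column names are assumptions.
object BackRefDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("regexp-replace-backref")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = List("a1", "b22", "333").toDF("str")

    // Java-style backreference in the replacement string: $1, not \1.
    val res = df.withColumn("repBackRef", regexp_replace($"str", "(\\d+)$", ";$1"))
    res.show()
    // +---+----------+
    // |str|repBackRef|
    // +---+----------+
    // | a1|       a;1|
    // |b22|      b;22|
    // |333|      ;333|
    // +---+----------+

    // regexp_replace already replaces every match, so the sed-like
    // "global" behaviour needs no extra flag:
    val sedLike = Seq("a1 b22 333").toDF("str")
      .withColumn("out", regexp_replace($"str", "(\\d+)", ";$1"))
    sedLike.show(false) // a1 b22 333 -> a;1 b;22 ;333

    spark.stop()
  }
}
```

This works because `regexp_replace` is backed by `java.util.regex`, whose replacement strings use the `$n` group-reference syntax.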