本文介绍了Databricks Spark SQL中的字符串比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

 在substring(promo_name,instr(promo_name, P0&),2)= P0&时选择不同的
promo_name
,大小写
;然后当substring(promo_name,instr(promo_name, P1),2)= P1时为0
;然后1
当substring(promo_name,instr(promo_name, P01),3)= P01时,然后当substring(promo_name,instr(promo_name, P2),2)= P2时为1
;然后2
当substring(promo_name,instr(promo_name, P02),3)= P02时,然后当substring(promo_name,instr(promo_name, P3),2)= P3时为2
;然后,当substring(promo_name,instr(promo_name, P03),3)= P03时,则为3
;然后当substring(promo_name,instr(promo_name, P4),2)= P4时为3
;然后,当substring(promo_name,instr(promo_name, P04),3)= P04时,则为4
;然后,当substring(promo_name,instr(promo_name, P5),2)= P5时,则为4
;然后当substring(promo_name,instr(promo_name, P05),3)= P05时为5
;然后当substring(promo_name,instr(promo_name, P6),2)= P6时为5
;然后,当substring(promo_name,instr(promo_name, P06),3)= P06时,则为6
;然后当substring(promo_name,instr(promo_name, P7),2)= P7时,则为6
;然后,当substring(promo_name,instr(promo_name, P07),3)= P07时,则为7
;然后,当trim(substring(promo_name,instr(promo_name, P8),2))= P8时,则为7
。然后,当trim(substring(promo_name,instr(promo_name, P08''),3))= P08时,8
然后,当trim(substring(promo_name,instr(promo_name, P9),2))= P9时,则为8
。然后,当trim(substring(promo_name,instr(promo_name, P09),3))= P09时,则为9
。然后,当trim(substring(promo_name,instr(promo_name, P10),3))= P10时,则为9
。然后,当trim(substring(promo_name,instr(promo_name, P11),3))= P11时,则为10
。然后,当trim(substring(promo_name,instr(promo_name, P12),3))= P12时,11
然后12

否则0结尾为promo_id
,在trim(substring(promo_name,instr (promo_name, P10),3))= P10。然后,当trim(substring(promo_name,instr(promo_name, P11),3))= P11时,则为10
然后,当trim(substring(promo_name,instr(promo_name, P12),3))= P12时,11
然后,当trim(substring(promo_name,instr(promo_name, P13),3))= P13时,则为12
然后,当trim(substring(promo_name,instr(promo_name, P14),3))= P14时,13
然后14
,否则0结尾为hbi_dns_protected.store_zones_stock_v7_1_4
的id
,其中promo_name不为空


尝试从String中提取ID,从P10到P14,当我在单独的列中使用时,当我在同一列中使用时,只选择1而不是11,而不是12等等...


我犯了一个错误这里?


NB正则表达式区分大小写。如果需要捕获小写或大写的P,则可以使用字符类,即 [pP]


对所使用的RegEx模式的完整说明:


  1. 正则表达式以空格字符和大写字母P开头。这实际上将匹配空格和大写字母P。如果要使匹配不区分大小写,则可以使用字符类,例如 [pP] ,它表示匹配括号中的任何字符(区分大小写)

  2. RegEx的下一个组件是(\\d +)。它由用于匹配数字的RegEx模式 \d + 符号组成,表示匹配一个或者更多'。方括号将其分成一个组,即第1组。 \d 有一个额外的斜杠,这是<$ c $的Spark SQL实现所需的转义字符。 c> regexp_extract 。

  3. regexp_extract 的最后一个参数的值为1,表示返回组函数中的1

我使用用于测试和使用RegEx表达式。


select distinct
 promo_name
  ,case
    when substring(promo_name,instr(promo_name, "P0"),2)  = "P0"   then 0
    when substring(promo_name,instr(promo_name, "P1"),2)  = "P1"   then 1
    When substring(promo_name,instr(promo_name, "P01"),3) = "P01"  then 1
    when substring(promo_name,instr(promo_name, "P2"),2)  = "P2"   then 2
    When substring(promo_name,instr(promo_name, "P02"),3) = "P02"  then 2
    when substring(promo_name,instr(promo_name, "P3"),2)  = "P3"   then 3
    when substring(promo_name,instr(promo_name, "P03"),3) = "P03"  then 3
    when substring(promo_name,instr(promo_name, "P4"),2)  = "P4"   then 4
    when substring(promo_name,instr(promo_name, "P04"),3) = "P04"  then 4
    when substring(promo_name,instr(promo_name, "P5"),2)  = "P5"   then 5
    when substring(promo_name,instr(promo_name, "P05"),3) = "P05"  then 5
    when substring(promo_name,instr(promo_name, "P6"),2)  = "P6"   then 6
    when substring(promo_name,instr(promo_name, "P06"),3) = "P06"  then 6
    when substring(promo_name,instr(promo_name, "P7"),2)  = "P7"   then 7
    when substring(promo_name,instr(promo_name, "P07"),3) = "P07"  then 7
    when trim(substring(promo_name,instr(promo_name, "P8"),2))  ="P8"  then 8
    when trim(substring(promo_name,instr(promo_name, "P08"),3)) ="P08" then 8
    when trim(substring(promo_name,instr(promo_name, "P9"),2))  ="P9"  then 9
    when trim(substring(promo_name,instr(promo_name, "P09"),3)) ="P09" then 9
    when trim(substring(promo_name,instr(promo_name, "P10"),3)) ="P10" then 10
    when trim(substring(promo_name,instr(promo_name, "P11"),3)) ="P11" then 11
    when trim(substring(promo_name,instr(promo_name, "P12"),3)) ="P12" then 12

else 0 end as promo_id,Casewhen trim(substring(promo_name,instr(promo_name, "P10"),3)) = "P10" then 10when trim(substring(promo_name,instr(promo_name, "P11"),3)) = "P11" then 11when trim(substring(promo_name,instr(promo_name, "P12"),3)) = "P12" then 12when trim(substring(promo_name,instr(promo_name, "P13"),3)) = "P13" then 13when trim(substring(promo_name,instr(promo_name, "P14"),3)) = "P14" then 14else 0 end as idfrom hbi_dns_protected.store_zones_stock_v7_1_4where promo_name is not null

Trying to extract ID from String, it works fine from P10 to P14 when I use in a separate column, when I do in the same column it just pick 1 instead of 11, 1 instead of 12 etc...

Am I making an mistake here?sample data

解决方案

Why not use regexp_extract to do a RegEx extraction from your string, rather than write code for each case, something like:

%sql
SELECT *,
  regexp_extract( promo_name, ' P(\\d+)', 1 ) AS promoNumber
FROM tmp

My results:

NB The regex is case-sensitive. If you have a requirement to capture either lower or upper case Ps, then you could use a character class ie [pP] instead.

A full explanation of the RegEx pattern used:

  1. the regex starts off with a space character and a capital P. This will match literally a space and a capital P. If you want to make the match case-insensitive, you can use a character class eg [pP] which says match any character (case-sensitive) in the brackets
  2. The next component of the RegEx is (\\d+). This is made up of the RegEx pattern \d for matching digits, the + symbol which means, 'match one or more'. The brackets make it into a group, which is group 1. The \d has an additional slash which is an escape character required for the Spark SQL implementation of regexp_extract.
  3. the last argument of regexp_extract has the value of 1 which means 'return group 1 from the function'

I use regex101.com for testing and practicing with RegEx expressions.

这篇关于Databricks Spark SQL中的字符串比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-31 19:14