问题描述
在substring(promo_name,instr(promo_name, P0&),2)= P0&时选择不同的
promo_name
,大小写
;然后当substring(promo_name,instr(promo_name, P1),2)= P1时为0
;然后1
当substring(promo_name,instr(promo_name, P01),3)= P01时,然后当substring(promo_name,instr(promo_name, P2),2)= P2时为1
;然后2
当substring(promo_name,instr(promo_name, P02),3)= P02时,然后当substring(promo_name,instr(promo_name, P3),2)= P3时为2
;然后,当substring(promo_name,instr(promo_name, P03),3)= P03时,则为3
;然后当substring(promo_name,instr(promo_name, P4),2)= P4时为3
;然后,当substring(promo_name,instr(promo_name, P04),3)= P04时,则为4
;然后,当substring(promo_name,instr(promo_name, P5),2)= P5时,则为4
;然后当substring(promo_name,instr(promo_name, P05),3)= P05时为5
;然后当substring(promo_name,instr(promo_name, P6),2)= P6时为5
;然后,当substring(promo_name,instr(promo_name, P06),3)= P06时,则为6
;然后当substring(promo_name,instr(promo_name, P7),2)= P7时,则为6
;然后,当substring(promo_name,instr(promo_name, P07),3)= P07时,则为7
;然后,当trim(substring(promo_name,instr(promo_name, P8),2))= P8时,则为7
。然后,当trim(substring(promo_name,instr(promo_name, P08''),3))= P08时,8
然后,当trim(substring(promo_name,instr(promo_name, P9),2))= P9时,则为8
。然后,当trim(substring(promo_name,instr(promo_name, P09),3))= P09时,则为9
。然后,当trim(substring(promo_name,instr(promo_name, P10),3))= P10时,则为9
。然后,当trim(substring(promo_name,instr(promo_name, P11),3))= P11时,则为10
。然后,当trim(substring(promo_name,instr(promo_name, P12),3))= P12时,11
然后12
否则0结尾为promo_id
,在trim(substring(promo_name,instr (promo_name, P10),3))= P10。然后,当trim(substring(promo_name,instr(promo_name, P11),3))= P11时,则为10
然后,当trim(substring(promo_name,instr(promo_name, P12),3))= P12时,11
然后,当trim(substring(promo_name,instr(promo_name, P13),3))= P13时,则为12
然后,当trim(substring(promo_name,instr(promo_name, P14),3))= P14时,13
然后14
,否则0结尾为hbi_dns_protected.store_zones_stock_v7_1_4
的id
,其中promo_name不为空
尝试从String中提取ID,从P10到P14,当我在单独的列中使用时,当我在同一列中使用时,只选择1而不是11,而不是12等等...
我犯了一个错误这里?
NB正则表达式区分大小写。如果需要捕获小写或大写的P,则可以使用字符类,即 [pP]
。
对所使用的RegEx模式的完整说明:
- 正则表达式以空格字符和大写字母P开头。这实际上将匹配空格和大写字母P。如果要使匹配不区分大小写,则可以使用字符类,例如
[pP]
,它表示匹配括号中的任何字符(区分大小写) - RegEx的下一个组件是
(\\d +)
。它由用于匹配数字的RegEx模式\d
和+
符号组成,表示匹配一个或者更多'。方括号将其分成一个组,即第1组。\d
有一个额外的斜杠,这是<$ c $的Spark SQL实现所需的转义字符。 c> regexp_extract 。 -
regexp_extract
的最后一个参数的值为1,表示返回组函数中的1
我使用用于测试和使用RegEx表达式。
select distinct
promo_name
,case
when substring(promo_name,instr(promo_name, "P0"),2) = "P0" then 0
when substring(promo_name,instr(promo_name, "P1"),2) = "P1" then 1
When substring(promo_name,instr(promo_name, "P01"),3) = "P01" then 1
when substring(promo_name,instr(promo_name, "P2"),2) = "P2" then 2
When substring(promo_name,instr(promo_name, "P02"),3) = "P02" then 2
when substring(promo_name,instr(promo_name, "P3"),2) = "P3" then 3
when substring(promo_name,instr(promo_name, "P03"),3) = "P03" then 3
when substring(promo_name,instr(promo_name, "P4"),2) = "P4" then 4
when substring(promo_name,instr(promo_name, "P04"),3) = "P04" then 4
when substring(promo_name,instr(promo_name, "P5"),2) = "P5" then 5
when substring(promo_name,instr(promo_name, "P05"),3) = "P05" then 5
when substring(promo_name,instr(promo_name, "P6"),2) = "P6" then 6
when substring(promo_name,instr(promo_name, "P06"),3) = "P06" then 6
when substring(promo_name,instr(promo_name, "P7"),2) = "P7" then 7
when substring(promo_name,instr(promo_name, "P07"),3) = "P07" then 7
when trim(substring(promo_name,instr(promo_name, "P8"),2)) ="P8" then 8
when trim(substring(promo_name,instr(promo_name, "P08"),3)) ="P08" then 8
when trim(substring(promo_name,instr(promo_name, "P9"),2)) ="P9" then 9
when trim(substring(promo_name,instr(promo_name, "P09"),3)) ="P09" then 9
when trim(substring(promo_name,instr(promo_name, "P10"),3)) ="P10" then 10
when trim(substring(promo_name,instr(promo_name, "P11"),3)) ="P11" then 11
when trim(substring(promo_name,instr(promo_name, "P12"),3)) ="P12" then 12
else 0 end as promo_id,Casewhen trim(substring(promo_name,instr(promo_name, "P10"),3)) = "P10" then 10when trim(substring(promo_name,instr(promo_name, "P11"),3)) = "P11" then 11when trim(substring(promo_name,instr(promo_name, "P12"),3)) = "P12" then 12when trim(substring(promo_name,instr(promo_name, "P13"),3)) = "P13" then 13when trim(substring(promo_name,instr(promo_name, "P14"),3)) = "P14" then 14else 0 end as idfrom hbi_dns_protected.store_zones_stock_v7_1_4where promo_name is not null
Trying to extract ID from String, it works fine from P10 to P14 when I use in a separate column, when I do in the same column it just pick 1 instead of 11, 1 instead of 12 etc...
Am I making an mistake here?sample data
Why not use regexp_extract
to do a RegEx extraction from your string, rather than write code for each case, something like:
%sql
SELECT *,
regexp_extract( promo_name, ' P(\\d+)', 1 ) AS promoNumber
FROM tmp
My results:
NB The regex is case-sensitive. If you have a requirement to capture either lower or upper case Ps, then you could use a character class ie [pP]
instead.
A full explanation of the RegEx pattern used:
- the regex starts off with a space character and a capital P. This will match literally a space and a capital P. If you want to make the match case-insensitive, you can use a character class eg
[pP]
which says match any character (case-sensitive) in the brackets - The next component of the RegEx is
(\\d+)
. This is made up of the RegEx pattern\d
for matching digits, the+
symbol which means, 'match one or more'. The brackets make it into a group, which is group 1. The\d
has an additional slash which is an escape character required for the Spark SQL implementation ofregexp_extract
. - the last argument of
regexp_extract
has the value of 1 which means 'return group 1 from the function'
I use regex101.com for testing and practicing with RegEx expressions.
这篇关于Databricks Spark SQL中的字符串比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!