This article explains how to split a string in Hive using regular expressions.
Problem description
I have a string of words that are delimited by ::. How can I use the Hive UDF regexp_extract() to extract words from the string?
Solution
regexp_extract('2foa1fa::12hjk','^(\\w.*)\\:{2}(\\w.*)$',1) as word1
OUTPUT:
2foa1fa
regexp_extract('2foa1fa::12hjk','^(\\w.*)\\:{2}(\\w.*)$',2) as word2
OUTPUT:
12hjk
- ^ anchors the match to the beginning of the string
- \\w matches a single word character, and the .* after it matches any characters, zero or more times
- \\:{2} matches two : in a row (this is your :: delimiter)
- $ anchors the match to the end of the string
- The third argument of regexp_extract is the index of the capture group (parenthesized sub-pattern) to return: 1 for the first word, 2 for the second
Now just stick your column name in the place of the string literal and you should be good to go.
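As a minimal sketch of that substitution, assume a table named my_table with a string column named raw_value holding values such as 2foa1fa::12hjk. Both names are placeholders for illustration and do not come from the original question.

```sql
-- my_table and raw_value are hypothetical names used only for illustration.
SELECT
  regexp_extract(raw_value, '^(\\w.*)\\:{2}(\\w.*)$', 1) AS word1,  -- text before the :: delimiter
  regexp_extract(raw_value, '^(\\w.*)\\:{2}(\\w.*)$', 2) AS word2   -- text after the :: delimiter
FROM my_table;
```

Because .* is greedy, a value containing more than one :: puts everything up to the last delimiter into group 1, so this pattern suits strings with exactly two words.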
You can also use the split function to create an array and then select an element by its position. Hive arrays are zero-based, so the second word is at index 1. That looks something like this:
select my_array[1] from (select split('2foa1fa::12hjk','::') as my_array from my_table) b;
OUTPUT:
12hjk
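The split approach can be applied to a column in the same way. This is a sketch under the same assumptions as before: my_table and raw_value are placeholder names, not part of the original question.

```sql
-- my_table and raw_value are hypothetical names used only for illustration.
SELECT
  parts[0] AS word1,  -- Hive array indexes start at 0, so this is the first word
  parts[1] AS word2   -- second word
FROM (
  SELECT split(raw_value, '::') AS parts  -- the second argument is a regular expression
  FROM my_table
) b;
```

Unlike the regexp_extract pattern, split handles any number of delimited words: further words are simply later elements of the array.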