This article shows how to split a string with a regular expression in Hive.

Problem description


I have a string of words separated by the :: delimiter. How can I use the Hive UDF regexp_extract() to extract the words from the string?

Solution
    select regexp_extract('2foa1fa::12hjk', '^(\\w.*)\\:{2}(\\w.*)$', 1) as word1;

    OUTPUT: 2foa1fa

    select regexp_extract('2foa1fa::12hjk', '^(\\w.*)\\:{2}(\\w.*)$', 2) as word2;

    OUTPUT: 12hjk

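• ^ anchors the match to the beginning of the string.
• \\w matches one word character (letter, digit or underscore), and .* matches any character repeated zero or more times. The backslash is doubled because Hive unescapes the string literal first, so the regex engine receives \w.
• \\:{2} matches two colons in a row (this is your :: delimiter).
• $ anchors the match to the end of the string.
• The third argument of regexp_extract is the index of the capture group to return: 1 for the first parenthesized group, 2 for the second (0 returns the whole match).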
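Hive's regexp_extract uses Java regular expressions, and its index argument is the index passed to Matcher.group(), so the two queries above can be reproduced outside Hive to see what each index returns. The sketch below assumes exactly that; the class and method names (RegexpExtractSketch, regexpExtract) are hypothetical, and returning an empty string when nothing matches is an assumption of the sketch, not a statement about Hive's source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper imitating Hive's regexp_extract(subject, pattern, index)
// with java.util.regex; a sketch, not Hive's actual source.
class RegexpExtractSketch {
    static String regexpExtract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        // Use the first match and return the requested group (0 = whole match).
        // Returning "" when nothing matches is an assumption of this sketch.
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        // The Java literal needs the same doubled backslashes as the HiveQL literal,
        // so the regex engine receives ^(\w.*)\:{2}(\w.*)$ in both cases.
        String pattern = "^(\\w.*)\\:{2}(\\w.*)$";
        String s = "2foa1fa::12hjk";
        for (int index = 0; index <= 2; index++) {
            System.out.println(index + ": " + regexpExtract(s, pattern, index));
        }
        // 0: 2foa1fa::12hjk
        // 1: 2foa1fa
        // 2: 12hjk
    }
}
```

Index 1 and index 2 give the same values as the word1 and word2 queries, and index 0 returns the entire matched string.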

To run this on real data, replace the string literal with your column name.
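When you swap in a real column, every row is matched on its own, and not every value will look like the sample. The hypothetical ColumnValuesSketch below (again plain java.util.regex, the engine behind Hive's regexp_extract) runs the answer's pattern over three made-up values. It illustrates that the greedy .* splits at the last :: when a value contains more than one, that a reluctant .*? variant splits at the first, and that a value without :: does not match at all. The "(no match)" label is only this sketch's output, not what Hive returns.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: apply the answer's pattern to several made-up "column" values.
class ColumnValuesSketch {
    static final String GREEDY = "^(\\w.*)\\:{2}(\\w.*)$";    // pattern from the answer
    static final String RELUCTANT = "^(\\w.*?)\\:{2}(\\w.*)$"; // .*? stops at the first ::

    // Returns "group1 | group2", or "(no match)" when the pattern does not match.
    static String describe(String value, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(value);
        return m.find() ? m.group(1) + " | " + m.group(2) : "(no match)";
    }

    public static void main(String[] args) {
        for (String value : List.of("2foa1fa::12hjk", "a1::b2::c3", "no-delimiter")) {
            System.out.println(value + " -> greedy: " + describe(value, GREEDY)
                    + ", reluctant: " + describe(value, RELUCTANT));
        }
        // 2foa1fa::12hjk -> greedy: 2foa1fa | 12hjk, reluctant: 2foa1fa | 12hjk
        // a1::b2::c3 -> greedy: a1::b2 | c3, reluctant: a1 | b2::c3
        // no-delimiter -> greedy: (no match), reluctant: (no match)
    }
}
```

With exactly two words both patterns agree; the choice only matters for values that contain more than one delimiter.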

Alternatively, you can use split() to turn the string into an array and then select an element by its position in the array, something like this:

    select my_array[1] as word2   -- Hive arrays are 0-indexed, so [1] is the second word
        from (select split('2foa1fa::12hjk', '\\::') as my_array from my_table) b;

    OUTPUT: 12hjk
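Hive array subscripts start at 0, so after splitting '2foa1fa::12hjk' on :: the second word sits at index 1. The sketch below imitates split() plus subscripting in Java to make the indexing concrete. SplitIndexSketch, split and at are hypothetical names; keeping trailing empty strings (limit -1) is an assumption of the sketch, and returning null for an out-of-range index mirrors Hive's NULL result as we understand it. It also shows one behavioral difference from the greedy regexp_extract pattern: on a three-word value, index 1 picks the middle word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical emulation of Hive's split(str, pat) followed by my_array[i].
class SplitIndexSketch {
    static List<String> split(String s, String pat) {
        // The pattern is a regex, as in Hive; "\\::" and "::" both match a literal "::".
        // Keeping trailing empty strings (limit -1) is an assumption of this sketch.
        return Arrays.asList(Pattern.compile(pat).split(s, -1));
    }

    // 0-based subscript; this sketch returns null when the index is out of range.
    static String at(List<String> arr, int i) {
        return (i >= 0 && i < arr.size()) ? arr.get(i) : null;
    }

    public static void main(String[] args) {
        List<String> myArray = split("2foa1fa::12hjk", "\\::");
        System.out.println(myArray);        // [2foa1fa, 12hjk]
        System.out.println(at(myArray, 0)); // 2foa1fa
        System.out.println(at(myArray, 1)); // 12hjk
        System.out.println(at(myArray, 2)); // null (only two elements)
        // With three words, index 1 is the middle word.
        System.out.println(at(split("a1::b2::c3", "::"), 1)); // b2
    }
}
```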


08-15 01:07