This post covers how to handle a REGEX_EXTRACT error in Pig.

Problem description

I have a CSV file with 3 columns: tweetid, tweet, and Userid. However, the values in the tweet column themselves contain commas.

For example, one row of data:

396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143

I want to extract all 3 fields individually, but REGEX_EXTRACT gives me an error with this code:

a = LOAD 'tweets' USING PigStorage(',') AS (f1,f2,f3);

b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\"(.*)',1);

The error is:

error: Filter's condition must evaluate to boolean.

Recommended answer

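In the use case shared, reading the data with PigStorage(',') loses savava143 (the last field value). PigStorage does not honor the quotes, so the comma inside the tweet is treated as a delimiter: the tweet is split across f2 and f3, and the fourth token has no place in the three-field schema.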

A = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(',') AS (f1,f2,f3);
DUMP A;

Output of A (observe that the last field value is missing):

(396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.")
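One way to confirm where the last value went is to load the same file without a schema and project the fields by position. This is a minimal sketch, assuming the file is the a.csv shown in this answer; the alias names RAW and TOKENS are illustrative and not from the original post.

```pig
-- Load without an AS clause so PigStorage keeps every comma-separated token.
RAW = LOAD 'a.csv' USING PigStorage(',');

-- Project the tokens by position: $1 and $2 are the two halves of the tweet,
-- and $3 is the value that a three-field schema would drop.
TOKENS = FOREACH RAW GENERATE $0, $1, $2, $3;

DUMP TOKENS;
```

If the fourth position holds savava143, the value was read correctly but discarded because the declared schema had only three fields.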

For the use case shared, to extract all the values from a CSV file whose field values contain ',', we can use either CSVExcelStorage or CSVLoader, both of which ship with Piggybank.

Approach 1: Using CSVExcelStorage

Ref: http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html

Input: a.csv

396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143

Pig script:

REGISTER piggybank.jar;
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (f1,f2,f3);
DUMP A;

Output: A

(396124437168537600,I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.,savava143)

Approach 2: Using CSVLoader

Ref: http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVLoader.html

The script below uses CSVLoader(); DUMP A produces the same output as with CSVExcelStorage.

REGISTER piggybank.jar;
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (f1,f2,f3);
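The error in the question has a separate cause from the CSV parsing problem: REGEX_EXTRACT returns a chararray (or null when nothing matches), while FILTER requires a condition that evaluates to boolean. The following is a minimal sketch of two ways to handle that, assuming the data is loaded with CSVExcelStorage as above. The regular expression, the chararray type declarations, and the names B, C, and before_comma are illustrative assumptions, not part of the original post.

```pig
REGISTER piggybank.jar;

-- Load with a quote-aware loader so the whole tweet stays in f2.
A = LOAD 'a.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage()
    AS (f1:chararray, f2:chararray, f3:chararray);

-- FILTER needs a boolean, so test the extracted group rather than using it directly.
B = FILTER A BY REGEX_EXTRACT(f2, '(.*), (.*)', 1) IS NOT NULL;

-- To keep the extracted text itself, project it with FOREACH ... GENERATE.
C = FOREACH A GENERATE f1, REGEX_EXTRACT(f2, '(.*), (.*)', 1) AS before_comma, f3;

DUMP C;
```

In short, use FILTER to keep or discard rows and FOREACH ... GENERATE to produce new values; REGEX_EXTRACT belongs in the second role unless its result is compared against something.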

