REGEX_EXTRACT error in Pig

Problem description

I have a CSV file with 3 columns: tweetid, tweet, and Userid. However, the values in the tweet column themselves contain commas.

For example, one row of data:

396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143

I want to extract all 3 fields individually, but REGEX_EXTRACT gives me an error with this code:

a = LOAD tweets USING PigStorage(',') AS (f1,f2,f3);

b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\"(.*)',1);

The error is:

error: Filter's condition must evaluate to boolean.

Recommended answer

In the use case shared, reading the data with PigStorage(',') loses savava143 (the last field value). PigStorage splits on every comma, including the one inside the quoted tweet, so the row yields four fields and the three-field schema keeps only the first three.

A = LOAD '/Users/muralirao/learning/pig/a.csv' USING PigStorage(',') AS (f1,f2,f3);
DUMP A;

Output of A (note that the last field value is missing):

(396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.")
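
One way to see where the value went is to load the same file without an AS clause and project the fields by position. This is only a sketch: the aliases R and S are illustrative, and it assumes the same a.csv sits in the working directory.

```pig
-- Load without a schema so that Pig keeps every field PigStorage produces.
R = LOAD 'a.csv' USING PigStorage(',');

-- Project the first four positional fields.
-- The tweet is split across $1 and $2; the user id ends up in $3.
S = FOREACH R GENERATE $0, $1, $2, $3;
DUMP S;
```

With the three-field schema, only $0 through $2 are kept, which is why the user id never appears in the output.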

To extract all the values from a CSV file whose field values contain ',', we can use either CSVExcelStorage or CSVLoader.

Approach 1: Using CSVExcelStorage

Reference: http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html

Input: a.csv

396124437168537600,"I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.",savava143

Pig script:

REGISTER piggybank.jar;
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (f1,f2,f3);
DUMP A;

Output: A

(396124437168537600,I really wish I didn't give up everything I did for you, I'm so mad at my self for even letting it get as far as it did.,savava143)

Approach 2: Using CSVLoader

Reference: http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVLoader.html

The script below uses CSVLoader(); DUMP A produces the same output as with CSVExcelStorage.

REGISTER piggybank.jar;
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVLoader() AS (f1,f2,f3);
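
Loading the fields correctly does not by itself remove the error in the question. REGEX_EXTRACT returns a string (chararray), while FILTER requires a boolean condition. A minimal sketch of two possible ways around this follows; the regex pattern, the chararray type declarations, and the aliases B, C and D are illustrative assumptions, not part of the original answer.

```pig
REGISTER piggybank.jar;

-- Declare the fields as chararray so the string functions apply directly.
A = LOAD 'a.csv' USING org.apache.pig.piggybank.storage.CSVLoader()
    AS (f1:chararray, f2:chararray, f3:chararray);

-- REGEX_EXTRACT yields a value, so it belongs in FOREACH ... GENERATE.
-- Here it pulls the leading letters out of the user id (illustrative pattern).
B = FOREACH A GENERATE f1, REGEX_EXTRACT(f3, '([A-Za-z]+).*', 1) AS user_prefix;

-- To use it inside FILTER, turn the result into a boolean test ...
C = FILTER A BY REGEX_EXTRACT(f3, '([A-Za-z]+).*', 1) IS NOT NULL;

-- ... or use MATCHES, which is already a boolean operator.
D = FILTER A BY f2 MATCHES '.*mad.*';

DUMP B;
```

MATCHES tests the pattern against the whole field value, hence the leading and trailing `.*` in the last filter.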
