Problem description
I have a UDF jar that takes a String as input through Pig. The Java code works fine through Pig when I pass a hard-coded string, as in this command:
B = foreach f generate URL_UDF.mathUDF('stack.overflow');
This gives me the output I expect.
My problem is that I am trying to read data from a text file and use my UDF on it. I load the file and want to pass a column of the loaded data to the UDF.
LoadData = load 'data.csv' using PigStorage(',');
f = foreach LoadData generate $0 as col0, $1 as chararray
$1 is the column I need. From reading about data types (http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#Data+Types), I understand that chararray is the type to use.
I then tried the following command:
B = foreach f generate URL_UDF.mathUDF($1);
to pass the data into the jar, which fails with:
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
If anybody has a solution to this, that would be great.
The Java code I am running is as follows:
package URL_UDF;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.EvalFunc;
import org.apache.pig.PigWarning;
import org.apache.commons.logging.Log;
import org.apache.*;

public class mathUDF extends EvalFunc<String> {
    public String exec(Tuple arg0) throws IOException {
        // TODO Auto-generated method stub
        try {
            String urlToCheck = (String) arg0.get(0);
            return urlToCheck;
        } catch (Exception e) {
            // Throwing an exception will cause the task to fail.
            throw new IOException("Something bad happened!", e);
        }
    }
}
Thanks
Recommended answer
You can specify the schema with LOAD as follows:
LoadData = load 'data.csv' using PigStorage(',') AS (col0: chararray, col1:chararray);
and then pass col1 to the UDF.
Or:
B = foreach LoadData generate (chararray)$1 AS col1:chararray;
Actually, this is a bug in Pig (PIG-2315), which will be fixed in 0.12.1: the AS clause in foreach does not work as one would expect.
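As a complement to fixing the schema on the Pig side, the UDF itself can avoid the hard cast. The sketch below is plain Java and only illustrates the idea: FakeByteArray is a hypothetical stand-in for org.apache.pig.data.DataByteArray (so the example compiles without Pig on the classpath), and FieldText.asText is an illustrative helper name, not part of Pig. It assumes the real DataByteArray.toString() decodes its bytes as text; in the actual UDF the argument would be arg0.get(0).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.pig.data.DataByteArray, used only so
// this sketch compiles without Pig on the classpath. It is not the real class.
final class FakeByteArray {
    private final byte[] bytes;

    FakeByteArray(String text) {
        this.bytes = text.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Illustrative helper (assumed name): turns a tuple field into text
// without casting it directly to String.
final class FieldText {
    private FieldText() {}

    // null stays null, a String is returned as-is,
    // anything else is converted through toString().
    static String asText(Object field) {
        if (field == null) {
            return null;
        }
        if (field instanceof String) {
            return (String) field;
        }
        return field.toString();
    }

    public static void main(String[] args) {
        Object untyped = new FakeByteArray("stack.overflow"); // field loaded without a schema
        Object typed = "stack.overflow";                      // field declared as chararray

        System.out.println(asText(untyped));
        System.out.println(asText(typed));

        try {
            String broken = (String) untyped; // same kind of cast as in exec()
            System.out.println(broken);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getMessage());
        }
    }
}
```

Declaring the column as chararray in LOAD remains the cleaner fix; a conversion like this only makes the UDF tolerant of untyped input.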