Quick q on Pig UDFs.
I have a custom UDF that I want to accept multiple columns:
    package pigfuncs;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.FuncSpec;
    import org.apache.pig.data.DataType;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.impl.logicalLayer.FrontendException;
    import org.apache.pig.impl.logicalLayer.schema.Schema;

    public class DataToXML extends EvalFunc<String> {

        public DataToXML() {
        }

        // Tells Pig which argument schemas this UDF accepts:
        // here, one FuncSpec whose schema has a single chararray field.
        @Override
        public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
            List<FuncSpec> funcList = new ArrayList<FuncSpec>();
            funcList.add(new FuncSpec(this.getClass().getName(),
                    new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
            return funcList;
        }

        // Wraps field 0 in <Num> and field 1 in <Tags>.
        @Override
        public String exec(Tuple t) throws IOException {
            if (t == null || t.size() == 0)
                return "";
            StringBuilder result = new StringBuilder();
            result.append("<Num>");
            result.append((String) t.get(0));
            result.append("</Num>");
            result.append("<Tags>");
            result.append((String) t.get(1));
            result.append("</Tags>");
            return result.toString();
        }
    }
I want to pass 2 columns, Number and Data, and get output like <Num>XYZ</Num><Tags>abc</Tags>.
I can't work out how to get the Pig script to call this; every combination results in a different error!
An excerpt from my script:
    -- apply some sort of UDF that returns the exact line without the stop words
    nostop = FOREACH cleansed GENERATE lotnum, pigfuncs.StopWords(description) as data;
    -- put into xml
    out = FOREACH nostop GENERATE pigfuncs.DataToXML(lotnum, data);
The error from this is:
    Could not infer the matching function for rapp.pigfuncs.DataToXML as multiple or none of them fit. Please use an explicit cast.
Hope this is an easy one for the Pig gurus :)
Duncan
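Your getArgToFuncMapping() implementation indicates you are only expecting one argument: you have added only one FuncSpec to funcList, and its schema contains a single chararray field. Pig uses this list to choose the implementation that matches the argument types at the call site, so a call with two arguments fits none of them. If you're not going to provide multiple implementations of this UDF depending on the argument types, there's no real need to implement getArgToFuncMapping(). Just remove it and this error will go away.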
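With getArgToFuncMapping() removed, Pig passes both fields to exec() in the input Tuple, where they are read with t.get(0) and t.get(1). The string building itself can be sketched and checked in plain Java, without a Pig installation. This is a minimal sketch under stated assumptions, not the asker's code: a java.util.List stands in for Pig's Tuple, and the class name DataToXmlSketch and the methods toXml, text and escapeXml are hypothetical. It also makes three changes beyond the original: it uses String.valueOf instead of a (String) cast, so a non-chararray lotnum does not throw ClassCastException; it returns null when fewer than two fields arrive instead of failing on the second get; and it escapes XML special characters in the values.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java stand-in for DataToXML.exec(); a List replaces Pig's Tuple.
public class DataToXmlSketch {

    // Escape the five XML special characters so field values cannot break the markup.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Convert one field to text: null becomes an empty string, anything else
    // (chararray, int, long, ...) goes through String.valueOf instead of a (String) cast.
    static String text(Object field) {
        return field == null ? "" : String.valueOf(field);
    }

    // Field 0 goes into <Num>, field 1 into <Tags>; fewer than two fields yields null.
    static String toXml(List<?> fields) {
        if (fields == null || fields.size() < 2) {
            return null;
        }
        return "<Num>" + escapeXml(text(fields.get(0))) + "</Num>"
             + "<Tags>" + escapeXml(text(fields.get(1))) + "</Tags>";
    }

    public static void main(String[] args) {
        System.out.println(toXml(Arrays.asList("XYZ", "abc")));  // <Num>XYZ</Num><Tags>abc</Tags>
        System.out.println(toXml(Arrays.asList(42, "a&b")));     // <Num>42</Num><Tags>a&amp;b</Tags>
    }
}
```

Inside the real UDF, exec(Tuple t) could delegate to the same logic by passing t.getAll(), which returns the tuple's fields as a List<Object>.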