Hadoop 3 Cluster Setup: Adding a Custom UDTF Function to Hive

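Previous posts:

Hadoop 3 Cluster Setup: Installing the Virtual Machines

Hadoop 3 Cluster Setup: Installing Hadoop and Configuring the Environment

Hadoop 3 Cluster Setup: Configuring the NTP Service

Hadoop 3 Cluster Setup: Installing Hive

Hadoop 3 Cluster Setup: Installing HBase and Basic Operations

Hadoop 3 Cluster Setup: Adding a Custom UDF Function to Hive

For the rest of the configuration, see the previous post: Hadoop 3 Cluster Setup: Adding a Custom UDF Function to Hive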

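A quick description of the requirement:

  The system's userid has the following format:

  The first three characters identify the country

  The next three characters identify the province

  The three characters after that identify the city

  All remaining characters identify the store

(The requirement is made up; the point is simply to slice a string.)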
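
The slicing rule can be tried out in plain Java before wiring it into Hive. This is only a sketch: the class name `UserIdSplitDemo`, the method `split`, and the sample userid are made up for illustration, and the minimum length of 10 (nine prefix characters plus at least one store character) is an assumption taken from the length check in the UDTF below.

```java
import java.util.Arrays;

public class UserIdSplitDemo {

    // Hypothetical helper: returns {userid, country, province, city, store},
    // or five empty strings when the input is null or shorter than 10 characters.
    static String[] split(String userId) {
        String[] result = new String[5];
        if (userId == null || userId.length() < 10) {
            Arrays.fill(result, "");
            return result;
        }
        result[0] = userId;                  // original input, echoed back
        result[1] = userId.substring(0, 3);  // country
        result[2] = userId.substring(3, 6);  // province
        result[3] = userId.substring(6, 9);  // city
        result[4] = userId.substring(9);     // store: everything that remains
        return result;
    }

    public static void main(String[] args) {
        // "086010021000123" is a made-up userid, not real data.
        System.out.println(Arrays.toString(split("086010021000123")));
        System.out.println(Arrays.toString(split("12345")));
    }
}
```

Keeping the slicing in a small static method like this makes it easy to check the boundaries (3, 6, 9) without starting a Hive session.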

Here is the code:

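package com.venn.udtf;

import java.util.ArrayList;

// Assumption: the original post does not show which StringUtils class was imported.
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * Created by venn on 5/20/2018.
 * SplitString : split the userid string
 * first 3 characters : country
 * next 3 characters : province
 * next 3 characters : city
 * remaining characters : store (the output column is named "story")
 */
public class SplitString extends GenericUDTF {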

    /**
     * Adds the output column names and types. This uses hive-exec 1.2.1; I wanted
     * to use 2.3.3 but could not work out how to initialize the column names there.
     * @param args object inspectors for the function arguments
     * @return the struct object inspector describing the output columns
     * @throws UDFArgumentException if the argument count or category is wrong
     */
    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("SplitString takes only one argument");
        }
        if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("SplitString takes string as a parameter");
        }
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("userid"); // column 1: the input string, echoed unchanged for easy checking
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("country"); // column 2: country
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("province"); // column 3: province
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("city"); // column 4: city
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("story"); // column 5: store
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        // Return the struct inspector built from the names and types above
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    /**
     * Processes one input row.
     * @param objects the function arguments; objects[0] is the userid
     * @throws HiveException
     */
    @Override
    public void process(Object[] objects) throws HiveException {
        String[] result = new String[5];
        try {
            /*System.out.println(objects[0].toString());
            System.out.println(objects[0] != null);
            System.out.println(StringUtils.isEmpty(objects[0].toString()));
            System.out.println(objects[0].toString().length() < 10);*/
            // If the input does not meet the requirement, emit five empty strings
            if (objects[0] == null || StringUtils.isEmpty(objects[0].toString()) || objects[0].toString().length() < 10) {
                result[0] = "";
                result[1] = "";
                result[2] = "";
                result[3] = "";
                result[4] = "";
            } else {
                result[0] = objects[0].toString();                 // userid
                result[1] = objects[0].toString().substring(0, 3); // country
                result[2] = objects[0].toString().substring(3, 6); // province
                result[3] = objects[0].toString().substring(6, 9); // city
                result[4] = objects[0].toString().substring(9);    // store
            }
            // Emit the row
            forward(result);
        } catch (Exception e) {
            // Note: any exception is silently swallowed here, so a failing row produces no output
        }
    }

    @Override
    public void close() throws HiveException {
    }
}

Writing a Hive UDTF involves three parts:

initialize : sets up the output column names and types

process : handles the input string

forward : returns the result row
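
To make the order of these three parts concrete, here is a Hive-free miniature of the call flow. Everything in it is hypothetical and exists only for illustration: `UdtfFlowDemo`, `MiniSplit`, and `RowCollector` are not Hive classes and this is not the real `GenericUDTF` API. Unlike the UDTF in this post, which emits a row of empty strings for bad input, the miniature emits nothing for input that is too short, to show that `process` may produce zero rows.

```java
import java.util.ArrayList;
import java.util.List;

public class UdtfFlowDemo {

    // Hypothetical stand-in for the collector Hive gives to a UDTF.
    interface RowCollector {
        void collect(Object[] row);
    }

    // Hypothetical miniature of a UDTF; this is NOT the real GenericUDTF API.
    static class MiniSplit {
        private RowCollector collector;

        void setCollector(RowCollector collector) {
            this.collector = collector;
        }

        // Part 1: declare the output column names (Hive also needs their types).
        List<String> initialize() {
            List<String> names = new ArrayList<>();
            names.add("prefix");
            names.add("rest");
            return names;
        }

        // Part 2: called once per input row; may emit zero or more output rows.
        void process(String input) {
            if (input == null || input.length() < 4) {
                return; // too short: emit nothing for this row
            }
            forward(new Object[] {input.substring(0, 3), input.substring(3)});
        }

        // Part 3: hand one finished output row to the collector.
        void forward(Object[] row) {
            collector.collect(row);
        }
    }

    // Drives the three parts in order and returns the rows that were emitted.
    static List<Object[]> run(String... inputs) {
        List<Object[]> rows = new ArrayList<>();
        MiniSplit udtf = new MiniSplit();
        udtf.setCollector(rows::add);
        udtf.initialize();
        for (String input : inputs) {
            udtf.process(input);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Object[] row : run("abcdef", "ab", "xyz123")) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

The number of output rows is decided by how many times `forward` is called, not by the number of input rows.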

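For how to use it, see the previous post: Hadoop 3 Cluster Setup: Adding a Custom UDF Function to Hive. Package the jar, upload it to the server, and edit $HIVE_HOME/bin/.hiverc
to add the following lines (more than one jar can be added):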

[hadoop@venn05 bin]$ more .hiverc

add jar /opt/hadoop/idp_hd/viewstat/lib/hivefunction-1.0-SNAPSHOT.jar;

create temporary function split_area as 'com.venn.udtf.SplitString';

The result looks like this:

hive> select split_area(userid) from sqoop_test limit ;

OK