This article looks at how to loop through HDFS directories; the discussion below may be a useful reference for anyone facing the same problem.

Problem description



My HDFS directory structure looks like below.

/user/hive/warehouse/check.db/abcd/date=2015-02-02/xyz=hijk/000000_0
/user/hive/warehouse/check.db/abcd/date=2015-02-02/xyz=pqrs/000000_0

I am trying to loop through all the directories under "/user/hive/warehouse/check.db/abcd" and derive 2 fields and am using the below code.

INPUT='/user/hive/warehouse/check.db/abcd'

for dir in $(hadoop fs -ls "$INPUT"/*/ | grep -o -e "$INPUT/.*") ; do
    xyz=$(echo "$dir" | cut -d '=' -f 3)
    date=$(echo "$dir" | sed 's/.*date=\(.*\)\/xyz.*/\1/g')
done
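For the field extraction itself, bash parameter expansion can replace the per-path `cut` and `sed` subprocesses. A minimal sketch on a sample path (the sample value stands in for one line of the `hadoop fs -ls` pipeline output; the extraction logic is the part being shown):

```shell
#!/usr/bin/env bash
# Sample path in the same shape the hadoop fs -ls / grep pipeline produces
dir='/user/hive/warehouse/check.db/abcd/date=2015-02-02/xyz=hijk'

# Strip everything up to and including "date=", then cut at the next "/"
date_part=${dir#*date=}        # -> 2015-02-02/xyz=hijk
date_part=${date_part%%/*}     # -> 2015-02-02

# Strip everything up to and including the last "=" to get the xyz value
xyz_part=${dir##*=}            # -> hijk

echo "$date_part $xyz_part"
```

Because no external commands are spawned, this stays fast even when the loop visits many partition directories.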

Is it the best way to do this or is there a better alternate way to do this?

Solution

You could also use Java code or a python script, although this seems to be good enough as well.
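If you do stay in the shell, a slightly more defensive version of the loop reads the listing line by line instead of word-splitting a command substitution. The `list_dirs` function below is a hypothetical stand-in for the real `hadoop fs -ls ... | grep -o ...` pipeline, so the sketch is runnable without a cluster:

```shell
#!/usr/bin/env bash
# Stand-in for the real listing; in practice this would be:
#   hadoop fs -ls "$INPUT"/*/ | grep -o -e "$INPUT/.*"
list_dirs() {
  printf '%s\n' \
    '/user/hive/warehouse/check.db/abcd/date=2015-02-02/xyz=hijk' \
    '/user/hive/warehouse/check.db/abcd/date=2015-02-02/xyz=pqrs'
}

count=0
while IFS= read -r dir; do
  d=${dir#*date=}; d=${d%%/*}   # partition date, e.g. 2015-02-02
  x=${dir##*=}                  # xyz value, e.g. hijk
  printf '%s %s\n' "$d" "$x"
  count=$((count + 1))
done < <(list_dirs)
```

The process substitution (`< <(...)`) keeps the loop in the current shell, so variables such as `count` remain visible after the loop finishes, which a plain pipeline into `while read` would not guarantee.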

