Problem description
My HDFS directory structure looks like the following:
/user/hive/warehouse/check.db/abcd/date=2015-02-02/xyz=hijk/000000_0
/user/hive/warehouse/check.db/abcd/date=2015-02-02/xyz=pqrs/000000_0
I am trying to loop through all the directories under "/user/hive/warehouse/check.db/abcd" and derive two fields from each path, using the code below.
INPUT='/user/hive/warehouse/check.db/abcd'
# List the xyz=* subdirectories under each date=* partition and keep only the path portion of each ls line.
for dir in $(hadoop fs -ls "$INPUT"/*/ | grep -o -e "$INPUT/.*"); do
    # The third '='-separated field is the xyz partition value.
    xyz=$(echo "$dir" | cut -d '=' -f 3)
    # The text between "date=" and "/xyz" is the date partition value.
    date=$(echo "$dir" | sed 's/.*date=\(.*\)\/xyz.*/\1/g')
done
Is this the best way to do it, or is there a better alternative?
You could also use Java code or a Python script, although this approach seems good enough as well.
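If you stay in the shell, one small refinement is to let bash parameter expansion pull the two values out of each path instead of piping every line through cut and sed. A minimal sketch, assuming the same date=.../xyz=... layout and that your Hadoop version supports ls -C (which prints paths only; otherwise keep the grep -o filter from the question):

INPUT='/user/hive/warehouse/check.db/abcd'
for dir in $(hadoop fs -ls -C "$INPUT"/*/); do
    # Strip the fixed prefix, leaving e.g. date=2015-02-02/xyz=hijk
    partition="${dir#"$INPUT"/}"
    date="${partition%%/*}"   # date=2015-02-02
    date="${date#date=}"      # 2015-02-02
    xyz="${partition##*/}"    # xyz=hijk
    xyz="${xyz#xyz=}"         # hijk
    echo "$date $xyz"
done

This avoids spawning two extra processes per directory, which matters mainly when there are many partitions.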