Problem description
I would like to know whether there is any command or expression that returns only the file names in Hadoop. I need just the name of each file, but when I run hadoop fs -ls it prints the whole path.
I tried the command below, but I am wondering whether there is a better way to do it.
hadoop fs -ls <HDFS_DIR>|cut -d ' ' -f17
Recommended answer
It seems hadoop fs -ls does not support any option to output just the file names, or even just the last column.
If you want to get the last column reliably, first collapse each run of spaces into a single space, so that the last column can then be addressed by a fixed field number:
hadoop fs -ls | sed '1d;s/  */ /g' | cut -d' ' -f8
This gives you just the last column, but each entry still carries its whole path. If you want only the file names, you can use basename, as @rojomoke suggests:
hadoop fs -ls | sed '1d;s/  */ /g' | cut -d' ' -f8 | xargs -n 1 basename
I also filtered out the first line, which says Found ?x items
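To try these steps without a cluster, the sketch below feeds a made-up listing through the same stages. The sample lines, the paths in them, and the sample_listing helper are assumptions for illustration only; real hadoop fs -ls output may be padded differently.

```shell
# Hypothetical stand-in for `hadoop fs -ls /tmp`; the entries are invented.
sample_listing() {
  cat <<'EOF'
Found 2 items
-rw-r--r--   3 hdfs supergroup       1024 2015-01-01 10:00 /tmp/a.txt
drwxr-xr-x   - hdfs supergroup          0 2015-01-02 11:30 /tmp/logs
EOF
}

# Drop the header line, squeeze runs of spaces to one space,
# take field 8 (the path), then strip the directory part.
names=$(sample_listing | sed '1d;s/  */ /g' | cut -d' ' -f8 | xargs -n 1 basename)
printf '%s\n' "$names"
```

Note that the sed expression needs two spaces before the asterisk: with a single space the pattern also matches the empty string and a space gets inserted between every character.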
Note: as @felix-frank points out in the comments, the commands above do not correctly preserve file names that contain multiple consecutive spaces. Felix therefore proposed a more correct solution:
hadoop fs -ls /tmp | sed 1d | perl -wlne'print +(split " ",$_,8)[7]'
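If Perl is not available, roughly the same idea can be sketched with sed alone: remove the first seven whitespace-delimited fields and keep the rest of the line untouched, so spaces inside the name survive. The function name hdfs_basenames is hypothetical, and the sketch assumes the eight-column listing format shown above and a sed that accepts -E (as GNU and BSD sed do).

```shell
# Reads `hadoop fs -ls` style lines on stdin and prints one bare name per line.
hdfs_basenames() {
  # 1) delete the "Found N items" header by pattern rather than by position
  # 2) strip the first seven fields together with their trailing padding
  # 3) strip everything up to and including the last slash of the path
  sed -E \
    -e '/^Found [0-9]+ items$/d' \
    -e 's/^([^ ]+ +){7}//' \
    -e 's|.*/||'
}
```

It would be used as hadoop fs -ls /tmp | hdfs_basenames. Unlike the Perl one-liner, this variant also removes the directory part, so it prints names rather than full paths.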