Problem description
Is there a command or expression that returns only the file names in Hadoop? When I run hadoop fs -ls, it prints the whole path, but I need only the name of each file.
I tried the command below, but I am wondering whether there is a better way to do it.
hadoop fs -ls <HDFS_DIR> | cut -d ' ' -f17
Recommended answer
It seems hadoop fs -ls does not support any option to output just the file names, or even just the last column.
If you want to get the last column reliably, first squeeze each run of spaces into a single space, so that the last column can then be addressed by a fixed field number:
hadoop fs -ls | sed '1d;s/  */ /g' | cut -d' ' -f8
This prints just the last column, but each entry still includes the whole path. If you want only the file names, you can use basename, as @rojomoke suggests:
hadoop fs -ls | sed '1d;s/  */ /g' | cut -d' ' -f8 | xargs -n 1 basename
The 1d in the sed expression also filters out the first line, Found ?x items.
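To see what each stage of that pipeline does without needing a cluster, here is a small sketch that feeds it canned text shaped like hadoop fs -ls output. The sample listing (owner, group, dates, and paths) is made up for illustration, and it assumes the usual eight-column layout with no spaces in the paths.

```shell
# Canned sample standing in for `hadoop fs -ls` output (all values are made up).
sample='Found 2 items
-rw-r--r--   3 alice supergroup       1024 2015-01-01 10:00 /tmp/data/report.csv
drwxr-xr-x   - alice supergroup          0 2015-01-02 11:30 /tmp/data/logs'

# Stage 1: drop the "Found N items" header and squeeze runs of spaces to one.
# Stage 2: with single-space separators, the path is always field 8.
full_paths=$(printf '%s\n' "$sample" | sed '1d;s/  */ /g' | cut -d' ' -f8)

# Stage 3: strip the directory part of each path.
file_names=$(printf '%s\n' "$full_paths" | xargs -n 1 basename)

printf '%s\n' "$file_names"
```

Without the squeeze step, cut would count every individual space as a field boundary, so the path's field number would shift with the column padding.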
Note: as @felix-frank points out in the comments, the commands above will not correctly preserve file names that contain multiple consecutive spaces. Felix therefore proposed a more correct solution:
hadoop fs -ls /tmp | sed 1d | perl -wlne 'print +(split " ",$_,8)[7]'
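The same idea (treat everything after the seventh column as the path, so embedded spaces survive) can be combined with stripping the directory part. The sketch below uses awk rather than perl, which is a substitution of mine and not part of the original answer. The function name hdfs_basenames and the sample listing are hypothetical, and it assumes the standard eight-column listing format.

```shell
# Hypothetical helper: reads `hadoop fs -ls` style text on stdin and prints
# only the final path component of each entry, keeping embedded spaces.
hdfs_basenames() {
  awk '
    # Skip the summary header, if present.
    /^Found [0-9]+ items$/ { next }
    {
      line = $0
      # Remove the first seven columns; whatever remains is the full path.
      for (i = 1; i <= 7; i++) sub(/^ *[^ ]+ +/, "", line)
      # Remove everything up to and including the last slash.
      sub(/.*\//, "", line)
      print line
    }'
}

# Made-up sample; the second name contains two consecutive spaces.
listing='Found 2 items
-rw-r--r--   3 alice supergroup       1024 2015-01-01 10:00 /tmp/data/report.csv
drwxr-xr-x   - alice supergroup          0 2015-01-02 11:30 /tmp/data/my  logs'

names=$(printf '%s\n' "$listing" | hdfs_basenames)
printf '%s\n' "$names"
```

Against a real cluster this would be used as hadoop fs -ls /tmp | hdfs_basenames. Newer Hadoop releases also document a -C option for ls that prints paths only. If your version has it, that avoids column parsing altogether, though the directory part still has to be stripped.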