Problem Description
I am writing a shell script to put data into Hadoop as soon as it is generated. I can ssh to my master node, copy the files to a folder there, and then put them into Hadoop. I am looking for a shell command that avoids copying the file to the local disk on the master node. To better explain what I need, below is what I have so far:
1) Copy the file to the master node's local disk:
scp test.txt username@masternode:/folderName/
I have already set up an SSH connection using keys, so no password is needed to do this.
2) I can use ssh to remotely execute the hadoop put command:
ssh username@masternode "hadoop dfs -put /folderName/test.txt hadoopFolderName/"
What I am looking for is how to pipe/combine these two steps into one and skip the local copy of the file on the master node's disk.
Thanks
In other words, I want to pipe several commands in a way that I can
Recommended Answer
Try this (untested):
cat test.txt | ssh username@masternode "hadoop dfs -put - hadoopFoldername/test.txt"
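If you want to avoid the extra cat process, input redirection does the same thing (a minimal variant of the command above, using the same example file and folder names):

ssh username@masternode "hadoop dfs -put - hadoopFoldername/test.txt" < test.txt

The - tells hadoop dfs -put to read from standard input, which ssh wires up to the local file, so nothing is ever written to the master node's disk.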
I've used similar tricks to copy directories around:
tar cf - . | ssh remote "(cd /destination && tar xvf -)"
This sends the output of the local tar into the input of the remote tar.
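The same pipe trick works in the other direction too, e.g. for pulling a directory down from the remote machine (a sketch under the same assumptions; /source is a hypothetical remote path):

ssh remote "(cd /source && tar cf - .)" | tar xvf -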
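Tying this back to the original goal of uploading files from a script as they are generated, the one-liner can be wrapped in a loop (a hypothetical sketch; the glob pattern, host, and HDFS folder are assumed examples, not from the original post):

# Stream every generated file straight into HDFS, with no intermediate
# copy on the master node. $(basename "$f") expands locally before ssh runs.
for f in /data/out/*.txt; do
    ssh username@masternode "hadoop dfs -put - hadoopFolderName/$(basename "$f")" < "$f"
done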