Why is copying a directory with Ansible so slow?

Problem Description

I'm using Ansible to copy a directory (900 files, 136MBytes) from one host to another:

---
- name: copy a directory
  copy: src={{some_directory}} dest={{remote_directory}}

This operation takes an incredible 17 minutes, while a simple scp -r <src> <dest> takes a mere 7 seconds.

I have tried Accelerated Mode, which according to the Ansible docs "can be anywhere from 2-6x faster than SSH with ControlPersist enabled, and 10x faster than paramiko", but to no avail.

Recommended Answer

TLDR: use synchronize instead of copy.

Here's the copy command I'm using:

- copy: src=testdata dest=/tmp/testdata/

As a guess, I assume the sync operations are slow. The files module documentation implies this too:

复制"模块递归复制工具不会扩展到大量(>数百个)文件.作为替代方案,请参阅同步模块,它是 rsync 的包装器.

Digging into the source shows each file is processed with SHA1. That's implemented using hashlib.sha1. A local test implies that only takes 10 seconds for 900 files (that happen to take 400MB of space).

So, the next avenue. The copy is handled with module_utils/basic.py's atomic_move method. I'm not sure if accelerated mode helps (it's a mostly-deprecated feature), but I tried pipelining, putting this in a local ansible.cfg:

[ssh_connection]
pipelining=True
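# note: per the Ansible docs, when using sudo, pipelining also requires
# 'requiretty' to be disabled in /etc/sudoers on the managed hosts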

It didn't appear to help; my sample took 24 minutes to run. There's obviously a loop that checks a file, uploads it, fixes permissions, then starts on the next file. That's a lot of commands, even if the ssh connection is left open. Reading between the lines it makes a little bit of sense: the "file transfer" can't be done in pipelining, I think.

So, following the hint to use the synchronize command:

- synchronize: src=testdata dest=/tmp/testdata/

That took 18 seconds, even with pipelining=False. Clearly, the synchronize command is the way to go in this case.
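Applied to the playbook from the question, the fix is a one-word change; here is a minimal sketch reusing the question's variable names (note that synchronize is a wrapper around rsync, so rsync must be installed on both ends):

---
- name: copy a directory
  synchronize: src={{some_directory}} dest={{remote_directory}}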

Keep in mind synchronize uses rsync, which by default compares modification time and file size. If you want or need checksumming, add checksum=True to the task. Even with checksumming enabled the time didn't really change: still 15-18 seconds. I verified the checksum option was on by running ansible-playbook with -vvvv; that can be seen here:

ok: [testhost] => {"changed": false, "cmd": "rsync --delay-updates -FF --compress --checksum --archive --rsh 'ssh  -o StrictHostKeyChecking=no' --out-format='<<CHANGED>>%i %n%L' \"testdata\" \"user@testhost:/tmp/testdata/\"", "msg": "", "rc": 0, "stdout_lines": []}
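For completeness, the checksum variant mentioned above is just the synchronize task with one extra parameter; a minimal sketch using the same test paths:

- synchronize: src=testdata dest=/tmp/testdata/ checksum=True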
