Problem Description
I have a situation where a folder (containing subfolders and many files, 5GB+ of data in total) needs to be transferred programmatically to \\WindowsMachine\c$\Temp.
Since there are many files across the folder and its subfolders, totaling 5GB+, I was wondering whether I should first zip everything into a single .zip file and chop that zip file into smaller pieces before the transfer.
Is this a good approach? If yes, I would like to know how to chop the zip file into smaller pieces and how to combine them back into one zip file on the receiving side.
If zip-chop-then-transfer is not a good idea, are there any suggestions for transferring this many files, and this much data, to another machine?
Your help is greatly appreciated. Thanks.
Recommended Answer
To be honest, it really depends. In most cases, zipping a large set of files into a single zip makes the network transfer more efficient. However, there is a tipping point at which the overhead of zipping is higher than transferring the files individually. Unfortunately, there is no programmatic algorithm you could use to figure this out; it depends entirely on the compression algorithm and the files you're compressing. Some testing on a reasonable sample set may give you a reasonable rule of thumb (e.g. over 100 files, we'll compress). But compression itself requires that you have the space: you said you have 5GB of files, so to compress them you'll need (at most) another 5GB of free space. If you're running on a small drive, that may be too much.
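As a minimal sketch of that first step, here is one way to zip the folder in .NET, checking for the scratch space mentioned above before compressing. The paths and file names are hypothetical; adjust them to your environment.

using System;
using System.IO;
using System.IO.Compression;

class ZipBeforeTransfer
{
    static void Main()
    {
        // Hypothetical paths; substitute your own.
        string sourceFolder = @"C:\Data\ToTransfer";
        string zipPath = @"C:\Data\ToTransfer.zip";

        // Compression needs scratch space: worst case roughly the
        // size of the input, as noted above.
        long sourceSize = GetDirectorySize(new DirectoryInfo(sourceFolder));
        long freeSpace = new DriveInfo(Path.GetPathRoot(zipPath)).AvailableFreeSpace;
        if (freeSpace < sourceSize)
            throw new IOException("Not enough free space to create the zip.");

        // One zip containing the folder and all of its subfolders.
        ZipFile.CreateFromDirectory(sourceFolder, zipPath,
            CompressionLevel.Optimal, includeBaseDirectory: false);
    }

    static long GetDirectorySize(DirectoryInfo dir)
    {
        long total = 0;
        foreach (FileInfo file in dir.GetFiles("*", SearchOption.AllDirectories))
            total += file.Length;
        return total;
    }
}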
Let's assume you have identified the boundary at which zipping is worthwhile and that you have plenty of space. At this point, breaking the zip file up into chunks is going to hurt performance, not improve it. Given a connection from machine A to machine B, you (most likely) only have a single network card, so the data is going to be serialized anyway. Breaking a large file into smaller files won't increase throughput; it will decrease it. This is the very common network chattiness issue: it is generally better to make fewer, larger network calls than many small ones.
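A sketch of the single, large transfer this paragraph recommends, assuming the account running the code has administrative rights on the target machine (the c$ share requires them). The zip file name is hypothetical.

using System.IO;

class SingleCopy
{
    static void Main()
    {
        // One large write to the administrative share rather than
        // many small chatty ones.
        File.Copy(@"C:\Data\ToTransfer.zip",
                  @"\\WindowsMachine\c$\Temp\ToTransfer.zip",
                  overwrite: true);
    }
}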
The one benefit of chunking a file is retryability. If you transfer a 5GB file across the network and the write of the last byte fails, the whole file is invalid and you have to start over. Using smaller chunks lets you determine which chunks have already been sent, so you only resend what is left. This requires a reasonable amount of extra code on both sides and, honestly, probably isn't worth your time to implement unless you are on a really unstable network. If you do need this kind of support, I would recommend going with what is already available: Robocopy (which you can call from .NET) supports this kind of thing directly, and it is what I would use if I decided I needed this functionality. If that isn't an option, you can P/Invoke CopyFileEx, which can also handle it, but you'll have to write more code to support it. But, again, only worry about chunking if you actually run into problems; in most cases you shouldn't hit any of them.
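If retryability does matter, a sketch of calling Robocopy from .NET, as the answer suggests. The flags used here are from the standard Robocopy documentation; the paths are hypothetical.

using System.Diagnostics;

class RobocopyTransfer
{
    static void Main()
    {
        // /E   copy subdirectories, including empty ones
        // /Z   restartable mode: an interrupted copy resumes mid-file
        // /R:3 /W:5  retry failed files 3 times, waiting 5 seconds
        var psi = new ProcessStartInfo
        {
            FileName = "robocopy",
            Arguments = @"C:\Data\ToTransfer \\WindowsMachine\c$\Temp /E /Z /R:3 /W:5",
            UseShellExecute = false
        };
        using (var proc = Process.Start(psi))
        {
            proc.WaitForExit();
            // Robocopy exit codes below 8 indicate success (possibly
            // with some files skipped); 8 and above indicate failure.
            if (proc.ExitCode >= 8)
                throw new System.Exception("Robocopy reported a failure.");
        }
    }
}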