
Question

I have a 1TB sparse file on Linux that actually stores only 32MB of data.

Is it possible to "efficiently" make a package that stores the sparse file? The package should unpack to a 1TB sparse file on another computer. Ideally, the "package" should be around 32MB.

Note: One possible solution is to use 'tar': https://wiki.archlinux.org/index.php/Sparse_file#Archiving_with_.60tar.27

However, for a 1TB sparse file, although the tarball may be small, archiving the sparse file takes too long.

Edit 1

I tested tar and gzip, and the results are as follows (note that this sparse file contains 0 bytes of data).

$ du -hs sparse-1
0   sparse-1

$ ls -lha sparse-1
-rw-rw-r-- 1 user1 user1 1.0T 2012-11-03 11:17 sparse-1

$ time tar cSf sparse-1.tar sparse-1

real    96m19.847s
user    22m3.314s
sys     52m32.272s

$ time gzip sparse-1

real    200m18.714s
user    164m33.835s
sys     10m39.971s

$ ls -lha sparse-1*
-rw-rw-r-- 1 user1 user1 1018M 2012-11-03 11:17 sparse-1.gz
-rw-rw-r-- 1 user1 user1   10K 2012-11-06 23:13 sparse-1.tar

The 1TB file sparse-1, which contains 0 bytes of data, can be archived by 'tar' into a 10KB tarball or compressed by gzip into a ~1GB file. gzip takes roughly twice as long as tar (about 200 minutes versus 96 minutes).

From this comparison, 'tar' seems better than gzip.

However, 96 minutes is too long for a sparse file that contains 0 bytes of data.
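Measurements like the ones above can be tried on a much smaller file before spending hours on a 1TB run. The following is only a sketch under assumptions that are not from the original post: the file name sparse-demo, the 64MiB size, and the availability of GNU coreutils (truncate, stat -c) and GNU tar.

```shell
# Scaled-down stand-in for the 1TB file: 64 MiB apparent size, no data written.
workdir=$(mktemp -d)
f="$workdir/sparse-demo"            # hypothetical file name
truncate -s 64M "$f"

apparent=$(stat -c %s "$f")         # logical size in bytes (what ls -l shows)
blocks=$(stat -c %b "$f")           # number of allocated blocks
blksize=$(stat -c %B "$f")          # size in bytes of each block counted by %b
allocated=$((blocks * blksize))     # space actually used (what du shows)

# Archive with -S so tar records holes instead of storing runs of zeros.
tar -cSf "$workdir/sparse-demo.tar" -C "$workdir" sparse-demo
tar_size=$(stat -c %s "$workdir/sparse-demo.tar")

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"
echo "tarball:   $tar_size bytes"
```

Wrapping the tar line in the shell's time keyword, as in the transcripts above, gives a rough idea of how the archiving cost scales with the apparent size on a given kernel and tar version.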

Edit 2

rsync seems to take more time than tar, but less than gzip, to finish copying the file:

$ time rsync --sparse sparse-1 sparse-1-copy

real    124m46.321s
user    107m15.084s
sys     83m8.323s

$ du -hs sparse-1-copy 
4.0K    sparse-1-copy

Hence, tar followed by cp or scp should be faster than running rsync directly on this extremely sparse file.
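One way to combine the tar and copy steps is to stream the archive through a pipe rather than writing an intermediate tarball. The sketch below runs the pipe locally so that it is self-contained. The directory names, the 64MiB demo file, and the remote command in the comment (user@remote, /target/dir) are hypothetical, and GNU tar is assumed.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dst"

# Demo file: 64 MiB apparent size with a few bytes of real data at offset 1 MiB.
truncate -s 64M "$workdir/src/sparse-demo"
printf 'payload' | dd of="$workdir/src/sparse-demo" bs=1 seek=1M conv=notrunc 2>/dev/null

# Pack with -S on one side of the pipe and unpack on the other.
tar -cSf - -C "$workdir/src" sparse-demo | tar -xSf - -C "$workdir/dst"

# Over the network the same idea would look like this (hypothetical host and path):
#   tar -cSf - sparse-demo | ssh user@remote 'tar -xSf - -C /target/dir'

dst_size=$(stat -c %s "$workdir/dst/sparse-demo")
dst_alloc=$(( $(stat -c %b "$workdir/dst/sparse-demo") * $(stat -c %B "$workdir/dst/sparse-demo") ))
echo "copied file: $dst_size bytes apparent, $dst_alloc bytes allocated"
```

Only the archive stream crosses the pipe, so the amount of data transferred should track the real data plus the hole map rather than the apparent size.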

Edit 3

Thanks to @mvp for pointing out the SEEK_HOLE functionality in newer kernels. (I was previously working on a 2.6.32 Linux kernel.)

Note: bsdtar version >= 3.0.4 is required (check here: http://ask.fclose.com/4/how-to-efficiently-archive-a-very-large-sparse-file?show=299#c299 ).

On a newer kernel and Fedora release (17), tar and cp handle the sparse file very efficiently.

[zma@office tmp]$ ls -lh pmem-1 

-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:14 pmem-1
[zma@office tmp]$ time tar cSf pmem-1.tar pmem-1

real    0m0.003s
user    0m0.003s
sys 0m0.000s
[zma@office tmp]$ time cp pmem-1 pmem-1-copy

real    0m0.020s
user    0m0.000s
sys 0m0.003s
[zma@office tmp]$ ls -lh pmem*
-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:14 pmem-1
-rw-rw-r-- 1 zma zma 1.0T Nov  7 20:15 pmem-1-copy
-rw-rw-r-- 1 zma zma  10K Nov  7 20:15 pmem-1.tar
[zma@office tmp]$ mkdir t
[zma@office tmp]$ cd t
[zma@office t]$ time tar xSf ../pmem-1.tar 

real    0m0.003s
user    0m0.000s
sys 0m0.002s
[zma@office t]$ ls -lha
total 8.0K
drwxrwxr-x   2 zma  zma  4.0K Nov  7 20:16 .
drwxrwxrwt. 35 root root 4.0K Nov  7 20:16 ..
-rw-rw-r--   1 zma  zma  1.0T Nov  7 20:14 pmem-1

I am using a 3.6.5 kernel:

[zma@office t]$ uname -a
Linux office.zhiqiangma.com 3.6.5-1.fc17.x86_64 #1 SMP Wed Oct 31 19:37:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
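To check that a cp copy is itself sparse, the allocated size of the copy can be compared with its apparent size. This is a minimal sketch assuming GNU coreutils. The file name pmem-demo and the 64MiB size are stand-ins, and --sparse=always is passed explicitly here, although the transcript above used plain cp.

```shell
workdir=$(mktemp -d)
src="$workdir/pmem-demo"            # hypothetical stand-in for pmem-1
truncate -s 64M "$src"
printf 'data' | dd of="$src" bs=1 seek=4096 conv=notrunc 2>/dev/null

# --sparse=always asks cp to create holes wherever the source has runs of zeros.
cp --sparse=always "$src" "$src-copy"

copy_size=$(stat -c %s "$src-copy")
copy_alloc=$(( $(stat -c %b "$src-copy") * $(stat -c %B "$src-copy") ))
echo "copy: $copy_size bytes apparent, $copy_alloc bytes allocated"
```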

Recommended answer

Short answer: Use bsdtar or GNU tar (version 1.29 or later) to create archives, and GNU tar (version 1.26 or later) to extract them on another box.

Long answer: There are some requirements for this to work.

First, the Linux kernel must be at least version 3.1 (Ubuntu 12.04 or later would do), so that it supports the SEEK_HOLE functionality.

Then, you need a tar utility that supports this syscall. GNU tar has supported it since version 1.29 (released on 2016/05/16; it should be present by default since Ubuntu 18.04), and bsdtar has supported it since version 3.0.4 (available since Ubuntu 12.04). Install bsdtar using sudo apt-get install bsdtar.
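Both requirements can be checked from the shell before attempting a long archive run. The sketch below is an illustration, not part of the original answer: version_ge is a hypothetical helper built on GNU sort -V, and the parsing of tar --version assumes the first line ends with the version number, as GNU tar prints it (bsdtar uses a different layout).

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

kernel=$(uname -r | cut -d- -f1)                              # e.g. 3.6.5
tar_version=$(tar --version | head -n 1 | awk '{print $NF}')  # e.g. 1.29

if version_ge "$kernel" 3.1; then
    echo "kernel $kernel: new enough for SEEK_HOLE"
else
    echo "kernel $kernel: older than 3.1"
fi

if version_ge "$tar_version" 1.29; then
    echo "tar $tar_version: meets the 1.29 requirement for creating archives"
else
    echo "tar $tar_version: older than 1.29, consider bsdtar for creating archives"
fi
```

A new enough kernel is necessary but may not be sufficient, since how holes are reported can also depend on the filesystem holding the file.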

While bsdtar (which uses libarchive) is awesome, it is unfortunately not very smart when it comes to untarring: it insists on at least as much free space on the target drive as the untarred file size, without regard to holes. GNU tar untars such sparse archives efficiently and does not check this condition.
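Given that limitation, it can help to compare the free space on the target filesystem with the apparent size of the file before choosing which tool to extract with. This is a rough sketch: has_room is a hypothetical helper, the needed value is the apparent size of the 1tb file from the log in this answer, and the current directory is assumed to be the extraction target.

```shell
# has_room NEEDED AVAILABLE: succeeds when AVAILABLE bytes cover NEEDED bytes.
has_room() {
    [ "$2" -ge "$1" ]
}

needed=1099511627777                             # apparent size of the 1tb file in the log
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')    # free space in 1024-byte blocks
avail=$((avail_kb * 1024))

if has_room "$needed" "$avail"; then
    echo "enough free space for bsdtar's check ($avail bytes available)"
else
    echo "only $avail bytes available: extract with GNU tar instead"
fi
```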

This is a log from Ubuntu 12.10 (Linux kernel 3.5):

$ dd if=/dev/zero of=1tb seek=1T bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.000143113 s, 7.0 kB/s

$ time bsdtar cvfz sparse.tar.gz 1tb 
a 1tb

real    0m0.362s
user    0m0.336s
sys 0m0.020s

# Or use GNU tar, if its version is 1.29 or later:
$ time tar cSvfz sparse-gnutar.tar.gz 1tb
1tb

real    0m0.005s
user    0m0.006s
sys 0m0.000s

$ ls -l
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov  7 01:43 1tb
-rw-rw-r-- 1 autouser autouser           257 Nov  7 01:43 sparse.tar.gz
-rw-rw-r-- 1 autouser autouser           134 Nov  7 01:43 sparse-gnutar.tar.gz
$

As I said above, unfortunately, untarring with bsdtar will not work unless you have 1TB of free space. However, any version of GNU tar works just fine for untarring such a sparse.tar:

$ rm 1tb 
$ time tar -xvSf sparse.tar.gz 
1tb

real    0m0.031s
user    0m0.016s
sys 0m0.016s
$ ls -l
total 8
-rw-rw-r-- 1 autouser autouser 1099511627777 Nov  7 01:43 1tb
-rw-rw-r-- 1 autouser autouser           257 Nov  7 01:43 sparse.tar.gz
