This article describes how to calculate an S3 ETag locally using spark-md5, which may be a useful reference if you have run into the same problem.

Problem description

I have uploaded a 14MB file to S3 in 5MB chunks, and I have also calculated the hash of each chunk with spark-md5. The individual hash of each chunk (generated by spark-md5) matches the ETag of the corresponding chunk uploaded to S3.

But the ETag that S3 generates for the complete multipart upload does not match the hash calculated locally with spark-md5. These are the steps for the local hash:

  1. Generate the hash of each chunk (with spark-md5)
  2. Join the hashes of all chunks
  3. Convert to hexadecimal
  4. Calculate the hash

Below is the code; please check whether there is any mistake. Approach 1:

        var mergeChunk = self.chunkArray.join('');
        console.log("mergeChunk: " + mergeChunk);

        var hexString = toHexString(mergeChunk);
        console.log("toHexString: " + hexString);

        var cspark1 = SparkMD5.hash(hexString);
        console.log("SparkMD5 final hash: " + cspark1);

Approach 2:

       var mergeChunk = self.chunkArray.join('');
       console.log("mergeChunk: " + mergeChunk);
       var cspark2 = SparkMD5.hash(mergeChunk);
       console.log("SparkMD5 final hash: " + cspark2);

Please provide the correct logic for calculating the ETag.

Recommended answer

ETags are meant to be opaque; AWS makes no guarantees about what the ETag of a multipart upload will be.

I think it is computed from the concatenation of the per-part digests (in the order listed in the final POST), but you cannot rely on that.

This concludes the article on calculating an S3 ETag locally with spark md5; I hope the answer above is helpful.

07-30 14:23