Question
I have uploaded a 14 MB file to S3 in 5 MB chunks, and I also computed the hash of each chunk using spark-md5. The individual hash of each chunk (generated by spark-md5) matches the ETag of the corresponding part uploaded to S3.
But the ETag S3 returns for the complete multipart upload does not match the locally calculated hash generated by spark-md5. Below are the steps for the local hash:
- Generate the hash of each chunk (with spark-md5)
- Join the chunk hashes together
- Convert to a hex string
- Compute the hash of the result
Below is the code; please check if there is any mistake. Approach 1:
var mergeChunk = self.chunkArray.join('');
console.log("mergeChunk: " + mergeChunk);
var hexString = toHexString(mergeChunk);
console.log("toHexString: " + hexString);
var cspark1 = SparkMD5.hash(hexString);
console.log("SparkMD5 final hash: " + cspark1);
Approach 2:
var mergeChunk = self.chunkArray.join('');
console.log("mergeChunk: " + mergeChunk);
var cspark2 = SparkMD5.hash(mergeChunk);
console.log("SparkMD5 final hash: " + cspark2);
Please provide the correct logic for calculating the ETag.
Answer
ETags are meant to be opaque; AWS makes no guarantees about what the ETag of a multipart upload is.
I believe it is just a hash over the concatenation of the block digests (in the order listed in the final POST), but you cannot rely on that.