Problem Description
I see this technique recommended in many places (including Stack Exchange), and I can't get it out of my head that this would reduce entropy! After all, you are hashing something again that has already been hashed and already has a collision chance. Wouldn't collision chance on top of collision chance result in even more collision chances? After researching, it seems I'm wrong, but why?
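For concreteness, the technique being asked about (feeding each digest back into the hash function) looks roughly like the sketch below; the function name, input, and round count are illustrative, not part of the original question.

```python
import hashlib

def iterated_md5(data: bytes, rounds: int) -> bytes:
    """Hash the input once, then keep re-hashing the previous digest."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.md5(digest).digest()  # each 16-byte digest feeds the next round
    return digest

print(iterated_md5(b"example input", 1000).hex())
```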
Solution
Since you tagged md5, I'll use that as an example. From Wikipedia:
And then the example plaintexts they give are 256 bytes long. Since the collision attack relies on a 128-byte block of data and the hash digest is only 128 bits, there really isn't an increased risk of a collision attack succeeding beyond the first iteration - that is to say, you can't really influence the likelihood of a collision beyond the first hash.
Also consider that the entropy of the hash is the aforementioned 128 bits. Even considering that the total collision chance is only 2^20.96 (again from Wikipedia), it would take a great number of iterations to cause two inputs to collide. The first-glance reasoning that I think you're falling victim to is that any two arbitrary inputs have some collision chance x%, so hashing the results again and again compounds that chance.
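To put a rough number on "a great number of iterations", here is a back-of-the-envelope sketch of my own; the per-round collision probability is an assumption chosen to echo the 2^20.96 figure above, not a measured property of MD5.

```python
# Back-of-the-envelope arithmetic (assumption for illustration): treat every
# round as carrying an independent per-round collision probability p and ask
# how many rounds it takes before a collision becomes at all likely.
p = 2 ** -20.96
for n in (1_000, 100_000, 2_000_000):
    prob = 1 - (1 - p) ** n  # P(at least one collision in n rounds)
    print(f"{n:>9,} rounds -> collision probability roughly {prob:.4f}")
```

Even under this deliberately pessimistic model, it takes on the order of millions of rounds before a collision becomes likely.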
That first-glance reasoning can be disproven by counterexample fairly easily. Consider MD5 again:
MD5 any two inputs 128 times in a row and you will see that this is not true. You probably won't find a single repeated hash between them - after all, you've only created 256 of the 2^128 possible hash values, leaving virtually all of that space untouched. The probability of a collision in each round is independent of all other rounds.
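A minimal sketch of that experiment (the seed strings and the round count of 128 are illustrative): iterate MD5 on two different inputs and check whether any digest repeats, either within a chain or across the two chains.

```python
import hashlib

def md5_chain(seed: bytes, rounds: int = 128) -> list:
    """Repeatedly MD5 the previous digest, collecting every digest produced."""
    digests, current = [], seed
    for _ in range(rounds):
        current = hashlib.md5(current).digest()
        digests.append(current)
    return digests

chain_a = md5_chain(b"input one")
chain_b = md5_chain(b"input two")

# 256 digests drawn from a space of 2**128 values: a repeat would be astonishing.
all_digests = chain_a + chain_b
print("repeated digest found:", len(set(all_digests)) != len(all_digests))
```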
There are two approaches to understand why this is so. The first is that each iteration is essentially trying to hit a moving target. I think you could construct a proof based on the birthday paradox that there is a surprisingly low number of iterations of hashing where you will likely see one hash digest from one input match the hash digest of a different input. But they would almost certainly have occurred at different steps of the iteration. And once that occurs, they can never have the same output on the same iteration because the hash algorithm itself is deterministic.
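To put rough numbers on the birthday-paradox point (a standard approximation of my own, treating each digest as a uniformly random 128-bit value): a match only becomes likely after on the order of 2^64 digests, while the 256 digests produced above give an essentially negligible chance.

```python
import math

d = 128                                  # digest size in bits
# Digests needed for ~50% collision odds under the birthday approximation.
k_half = 1.1774 * math.sqrt(2.0 ** d)
print(f"about 2**{math.log2(k_half):.1f} digests for a ~50% collision chance")

# What 128 rounds applied to two inputs actually produces: 256 digests.
k = 256
p = -math.expm1(-k * (k - 1) / 2.0 ** (d + 1))
print(f"collision probability with {k} digests: about {p:.2e}")
```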
The other approach is to realize that the hash function actually adds entropy while it runs. Consider that an empty string has a 128-bit digest just like any other input; that cannot occur without entropy being added during the algorithm steps. This is actually a necessary part of a cryptographic hash function: data must be destroyed, or else the input could be recovered from the digest. For inputs longer than the digest, yes, entropy is lost on the whole; it has to be in order to fit into the digest length. But some entropy is also added.
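The empty-string observation is easy to check directly (a small illustration, not part of the original answer):

```python
import hashlib

# The empty string contributes no input entropy, yet it still yields a full
# 16-byte (128-bit) digest; those bits come entirely from the algorithm's
# fixed initial values, padding, and mixing steps.
print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
```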
I don't have numbers this exact for other hash algorithms, but I think all the points I've made generalize to other hash functions and one-way / mapping functions.
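For instance, the chain experiment above runs unchanged with any other algorithm hashlib supports; here is the same check with SHA-256, again just an illustrative sketch.

```python
import hashlib

def chain(algo: str, seed: bytes, rounds: int = 128) -> list:
    """Iterate the named hash over its own digest, collecting every digest."""
    digests, current = [], seed
    for _ in range(rounds):
        current = hashlib.new(algo, current).digest()
        digests.append(current)
    return digests

digests = chain("sha256", b"input one") + chain("sha256", b"input two")
print("repeated digest found:", len(set(digests)) != len(digests))
```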
This concludes this article on "Many iterations of hashing: doesn't it reduce entropy?" - we hope the answer above is helpful.