Generating pseudo-natural phrases from a big integer in a reversible way

Question

I have a large and "unique" integer (actually a SHA1 hash).

Note: While I'm talking here about SHA1 hashes, this is not a cryptography / security question! I'm not trying to break SHA1. Imagine a random 160-bit integer instead of SHA1 if that helps.

I want (for no other reason than to have fun) to find an algorithm that maps that SHA1 hash to a computer-generated (pseudo-)English phrase. The mapping should be bidirectional (i.e., knowing the algorithm, one must be able to calculate the original SHA1 hash from that phrase).

The phrase need not make sense. I would even settle for a whole paragraph of nonsense. (Though the quality, or "Englishness", of a paragraph should probably be better than that of a mere phrase.)

A better algorithm would produce shorter, more natural-looking, more unique phrases.

A variation: it is OK if I can work with only part of the hash. Say, the first six hex digits would be fine.

A possible use for the generated phrase: a human-readable version of a Git commit ID, used as a motto for the program version built from that commit. (As I said, this is "for fun". I don't claim that this is very practical, or much more readable than the SHA1 itself.)

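A possible approach: in the past, I tried building a probability table (of words) and generating phrases as a Markov chain, seeding the generator (that is, choosing branches of the probability tree) according to the bits I read from the SHA. This was not very successful; the resulting phrases were too long and too ugly. I'm not sure whether that was a bug or a general flaw in the algorithm, since I had to abandon it quite early.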

Now I'm thinking about attempting to solve the problem once again. Any advice on how to approach this? Do you think the Markov chain approach can work here? Something else?

Recommended answer

A very simple approach would be: take a list of, say, 1024 nouns, 1024 verbs and 1024 adjectives (1024 = 2^10, so each word encodes 10 bits). Your phrase could then be a sentence of the form

noun[bits_01-10] verb[bits_11-20] adjective[bits_21-30] verb[bits_31-40],
noun[bits_41-50] verb[bits_51-60] adjective[bits_61-70] verb[bits_71-80],
noun[bits_81-90] verb[bits_91-100] adjective[bits_101-110] verb[bits_111-120] and
noun[bits_121-130] verb[bits_131-140] adjective[bits_141-150] verb[bits_151-160].
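
The template above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the three word lists are synthetic placeholders built from syllables (a real version would use 1024 curated English words per list), bit 1 is taken to be the most significant bit of the hash, and the commas and the final "and" of the template are left out for brevity. The names `make_list`, `encode` and `decode` are made up for this example.

```python
import hashlib

# Placeholder word lists (assumption): 32 syllables x 32 syllables = 1024 "words".
CONSONANTS = "bdfgklmn"  # 8 consonants
VOWELS = "aeio"          # 4 vowels


def make_list(suffix):
    """Build 1024 distinct pseudo-words; the suffix only marks the word class."""
    syllables = [c + v for c in CONSONANTS for v in VOWELS]  # 32 syllables
    return [a + b + suffix for a in syllables for b in syllables]


NOUNS = make_list("")
VERBS = make_list("s")
ADJECTIVES = make_list("y")

# noun verb adjective verb, four times: 16 slots x 10 bits = 160 bits.
PATTERN = [NOUNS, VERBS, ADJECTIVES, VERBS] * 4


def encode(n):
    """Map a 160-bit integer to a 16-word phrase, 10 bits per word."""
    assert 0 <= n < 1 << 160
    words = []
    for slot, wordlist in enumerate(PATTERN):
        shift = 160 - 10 * (slot + 1)  # the first slot takes the top 10 bits
        words.append(wordlist[(n >> shift) & 0x3FF])
    return " ".join(words)


def decode(phrase):
    """Recover the integer by looking each word up in the list for its slot."""
    n = 0
    for word, wordlist in zip(phrase.split(), PATTERN):
        # list.index is a linear scan; a dict would be faster for real lists.
        n = (n << 10) | wordlist.index(word)
    return n


commit_id = int(hashlib.sha1(b"example commit").hexdigest(), 16)
phrase = encode(commit_id)
assert decode(phrase) == commit_id  # the mapping is reversible
print(phrase)
```

Because each slot always reads from a known list, decoding only needs the position of a word in the phrase, so the same word may appear in more than one list without breaking reversibility.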

With a bit more linguistic thought you can probably construct slightly more complicated, and thus less repetitive-looking, sentences (say, a bit for singular / plural, a bit or two for different tenses, ...). Longer word lists use up a few more bits per word, but my guess is that you would reach rather exotic words quite fast.
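
As a rough illustration of spending bits on grammar rather than on longer word lists, the sketch below packs 6 bits into one clause: 2 bits pick a noun, 1 bit picks singular or plural, 2 bits pick a verb and 1 bit picks the tense. The tiny word lists, the naive "-s" / "-ed" inflection and the function names are all assumptions made for this example; real English inflection is irregular and would need lookup tables that keep every surface form unambiguous.

```python
# Tiny illustrative lists (assumption): 4 words each, so 2 bits per word.
NOUNS = ["cat", "dog", "bird", "frog"]
VERBS = ["walk", "jump", "paint", "cook"]


def noun_form(i, plural):
    """Naive regular plural: append 's'."""
    return NOUNS[i] + ("s" if plural else "")


def verb_form(i, past, plural):
    """Past tense adds 'ed'; present tense agrees with the noun's number."""
    if past:
        return VERBS[i] + "ed"
    return VERBS[i] if plural else VERBS[i] + "s"


# Reverse lookup tables: surface form -> (word index, feature bit).
NOUN_LOOKUP = {noun_form(i, p): (i, p) for i in range(4) for p in (0, 1)}
VERB_LOOKUP = {
    verb_form(i, t, p): (i, t)
    for i in range(4) for t in (0, 1) for p in (0, 1)
}


def encode_clause(bits):
    """Map a 6-bit value to a clause of the form 'the <noun> <verb>'."""
    assert 0 <= bits < 64
    noun_i, plural = bits >> 4, (bits >> 3) & 1
    verb_i, past = (bits >> 1) & 3, bits & 1
    return f"the {noun_form(noun_i, plural)} {verb_form(verb_i, past, plural)}"


def decode_clause(clause):
    """Recover the 6-bit value from the surface forms of the noun and verb."""
    _, noun, verb = clause.split()
    noun_i, plural = NOUN_LOOKUP[noun]
    verb_i, past = VERB_LOOKUP[verb]
    return (noun_i << 4) | (plural << 3) | (verb_i << 1) | past


# Every 6-bit value survives the round trip.
assert all(decode_clause(encode_clause(b)) == b for b in range(64))
print(encode_clause(0b011010))
```

Each grammatical feature here costs one bit and changes the shape of the sentence without requiring any additional vocabulary, which is the trade-off the paragraph above hints at.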


10-20 15:34