Problem description
I'm writing a script that is supposed to merge some data from a SQL-based database. Each row has a long integer as its primary key (auto-incrementing). I was thinking about hashing these ids so that they somehow "look" like the other ids already in my RethinkDB table. What I'm trying to achieve is to avoid duplicates if the same data is merged again, but keeping the original integers as ids alongside the generated ids of data saved directly to the RethinkDB table feels weird.
Can I do that? How does RethinkDB generate auto ids anyway? And am I approaching this correctly?
Recommended answer
RethinkDB uses a string encoding of 128-bit UUIDs (basically hashed integers).
The string format looks like this: "HHHHHHHH-HHHH-HHHH-HHHH-HHHHHHHHHHHH", where each 'H' is one hexadecimal digit of the 128-bit integer (32 digits of 4 bits each, giving 128 bits). Only the characters 0-9 and a-f (lowercase) are used.
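To make the format concrete, here is a minimal Python sketch that renders a 128-bit integer in that 8-4-4-4-12 layout of lowercase hex digits. The helper name `int128_to_uuid_string` is hypothetical; Python's standard `uuid.UUID(int=...)` produces the same string, and the sketch uses it only as a cross-check.

```python
import uuid


def int128_to_uuid_string(value: int) -> str:
    """Format a 128-bit unsigned integer as HHHHHHHH-HHHH-HHHH-HHHH-HHHHHHHHHHHH."""
    if not 0 <= value < 1 << 128:
        raise ValueError("value must fit in 128 bits")
    digits = f"{value:032x}"  # 32 lowercase hex digits, zero-padded on the left
    # Split into groups of 8, 4, 4, 4 and 12 digits joined by hyphens.
    return "-".join(
        (digits[0:8], digits[8:12], digits[12:16], digits[16:20], digits[20:32])
    )


if __name__ == "__main__":
    n = 0x0123456789ABCDEF0123456789ABCDEF
    s = int128_to_uuid_string(n)
    print(s)  # 01234567-89ab-cdef-0123-456789abcdef
    # The standard library agrees on the layout.
    assert s == str(uuid.UUID(int=n))
```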
If you want to generate such UUIDs from an existing integer, I recommend hashing the integer first. This gives you an even distribution over the whole key space, which makes sharding easier and avoids hotspots. As a second step, format the hash value as a string in the format shown above. If the hash doesn't have enough digits, it's fine to leave some of the trailing 'H' digits as a constant 0.
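A minimal sketch of those two steps in Python, assuming the SQL primary keys are non-negative integers that fit in 64 bits. The hash used here is BLAKE2b from `hashlib`, configured for a 16-byte (128-bit) digest so it fills all 32 hex digits; that choice, the 8-byte big-endian encoding of the key, and the function name `sql_id_to_rethinkdb_id` are illustrative assumptions, not anything RethinkDB prescribes. Because the mapping is deterministic, merging the same row a second time targets the same primary key instead of creating a new document.

```python
import hashlib


def sql_id_to_rethinkdb_id(sql_id: int) -> str:
    """Map an integer primary key to a deterministic UUID-formatted string."""
    if not 0 <= sql_id < 1 << 64:
        raise ValueError("expected an unsigned 64-bit integer")
    # Step 1: hash the integer so ids spread evenly over the 128-bit key space.
    digest = hashlib.blake2b(sql_id.to_bytes(8, "big"), digest_size=16).hexdigest()
    # Step 2: format the 32 lowercase hex digits as 8-4-4-4-12 groups.
    return "-".join(
        (digest[0:8], digest[8:12], digest[12:16], digest[16:20], digest[20:32])
    )


if __name__ == "__main__":
    for key in (1, 2, 3):
        print(key, sql_id_to_rethinkdb_id(key))
    # Same input, same id: re-running the merge reproduces the keys.
    assert sql_id_to_rethinkdb_id(42) == sql_id_to_rethinkdb_id(42)
```

With a shorter hash (for example a 64-bit one), the remaining trailing digits could be zero-filled as described above, at the cost of using only part of the key space.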
If you really want to go into the details of UUID generation, here are two links for further reading:
1. RFC 4122, "A Universally Unique IDentifier (UUID) URN Namespace": http://tools.ietf.org/html/rfc4122
2. RethinkDB's implementation of UUID generation and formatting: https://github.com/rethinkdb/rethinkdb/blob/next/src/containers/uuid.cc
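RFC 4122 also defines name-based UUIDs (version 5, built on SHA-1), which Python's standard library exposes as `uuid.uuid5`. This is a swapped-in, standards-conforming variant of the hash-then-format idea rather than what the answer describes: it fixes 6 of the 128 bits for version and variant, so 122 bits come from the hash. The namespace name `https://example.com/sql-import/orders` and the idea of one namespace per source table (so equal integer keys from different tables do not collide) are hypothetical assumptions for illustration.

```python
import uuid

# Hypothetical namespace for rows imported from one SQL table. It is itself
# derived with uuid5, so every run of the import script gets the same value.
ORDERS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/sql-import/orders")


def sql_id_to_uuid5(sql_id: int, namespace: uuid.UUID = ORDERS_NAMESPACE) -> str:
    """Return an RFC 4122 version-5 UUID string for an integer primary key."""
    # The key is hashed as its decimal string; str() of a UUID is lowercase hex.
    return str(uuid.uuid5(namespace, str(sql_id)))


if __name__ == "__main__":
    print(sql_id_to_uuid5(42))
```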