问题描述
我要寻找的最终结果是实现 T-SQL CHECKSUM 在BigQuery中使用 JavaScript UDF .我会愿意让C/C ++源代码进行翻译,但是如果有人已经完成了这项工作,那么我很乐意使用它.
The end result I'm looking for is to implement T-SQL CHECKSUM in BigQuery with a JavaScript UDF. I would settle for having the C/C++ source code to translate but if someone has already done this work then I'd love to use it.
或者,如果有人可以想到一种在Microsoft SQL Server中存储的字符串与BigQuery中的字符串之间创建等效的哈希码的方法,那么这对我也有帮助.
Alternatively, if someone can think of a way to create an equivalent hash code between strings stored in Microsoft SQL Server compared to those in BigQuery then that would help me too.
- 更新:我已经通过注释中的HABO链接找到了一些源代码,这些注释以T-SQL编写以执行相同的CHECKSUM,但我很难将其转换为JavaScript,而JavaScript本质上无法处理64位整数.我正在处理一些小示例,发现该算法仅对 个字节的低字节有效.
- 更新2:我对复制此算法感到非常好奇,我可以看到一些确定的模式,但我的大脑无法完成将其提炼成反向工程解决方案的任务.我确实发现
BINARY_CHECKSUM()
和CHECKSUM()
返回的内容不同,因此前者所做的工作对后者没有帮助.
- UPDATE: I've found some source code through HABO's link in the comments which is written in T-SQL to perform the same CHECKSUM but I'm having difficulty converting it to JavaScript which inherently cannot handle 64bit integers. I'm playing with some small examples and have found that the algorithm works on the low nibble of each byte only.
- UPDATE 2: I got really curious about replicating this algorithm and I can see some definite patterns but my brain isn't up to the task of distilling that into a reverse engineered solution. I did find that
BINARY_CHECKSUM()
andCHECKSUM()
return different things so the work done on the former didn't help me with the latter.
推荐答案
我花了一天的时间进行逆向工程,首先转储单个ASCII字符以及成对的所有结果.这表明每个字符都有其自己独特的"XOR代码",而字母无论大小写都相同.之后,该算法非常简单:将左移4位,并通过查找表中存储的代码进行异或运算.
I spent the day reverse engineering this by first dumping all results for single ASCII characters as well as pairs. This showed that each character has its own distinct "XOR code" and letters have the same one regardless of case. The algorithm was remarkably simple to figure out after that: rotate 4 bits left and xor by the code stored in a lookup table.
var xorcodes = [
0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31,
0, 33, 34, 35, 36, 37, 38, 39, // !"#$%&'
40, 41, 42, 43, 44, 45, 46, 47, // ()*+,-./
132, 133, 134, 135, 136, 137, 138, 139, // 01234567
140, 141, 48, 49, 50, 51, 52, 53, 54, // 89:;<=>?@
142, 143, 144, 145, 146, 147, 148, 149, // ABCDEFGH
150, 151, 152, 153, 154, 155, 156, 157, // IJKLMNOP
158, 159, 160, 161, 162, 163, 164, 165, // QRSTUVWX
166, 167, 55, 56, 57, 58, 59, 60, // YZ[\]^_`
142, 143, 144, 145, 146, 147, 148, 149, // abcdefgh
150, 151, 152, 153, 154, 155, 156, 157, // ijklmnop
158, 159, 160, 161, 162, 163, 164, 165, // qrstuvwx
166, 167, 61, 62, 63, 64, 65, 66, // yz{|}~
];
function rol(x, n) {
// simulate a rotate shift left (>>> preserves the sign bit)
return (x<<n) | (x>>>(32-n));
}
function checksum(s) {
var checksum = 0;
for (var i = 0; i < s.length; i++) {
checksum = rol(checksum, 4);
var c = s.charCodeAt(i);
var xorcode = 0;
if (c < xorcodes.length) {
xorcode = xorcodes[c];
}
checksum ^= xorcode;
}
return checksum;
};
请参见 https://github.com/neilodonuts/tsql-checksum-javascript了解更多信息.
免责声明:我只使用排序规则设置为SQL_Latin1_General_CP1_CI_AS
的SQL Server中与VARCHAR
字符串的兼容性.这不适用于多列或整数,但是我确定基础算法使用相同的代码,因此不难发现.由于归类,它似乎也与dbfiddle有所不同: https://github.com/neilodonuts/tsql-checksum-javascript/blob/master/data/dbfiddle-differences.png ...里程可能有所不同!
DISCLAIMER: I've only worked on compatibility with VARCHAR
strings in SQL Server with collation set to SQL_Latin1_General_CP1_CI_AS
. This won't work with multiple columns or integers but I'm sure the underlying algorithm uses the same codes so it wouldn't be hard to figure out. It also seems to differ from db<>fiddle possibly due to collation: https://github.com/neilodonuts/tsql-checksum-javascript/blob/master/data/dbfiddle-differences.png ... mileage may vary!
这篇关于如何在JavaScript中为BigQuery实现T-SQL CHECKSUM()?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!