c - Jenkins hash in C, keys whose size is not a multiple of 4, and AddressSanitizer

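In the project I'm currently working on (in C), we keep hash tables
of some opaque objects. We use DPDK for I/O in our application
(unfortunately, version 16.07.2), and we use the rte_hash code to
hash the objects. The problem is that the objects we want to hash
have odd, non-round sizes, say 83 (or 18, as in the example below),
and AddressSanitizer complains about a heap buffer overflow (on read),
that is, an attempt to read bytes past the end of the region: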

==4926==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300007a9c0 at pc 0x000000451573 bp 0x7fff69175040 sp 0x7fff69175030
READ of size 4 at 0x60300007a9c0 thread T10
#0 0x451572 in __rte_jhash_2hashes /path/to/../dpdk/usr/include/dpdk/rte_jhash.h:155
#1 0x452bb6 in rte_jhash_2hashes /path/to/../dpdk/usr/include/dpdk/rte_jhash.h:266
#2 0x452c75 in rte_jhash /path/to/../dpdk/usr/include/dpdk/rte_jhash.h:309

0x60300007a9c2 is located 0 bytes to the right of 18-byte region [0x60300007a9b0,0x60300007a9c2)

As far as I can tell, the problem is in rte_jhash.h (see the latest
DPDK code here; as far as I can see it has not changed:
http://dpdk.org/doc/api/rte__jhash_8h_source.html):

    case 6:
        b += k[1] & LOWER16b_MASK; a += k[0]; break;

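The code reads k[1] as a uint32_t and then masks the value so that
the last 2 bytes are discarded. As far as I understand,
AddressSanitizer complains about the 4-byte read because only the
first 2 bytes are actually marked as readable. That makes sense, but
rte_jhash.h boasts that it can be used with keys of any size. So my
question is: is this only a theoretical problem? Or could it actually
cause a crash, perhaps with an oddly sized object that happens to sit
right at the end of a page? We are running on x86-64.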
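
To make the masked read concrete, here is a minimal sketch of a tail load that produces the same value as `k[1] & LOWER16b_MASK` on a little-endian machine such as x86-64, but touches only the bytes that belong to the key. The name `load_tail_le` is hypothetical and is not part of DPDK; this is an illustration of the idea, not the library's code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK): assemble the first nbytes
 * (0 to 4) bytes at p into a 32-bit value in little-endian order.
 * On a little-endian CPU this equals "word & mask" for the matching
 * mask, without reading any byte past p[nbytes - 1].
 */
uint32_t load_tail_le(const uint8_t *p, size_t nbytes)
{
    uint32_t v = 0;

    for (size_t i = 0; i < nbytes && i < 4; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}
```

For a 2-byte tail `{0x11, 0x22}` this yields `0x2211`, which is what a full word read followed by a 16-bit mask would give on x86-64. The byte loop is likely slower than the single masked load, which is the trade-off the original code makes.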
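A few months ago a change in DPDK added something about this to the comments (see http://dpdk.org/browse/dpdk/commit/lib/librte_hash?id=0c57f40e66c8c29c6c92a7b0dec46fcef5584941), but if a crash were possible I would have expected sterner wording.
Update: sample code that reproduces the warning. Compile with: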

gcc -o jhash_malloc  -Wall -g -fsanitize=address -I /path/to/dpdk/x86_64-native-linuxapp-gcc/include/ jhash_malloc.c

and the code:

#include <stdio.h>
#include <rte_jhash.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    size_t strSize = 13;
    char *str = malloc(strSize);
    memset(str, 'g', strSize);
    uint32_t hval = rte_jhash(str, strSize, 0);
    /* str is not NUL-terminated, so bound the %s output explicitly */
    printf("Hash of %.*s (size %zu) is %u\n", (int)strSize, str, strSize, hval);

    free(str);
    return 0;
}

Update 2: and the output:

==27276==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000effc at pc 0x000000401315 bp 0x7ffdea936f80 sp 0x7ffdea936f70
READ of size 4 at 0x60200000effc thread T0
#0 0x401314 in __rte_jhash_2hashes /home/stefan/src/dpdk-17.08/x86_64-native-linuxapp-gcc/include/rte_jhash.h:165
#1 0x402771 in rte_jhash_2hashes /home/stefan/src/dpdk-17.08/x86_64-native-linuxapp-gcc/include/rte_jhash.h:266
#2 0x402830 in rte_jhash /home/stefan/src/dpdk-17.08/x86_64-native-linuxapp-gcc/include/rte_jhash.h:309
#3 0x4028e7 in main /home/stefan/src/test/misc/jhash_malloc.c:12
#4 0x7f470cb1f82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#5 0x400978 in _start (/home/stefan/src/test/misc/jhash_malloc+0x400978)

0x60200000effd is located 0 bytes to the right of 13-byte region [0x60200000eff0,0x60200000effd)

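Update 3: the original Jenkins hash code appears to be this: http://burtleburtle.net/bob/c/lookup3.c. There is an interesting comment in the source suggesting that the ASan/Valgrind warnings can be ignored: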

 * "k[2]&0xffffff" actually reads beyond the end of the string, but
 * then masks off the part it's not allowed to read.  Because the
 * string is aligned, the masked-off tail is in the same word as the
 * rest of the string.  Every machine with memory protection I've seen
 * does it on word boundaries, so is OK with this.  But VALGRIND will
 * still catch it and complain.  The masking trick does make the hash
 * noticably faster for short strings (like English words).

Of course, if you hash only part of a larger malloc'ed object, you could still run into trouble.

Best answer

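You are right: if the key you pass to rte_jhash() happens to sit at the very end of a page and the next page is not readable, the application will crash. The commit you refer to basically fixes this, but in the documentation rather than in the code.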
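
As a rough illustration of when that can happen, the sketch below checks whether a 4-byte load starting at a given address would spill into the following page. The function name `word_load_crosses_page` is hypothetical, and the page size is passed in as a parameter that is assumed to be a power of two.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a 4-byte load starting at addr
 * would touch the page after the one containing addr, 0 otherwise.
 * page_size is assumed to be a power of two (e.g. 4096).
 */
int word_load_crosses_page(uintptr_t addr, uintptr_t page_size)
{
    uintptr_t offset = addr & (page_size - 1); /* position inside the page */

    return offset + sizeof(uint32_t) > page_size;
}
```

A load that starts on a 4-byte boundary never crosses a page, which is the argument made in the lookup3.c comment. A key whose final partial word starts at an unaligned address, for example 2 bytes before the end of a page, is the case that can fault. On Linux the real page size can be obtained with `sysconf(_SC_PAGESIZE)`.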
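The possible fixes are:
make sure all keys in your code are aligned and padded to 4 bytes (see also the notes below);
or make the key lengths in your code a multiple of 4;
or copy rte_jhash() into your project, fix it there, and send the fix to the DPDK mailing list.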
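Note 1: structs in C are usually already aligned and padded to the largest primitive data type in the struct. So unless the struct is packed, this explicit padding should not cause any performance or memory issues.
Note 2: if the keys are managed by a DPDK library (i.e. you use the DPDK cuckoo hash library), the key storage is aligned and padded internally, so there is nothing to worry about.
Overall, if your keys are managed externally (i.e. by another process, or you receive them from the network, etc.), this can be a real problem. Otherwise, there are fairly easy ways to fix it.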
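
For externally managed keys, one way to apply the padding fix is to copy the key into a zero-filled buffer whose size is rounded up to a multiple of 4 before hashing it. The helper below is a minimal sketch under that assumption; `dup_key_padded` is a hypothetical name, not a DPDK function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap copy of key whose allocation is
 * rounded up to a multiple of 4 bytes, with the extra bytes set to 0.
 * The rounded size is stored in *padded_len if it is not NULL.
 * Returns NULL on allocation failure; the caller frees the result.
 */
void *dup_key_padded(const void *key, size_t len, size_t *padded_len)
{
    size_t padded = (len + 3u) & ~(size_t)3u;
    unsigned char *copy;

    if (padded < len)       /* len + 3 wrapped around */
        return NULL;
    if (padded == 0)        /* avoid a zero-sized allocation */
        padded = 4;

    copy = calloc(1, padded);
    if (copy == NULL)
        return NULL;
    if (len > 0)
        memcpy(copy, key, len);
    if (padded_len != NULL)
        *padded_len = padded;
    return copy;
}
```

Passing the copy to rte_jhash() together with the original length (13 in the example above) should keep the final word read inside the 16-byte allocation, and since the masked-off bytes do not contribute to the hash, the result should match the hash of the original bytes. Copying on every lookup has a cost, so storing the keys padded in the first place is probably preferable where that is possible.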

Regarding "c - Jenkins hash in C, keys whose size is not a multiple of 4, and AddressSanitizer", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48154716/