本文介绍了hashexp 指定的 SAS HashTable 中的表大小到底是多少?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想稍微澄清一下 SAS 哈希表中存储桶的定义.问题正是关于 hashexp 参数.

I would like to have a little clarification on the definiton of a bucket in SAS hashtable. The question is exactly about the hashexp parameter.

根据 SAS DOC,hashexp 是:

According to the SAS DOCs, hashexp is:

哈希对象的内部表大小,其中哈希表的大小为2n.

HASHEXP 的值用作二次幂指数来创建哈希表大小.例如,HASHEXP 的值为 4 等于哈希表大小为 24 或 16.HASHEXP 的最大值为 20.

哈希表大小不等于可以存储的项目数.将哈希表想象成一个桶"数组.大小为 16 的哈希表将有 16 个桶".每个桶可以容纳无限数量的项目.哈希表的效率在于哈希函数能够将项目映射到桶并从桶中检索项目.

您应该根据哈希对象中的数据量来设置哈希表大小,以最大限度地提高哈希对象查找例程的效率.尝试不同的 HASHEXP 值,直到获得最佳结果.例如,如果哈希对象包含一百万个项目,那么大小为 16 (HASHEXP = 4) 的哈希表可以工作,但效率不高.大小为 512 或 1024(HASHEXP = 9 或 10)的哈希表会产生最佳性能.

问题是究竟什么是哈希表大小,而不是哈希对象中的数据量?

The question is what exactly is a hash table size, while it is not a amount of data in the hash object?

是否应该将其理解为我们希望分配尽可能多的内存,但不能少,不能更多.让事情快速运转是二的力量.但它并没有限制可能使用的数据量,它只是表明将要使用多少,对吧?

Should it be understood as if we wanted to allocate as much memory as it may be neccessary but not less, no more. It is a power of two to get things work fast. But it does not limit the amount of data possibly used, it only indicates about how much is going to be used, right?

推荐答案

Paul Dorfman(哈希大师)在本白皮书的第 10 页详细介绍了:

Paul Dorfman (the master of hashing) goes into a fair bit of detail on page 10 of this whitepaper:

http://www2.sas.com/proceedings/forum2008/037-2008.pdf

据我了解,哈希表将其数据存储在二叉树中.hashexp 创建的每个桶代表将用于存储数据的二叉树的数量.hashexp 为 0 将使用单个树,而 hashexp 为 8 将使用 256 棵树.当对散列对象执行查找时,内部算法会确定键应该存在于哪棵树中(基于散列值).然后它检查该树的值.通过自动知道要查看 256 棵树中的哪棵树(例如),与单个二叉树相比,它可以节省 8 次比较 (2^8).

As I understand it, hashtables store their data in binary trees. Each bucket created by hashexp represents the number of binary trees that will be used to store the data. A hashexp of 0 would use a single tree, while a hashexp of 8 would use 256 trees. When a lookup is performed against the hash object, an internal algorithm determines which tree the key should exist in (based on the hashed value). It then checks that tree for the value. By automatically knowing which of the 256 trees to look in (for example) it would have saved itself 8 comparisons (2^8) when compared to a single binary tree.

整个事情似乎比这复杂得多,但这就是我对为什么它运行得更快的解释.

The whole thing seems a lot more complex than that but that's my interpretation of why it works out faster.

这篇关于hashexp 指定的 SAS HashTable 中的表大小到底是多少?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!