Optimal HashSet Initialization (Scala | Java)


Question

I'm writing an A.I. to solve a "Maze of Life" puzzle. Attempting to store states in a HashSet slows everything down. It's faster to run it without a set of explored states. I'm fairly confident my node (state storage) implements equals and hashCode well, as tests show a HashSet doesn't add duplicate states. I may need to rework the hashCode function, but I believe what's slowing it down is the HashSet rehashing and resizing.

I've tried setting the initial capacity to a very large number, but it's still extremely slow:

 val initCapacity = java.lang.Math.pow(initialGrid.width*initialGrid.height,3).intValue()
 val frontier = new QuickQueue[Node](initCapacity)

Here is the QuickQueue code:

class QuickQueue[T](capacity: Int) {
  val hashSet = new HashSet[T](capacity)
  val queue = new Queue[T]
  // methods below
}

For more info, here is the hash function. I store the grid values as bytes in two arrays and access them using tuples:

override def hashCode(): Int = {
  var sum = Math.pow(grid.goalCoords._1, grid.goalCoords._2).toInt
  for (y <- 0 until grid.height) {
     for (x <- 0 until grid.width) {
        sum += Math.pow(grid((x, y)).doubleValue(), x.toDouble).toInt
     }
     sum += Math.pow(sum, y).toInt
  }
  return sum
}

Any suggestions on how to set up a HashSet that won't slow things down? Or maybe another suggestion for how to remember explored states?

P.S. Using java.util.HashSet, even with the initial capacity set, it takes 80 seconds vs. under 7 seconds without the set.

Recommended answer

Okay, for starters, please replace

override def hashCode(): Int =

with

override lazy val hashCode: Int =

so you don't calculate (grid.height*grid.width) floating-point powers every time you need to access the hash code. That should speed things up by an enormous amount.
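To illustrate the caching effect, here is a minimal sketch. The class CachedState and its computations counter are hypothetical names made up for this example, not part of the asker's code, and the sketch assumes the state is immutable once constructed (a cached hash is only safe if the hashed fields never change afterwards).

```scala
// Hypothetical immutable state; the cells are held in an immutable Vector
// so the cached hash can never go stale.
final class CachedState(val cells: Vector[Byte]) {
  // Illustration only: counts how often the hash body actually runs.
  var computations = 0

  // Evaluated on first access, then stored and reused on every later call.
  override lazy val hashCode: Int = {
    computations += 1
    cells.hashCode()
  }

  override def equals(other: Any): Boolean = other match {
    case that: CachedState => cells == that.cells
    case _                 => false
  }
}
```

However many times a hash set asks for the hash of the same instance, the body runs once. With a plain def it would run on every lookup and every insertion.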

Then, unless you somehow rely upon close cells having close hash codes, don't reinvent the wheel. Use scala.util.hashing.MurmurHash3.seqHash or some such to calculate your hash. This should speed your hash up by another factor of 20 or so. (Still keep the lazy val.)
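As a rough sketch of what that could look like: the class GridState below is a hypothetical stand-in for the asker's node, assuming the cells live in a single byte array. It uses MurmurHash3.bytesHash for the cell data (one of the "or some such" variants in the same object) and MurmurHash3.seqHash for the small integer fields.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical stand-in for the node class: dimensions, goal coordinates,
// and one byte per cell.
final class GridState(val width: Int, val height: Int,
                      val goal: (Int, Int), cells: Array[Byte]) {
  // Defensive copy so later changes to the caller's array cannot
  // invalidate the cached hash.
  private val data: Array[Byte] = cells.clone()

  // Integer-only hashing, computed once per instance.
  override lazy val hashCode: Int = {
    val seed = MurmurHash3.seqHash(List(width, height, goal._1, goal._2))
    MurmurHash3.bytesHash(data, seed)
  }

  override def equals(other: Any): Boolean = other match {
    case that: GridState =>
      width == that.width && height == that.height && goal == that.goal &&
        java.util.Arrays.equals(data, that.data)
    case _ => false
  }
}
```

Instances of this class can go straight into java.util.HashSet or a Scala mutable.HashSet. Two states with identical contents compare equal and share a hash, so the second one is rejected as a duplicate.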

Then you only have overhead from the required set operations. Right now, unless you have a lot of 0x0 grids, you are using up the overwhelming majority of your time waiting for math.pow to give you a result (and risking everything becoming Double.PositiveInfinity or 0.0, depending on how big the values are, which will create hash collisions that slow things down still further).
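One concrete way the collisions arise: on the JVM, converting a Double that is too large for an Int (including Double.PositiveInfinity) with toInt saturates at Int.MaxValue, so many different inputs collapse to the same term. The helper below is a hypothetical reduction of the inner-loop expression from the question, written only to show this.

```scala
object PowHashDemo {
  // Same shape as the inner-loop term in the question's hashCode:
  // pow(cell value, column index) truncated to Int.
  def term(cellValue: Byte, x: Int): Int =
    Math.pow(cellValue.toDouble, x.toDouble).toInt

  def main(args: Array[String]): Unit = {
    // 3^20 and 5^14 both exceed Int.MaxValue, so both saturate.
    println(term(3, 20) == Int.MaxValue)
    println(term(5, 14) == Int.MaxValue)
    // Infinity saturates the same way.
    println(Double.PositiveInfinity.toInt == Int.MaxValue)
  }
}
```

Once a term saturates, the distinction between the cell values that produced it is lost, which is one reason an integer mixing function tends to distribute states better than sums of powers.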
