本文介绍了为什么 HashSet&lt;Point&gt;比 HashSet<string> 慢多少?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在不允许重复的情况下存储一些像素位置,所以首先想到的是 HashSet 或类似的类.然而,与 HashSet 之类的东西相比,这似乎非常慢.

I wanted to store some pixels locations without allowing duplicates, so the first thing comes to mind is HashSet<Point> or similar classes. However this seems to be very slow compared to something like HashSet<string>.

例如这段代码:

HashSet<Point> points = new HashSet<Point>();
using (Bitmap img = new Bitmap(1000, 1000))
{
    for (int x = 0; x < img.Width; x++)
    {
        for (int y = 0; y < img.Height; y++)
        {
            points.Add(new Point(x, y));
        }
    }
}

大约需要 22.5 秒.

takes about 22.5 seconds.

虽然下面的代码(由于显而易见的原因这不是一个好的选择)只需要 1.6 秒:

While the following code (which is not a good choice for obvious reasons) takes only 1.6 seconds:

HashSet<string> points = new HashSet<string>();
using (Bitmap img = new Bitmap(1000, 1000))
{
    for (int x = 0; x < img.Width; x++)
    {
        for (int y = 0; y < img.Height; y++)
        {
            points.Add(x + "," + y);
        }
    }
}

所以,我的问题是:

  • 有什么原因吗?我检查了这个答案,但 22.5 秒比那个答案中显示的数字多得多.
  • 有没有更好的方法来存储点而不重复?
  • Is there a reason for that? I checked this answer, but 22.5 sec is way more than the numbers shown in that answer.
  • Is there a better way to store points without duplicates?

推荐答案

Point 结构导致了两个性能问题.当您将 Console.WriteLine(GC.CollectionCount(0)); 添加到测试代码时,您可以看到一些内容.您会看到 Point 测试需要 ~3720 个集合,而 string 测试只需要 ~18 个集合.不是免费的.当你看到一个值类型引发了如此多的集合时,你需要得出结论呃-哦,拳击太多了".

There are two perf problems induced by the Point struct. Something you can see when you add Console.WriteLine(GC.CollectionCount(0)); to the test code. You'll see that the Point test requires ~3720 collections but the string test only needs ~18 collections. Not for free. When you see a value type induce so many collections then you need to conclude "uh-oh, too much boxing".

问题在于 HashSet 需要一个 IEqualityComparer 来完成它的工作.由于您没有提供一个,它需要回退到 EqualityComparer.Default() 返回的一个.该方法可以很好地处理字符串,它实现了IEquatable.但不适用于 Point,它是一种源自 .NET 1.0 且从未得到泛型喜爱的类型.它所能做的就是使用 Object 方法.

At issue is that HashSet<T> needs an IEqualityComparer<T> to get its job done. Since you did not provide one, it needs to fall back to one returned by EqualityComparer.Default<T>(). That method can do a good job for string, it implements IEquatable. But not for Point, it is a type that harks from .NET 1.0 and never got the generics love. All it can do is use the Object methods.

另一个问题是 Point.GetHashCode() 在这个测试中表现不佳,碰撞太多,所以它对 Object.Equals() 的影响很大.String 具有出色的 GetHashCode 实现.

The other issue is that Point.GetHashCode() does not do a stellar job in this test, too many collisions, so it hammers Object.Equals() pretty heavily. String has an excellent GetHashCode implementation.

您可以通过为 HashSet 提供一个好的比较器来解决这两个问题.喜欢这个:

You can solve both problems by providing the HashSet with a good comparer. Like this one:

class PointComparer : IEqualityComparer<Point> {
    public bool Equals(Point x, Point y) {
        return x.X == y.X && x.Y == y.Y;
    }

    public int GetHashCode(Point obj) {
        // Perfect hash for practical bitmaps, their width/height is never >= 65536
        return (obj.Y << 16) ^ obj.X;
    }
}

并使用它:

HashSet<Point> list = new HashSet<Point>(new PointComparer());

现在速度提高了大约 150 倍,轻松击败了字符串测试.

And it is now about 150 times faster, easily beating the string test.

这篇关于为什么 HashSet&lt;Point&gt;比 HashSet<string> 慢多少?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-07 02:18