Question
I am following a previous post on Stack Overflow about removing duplicates from a List<T> in C#.
If T is some user-defined type like:
class Contact
{
    public string firstname;
    public string lastname;
    public string phonenum;
}
The suggested approach (HashMap) doesn't remove duplicates. I think I have to redefine some method for comparing two objects, don't I?
Recommended answer
A HashSet<T> does remove duplicates, because it's a set... but only when your type defines equality appropriately.
I suspect by "duplicate" you mean "an object with equal field values to another object" - you need to override Equals/GetHashCode for that to work, and/or implement IEquatable<Contact>... or you could provide an IEqualityComparer<Contact> to the HashSet<T> constructor.
Instead of using a HashSet<T> you could just call the Distinct LINQ extension method. For example:
list = list.Distinct().ToList();
But again, you'll need to provide an appropriate definition of equality, somehow or other.
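As a rough sketch of the comparer route, assuming the question's Contact class is left exactly as posted: the comparer name ContactFieldComparer and the helper class DedupDemo are hypothetical names chosen for illustration. Both the HashSet<T> constructor and Distinct have overloads that accept an IEqualityComparer<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's type, unchanged: it does not override Equals/GetHashCode.
public class Contact
{
    public string firstname;
    public string lastname;
    public string phonenum;
}

// Hypothetical comparer: two contacts are equal when all three fields match.
public sealed class ContactFieldComparer : IEqualityComparer<Contact>
{
    public bool Equals(Contact x, Contact y)
    {
        if (object.ReferenceEquals(x, y))
        {
            return true;
        }
        if (object.ReferenceEquals(x, null) || object.ReferenceEquals(y, null))
        {
            return false;
        }
        return x.firstname == y.firstname &&
               x.lastname == y.lastname &&
               x.phonenum == y.phonenum;
    }

    public int GetHashCode(Contact contact)
    {
        if (object.ReferenceEquals(contact, null))
        {
            return 0;
        }
        // EqualityComparer<string>.Default copes with null fields.
        var strings = EqualityComparer<string>.Default;
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + strings.GetHashCode(contact.firstname);
            hash = hash * 31 + strings.GetHashCode(contact.lastname);
            hash = hash * 31 + strings.GetHashCode(contact.phonenum);
            return hash;
        }
    }
}

// Hypothetical helpers showing the two places a comparer can be supplied.
public static class DedupDemo
{
    public static List<Contact> ViaDistinct(IEnumerable<Contact> source)
    {
        return source.Distinct(new ContactFieldComparer()).ToList();
    }

    public static List<Contact> ViaHashSet(IEnumerable<Contact> source)
    {
        return new HashSet<Contact>(source, new ContactFieldComparer()).ToList();
    }
}
```

A separate comparer is mainly useful when you can't (or don't want to) change the type itself, or when different callers need different notions of "duplicate".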
Here's a sample implementation. Note how I've made it immutable (equality is odd with mutable types, because two objects can be equal one minute and non-equal the next) and made the fields private, with public properties. Finally, I've sealed the class - immutable types should generally be sealed, and it makes equality easier to talk about.
using System;
using System.Collections.Generic;

public sealed class Contact : IEquatable<Contact>
{
    private readonly string firstName;
    public string FirstName { get { return firstName; } }

    private readonly string lastName;
    public string LastName { get { return lastName; } }

    private readonly string phoneNumber;
    public string PhoneNumber { get { return phoneNumber; } }

    public Contact(string firstName, string lastName, string phoneNumber)
    {
        this.firstName = firstName;
        this.lastName = lastName;
        this.phoneNumber = phoneNumber;
    }

    public override bool Equals(object other)
    {
        return Equals(other as Contact);
    }

    public bool Equals(Contact other)
    {
        if (object.ReferenceEquals(other, null))
        {
            return false;
        }
        if (object.ReferenceEquals(other, this))
        {
            return true;
        }
        return FirstName == other.FirstName &&
               LastName == other.LastName &&
               PhoneNumber == other.PhoneNumber;
    }

    public override int GetHashCode()
    {
        // Note: *not* StringComparer; EqualityComparer<T>
        // copes with null; StringComparer doesn't.
        var comparer = EqualityComparer<string>.Default;

        // Unchecked to allow overflow, which is fine
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + comparer.GetHashCode(FirstName);
            hash = hash * 31 + comparer.GetHashCode(LastName);
            hash = hash * 31 + comparer.GetHashCode(PhoneNumber);
            return hash;
        }
    }
}
Okay, in response to requests for an explanation of the GetHashCode() implementation:
- We want to combine the hash codes of the properties of this object.
- We're not checking for nullity anywhere, so we should assume that some of them may be null. EqualityComparer<T>.Default always handles this, which is nice... so I'm using that to get a hash code of each field.
- The "add and multiply" approach to combining several hash codes into one is the standard one recommended by Josh Bloch. There are plenty of other general-purpose hashing algorithms, but this one works fine for most applications.
- I don't know whether you're compiling in a checked context by default, so I've put the computation in an unchecked context. We really don't care if the repeated multiply/add leads to an overflow, because we're not looking for a "magnitude" as such... just a number that we can reach repeatedly for equal objects.
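To make the null-handling and overflow points a little more concrete, here is a small sketch. The class name HashCodeNotes and its method names are hypothetical, chosen only for this illustration; the framework calls themselves (EqualityComparer<string>.Default, StringComparer.Ordinal, and the checked/unchecked keywords) are standard.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class HashCodeNotes
{
    // EqualityComparer<string>.Default returns 0 for a null string
    // instead of throwing.
    public static int NullSafeHash(string value)
    {
        return EqualityComparer<string>.Default.GetHashCode(value);
    }

    // StringComparer.Ordinal rejects null with ArgumentNullException.
    public static bool StringComparerThrowsOnNull()
    {
        try
        {
            StringComparer.Ordinal.GetHashCode((string)null);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    // One "multiply and add" step; overflow silently wraps around.
    public static int CombineUnchecked(int hash, int next)
    {
        unchecked
        {
            return hash * 31 + next;
        }
    }

    // The same step in a checked context; overflow throws OverflowException.
    public static int CombineChecked(int hash, int next)
    {
        checked
        {
            return hash * 31 + next;
        }
    }
}
```

With a large input such as int.MaxValue, CombineUnchecked just returns a wrapped value, whereas CombineChecked throws. That's why the computation in GetHashCode is wrapped in unchecked regardless of the project's default setting.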
Two alternative ways of handling nullity, by the way:
public override int GetHashCode()
{
    // Unchecked to allow overflow, which is fine
    unchecked
    {
        int hash = 17;
        hash = hash * 31 + (FirstName ?? "").GetHashCode();
        hash = hash * 31 + (LastName ?? "").GetHashCode();
        hash = hash * 31 + (PhoneNumber ?? "").GetHashCode();
        return hash;
    }
}
or
public override int GetHashCode()
{
    // Unchecked to allow overflow, which is fine
    unchecked
    {
        int hash = 17;
        hash = hash * 31 + (FirstName == null ? 0 : FirstName.GetHashCode());
        hash = hash * 31 + (LastName == null ? 0 : LastName.GetHashCode());
        hash = hash * 31 + (PhoneNumber == null ? 0 : PhoneNumber.GetHashCode());
        return hash;
    }
}
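On newer frameworks there is a further option that isn't part of the approach above: the built-in System.HashCode type (available in .NET Core 2.1 and later, and in .NET Standard 2.1) can do the combining and treats null values as hash code 0. The sketch below assumes such a target; the class name ContactHashing and the method name Combine3 are hypothetical.

```csharp
using System;

// Hypothetical helper, assuming a framework where System.HashCode exists.
public static class ContactHashing
{
    // HashCode.Combine handles null arguments itself, so no explicit
    // null checks or unchecked block are needed here.
    public static int Combine3(string firstName, string lastName, string phoneNumber)
    {
        return HashCode.Combine(firstName, lastName, phoneNumber);
    }
}
```

Note that HashCode is seeded randomly per process, so the resulting values are only suitable for in-memory collections such as HashSet<T>, not for persisting or comparing across runs.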