Removing duplicates from a list of objects


Question

I have a MyObject class with the fields id, a, b, c, e, f, and a List with 500 000 items. How can I remove all duplicate items, that is, items that share the same values of a, c, and f?

I am looking only for the fastest and most efficient method.

UPDATE
I implemented a comparer.

The fields in my class are of different types, so I use ToString(). Is this a good approach?
IdLoc, IdMet, Ser: long? (nullable long)
Value: Object
IdDataType: long

class Comparer : IEqualityComparer<MyObject>
{
    public bool Equals(MyObject x, MyObject y)
    {
        return x.IdLoc == y.IdLoc && x.IdMet == y.IdMet && x.Ser == y.Ser &&
               x.IdDataType == y.IdDataType && x.Time == y.Time && x.Value == y.Value;
    }

    public int GetHashCode(MyObject obj)
    {
        string idLoc = obj.IdLoc.HasValue ? obj.IdLoc.ToString() : String.Empty;
        string idMet = obj.IdMet.HasValue ? obj.IdMet.ToString() : String.Empty;
        string ser = obj.Ser.HasValue ? obj.Ser.ToString() : String.Empty;
        string value = obj.Value != null ?  obj.Value.ToString() : String.Empty;

        return (idLoc + idMet + ser + value + obj.IdDataType.ToString() + obj.Time.ToString()).GetHashCode();
    }
}

Removing duplicates
Elements: 566 890
1) Time: 2 sec

DateTime start = DateTime.Now;
List<MyObject> removed = retValTmp.Distinct(new Comparer()).ToList();
double sec = Math.Round((DateTime.Now - start).TotalSeconds, 3);

2) Time: 1.5 sec

start = DateTime.Now;
List<MyObject> retList = new List<MyObject>();
HashSet<MyObject> removed2 = new HashSet<MyObject>(new Comparer());
foreach (var item in retValTmp)
{
    if (!removed2.Contains(item))
    {
        removed2.Add(item);
        retList.Add(item);
    }
}
double sec2 = Math.Round((DateTime.Now - start).TotalSeconds, 3);

3) I also tried it this way:

start = DateTime.Now;

var removed3 = retValTmp.Select(myObj => new { myObj.IdLoc, myObj.IdMet, myObj.Ser, myObj.Value, myObj.IdDataType, myObj.Time }).Distinct().ToList();

double sec3 = Math.Round((DateTime.Now - start).TotalSeconds, 3);

Time: 0.35 sec
But the returned list does not contain objects of my class. Also, why does the number of elements in lists 1 and 2 differ from the number in list 3?

UPDATE 2

public int GetDataHashCode(MyObject obj)
{
    long idLoc = obj.IdLoc.HasValue ? obj.IdLoc.Value : 0;
    long idMet = obj.IdMet.HasValue ? obj.IdMet.Value : 0;
    long ser = obj.Ser.HasValue ? obj.Ser.Value : 0;
    int valueHash = 0;
    if (obj.Value != null)
        valueHash = obj.Value.GetHashCode();
    else
        valueHash = valueHash.GetHashCode();

    return (idLoc.GetHashCode() + idMet.GetHashCode() + ser.GetHashCode() + valueHash  + obj.IdDataType.GetHashCode() + obj.Time.GetHashCode()).GetHashCode();
}

Usage:

foreach (MyObject daItem in retValTmp)
{
    int key = GetDataHashCode(daItem);
    if (!clearDict.ContainsKey(key))
        clearDict.Add(key, daItem);
}

Elements: 750 000
Time: 0.23 sec!

Recommended answer

If speed is what you are after and you don't mind using some extra memory, I would recommend a HashSet. If you need a custom comparison, you can write an IEqualityComparer<T> and pass it to the set, something like this:

var original = new List<YourClass>(); // whatever your original collection is
var unique = new HashSet<YourClass>(new MyCustomEqualityComparer());

foreach (var item in original)
{
    // Add returns false and leaves the set unchanged if an equal item is already present
    unique.Add(item);
}

return unique;

The drawback is that you may end up using up to twice the original memory.
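The comparer itself is not shown in the answer, so here is a minimal sketch of what it might look like for the question's case (duplicates share the same a, c, and f). The property names and types on `YourClass` are assumptions made for illustration, as is the `Deduplicator` helper. The helper removes duplicates from the list itself, which avoids building a second list, although the set still holds one reference per unique item.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the asker's class; property names and types are assumptions.
public class YourClass
{
    public int Id { get; set; }
    public long A { get; set; }
    public string C { get; set; }
    public DateTime F { get; set; }
}

// Treats two items as duplicates when A, C and F all match.
public class MyCustomEqualityComparer : IEqualityComparer<YourClass>
{
    public bool Equals(YourClass x, YourClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.A == y.A && string.Equals(x.C, y.C) && x.F == y.F;
    }

    public int GetHashCode(YourClass obj)
    {
        if (obj is null) return 0;
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + obj.A.GetHashCode();
            hash = hash * 31 + (obj.C?.GetHashCode() ?? 0);
            hash = hash * 31 + obj.F.GetHashCode();
            return hash;
        }
    }
}

public static class Deduplicator
{
    // Removes later duplicates from the list itself, keeping the first occurrence of each key.
    public static void RemoveDuplicatesInPlace(List<YourClass> items)
    {
        var seen = new HashSet<YourClass>(new MyCustomEqualityComparer());
        // HashSet<T>.Add returns false when an equal item is already present.
        items.RemoveAll(item => !seen.Add(item));
    }
}
```

Calling `Deduplicator.RemoveDuplicatesInPlace(list)` keeps the original order of the surviving items. Note that this idiom relies on `RemoveAll` calling the predicate once per element in list order, which is how the current implementation behaves.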

After some extra research, I think you can achieve what you want simply by doing:

// original: your original data, any IEnumerable<YourClass>
var unique = new HashSet<YourClass>(original, new MyCustomEqualityComparer());

That takes care of removing the duplicated data, since a HashSet does not allow duplicates. I'd also recommend taking a look at this question about GetHashCode implementation guidelines.
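One guideline worth illustrating is that Equals and GetHashCode have to agree with each other. In the comparer from the question, Equals compares `x.Value == y.Value` where Value is typed as Object, while GetHashCode hashes its string form. The sketch below (class and method names are made up for the demonstration) shows that `==` on `object` operands compares references, so two boxed copies of the same number are not equal. That is a likely reason why approaches 1 and 2 return a different count from the anonymous-type version, which compares each property through Equals.

```csharp
using System;

public static class BoxedEqualityDemo
{
    public static bool ReferenceCompare(object x, object y)
    {
        // == on operands typed as object compares references, not contents.
        return x == y;
    }

    public static bool ValueCompare(object x, object y)
    {
        // object.Equals(a, b) handles nulls and calls the runtime type's Equals.
        return object.Equals(x, y);
    }

    public static void Main()
    {
        object first = 1.5;   // boxed double
        object second = 1.5;  // a separate box holding the same value

        Console.WriteLine(ReferenceCompare(first, second)); // False
        Console.WriteLine(ValueCompare(first, second));     // True
    }
}
```

For the same reason, using only a hash code as a dictionary key (as in UPDATE 2) is risky: two different items can produce the same hash, and the second one would then be dropped as if it were a duplicate.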

If you want to know more about the HashSet class, follow these links:

About HashSet
About the constructor that takes an IEqualityComparer
IEqualityComparer documentation

Hope this helps.


08-18 19:36