c# - Removing duplicates from one data set relative to another in C#

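I am new to C#.
I am trying to remove duplicates from CollectionIn, but it doesn't work. No duplicates are removed from CollectionIn.

To clarify, CollectionIn has [A, B, C, D] and CollectionIn2 has [A, B, C].

So I want to remove the values (A, B, C) from CollectionIn.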

for (int i = 0; i < CollectionIn.Rows.Count; i++) {
    string value1 = CollectionIn.Rows[i].ItemArray[0].ToString().ToLower();

    for (int i2 = 0; i2 < CollectionIn2.Rows.Count; i2++) {
        string value2 = CollectionIn2.Rows[i2].ItemArray[0].ToString().ToLower();

        if (value1 == value2) {
            //Remove value1 when value1 == value2
            CollectionIn.Rows[i].Delete(); // <--- Trying to delete when there is a duplicate in both collections

            CollectionIn.AcceptChanges();
        }
    }
    //CollectionOut.Rows.Add(value1);
}

I adapted the code from this link, with a few changes:
http://www.rpaforum.net/threads/how-to-compare-two-excel-sheet-using-c-code-in-blueprism.897/

Best answer

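Comparing two collections with nested loops can have O(n²) complexity, which is not good. You can improve on this by building a hash lookup first.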

var Set1 = new Dictionary<string, int>();

//Prehash all values in the set that rows will be deleted from
for (int i = 0; i < CollectionIn.Rows.Count; i++)
{
    string value1 = CollectionIn.Rows[i].ItemArray[0].ToString().ToLower();
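    //Dictionary.Add throws on a repeated key, so keep only the first index of each value
    if (!Set1.ContainsKey(value1))
        Set1.Add(value1, i);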
}

//Loop over the other set
for (int i2 = 0; i2 < CollectionIn2.Rows.Count; i2++)
{
    string value2 = CollectionIn2.Rows[i2].ItemArray[0].ToString().ToLower();

    int foundIndex;
    if (Set1.TryGetValue(value2, out foundIndex) == false)
        continue;

    //Remove value1 when value1 == value2
    CollectionIn.Rows[foundIndex].Delete();
}
CollectionIn.AcceptChanges(); //It's probably best to save changes last as a single call

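I hashed CollectionIn and then iterated over CollectionIn2. That means I need a dictionary so that I can delete by CollectionIn index. If you reverse this and hash CollectionIn2 instead, a plain HashSet is enough, which is better because it also handles duplicates within CollectionIn itself: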

var Set2 = new HashSet<string>();

//Prehash all values in one set (ideally the larger set)
for (int i2 = 0; i2 < CollectionIn2.Rows.Count; i2++)
{
    string value2 = CollectionIn2.Rows[i2].ItemArray[0].ToString().ToLower();

    Set2.Add(value2); //HashSet.Add returns false and leaves the set unchanged for a value it already holds
}

//Loop over the other set
for (int i1 = 0; i1 < CollectionIn.Rows.Count; i1++)
{
    string value1 = CollectionIn.Rows[i1].ItemArray[0].ToString().ToLower();

    if (Set2.Contains(value1) == false)
        continue;

    //Remove value1 when value1 == value2
    CollectionIn.Rows[i1].Delete();
}

CollectionIn.AcceptChanges(); //It's probably best to save changes last as a single call

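This pattern works for many data set types (including lists, arrays, and so on). Of course, if the data sets are remote and live on the same database, writing SQL for this would be better.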
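
As an illustration of the same pattern on plain lists, here is a minimal sketch. The class and method names (`ListDedupSketch`, `RemoveAllFoundIn`) are made up for this example, and it uses `StringComparer.OrdinalIgnoreCase` in place of the `ToLower()` calls above, which is an assumption about the comparison you want rather than a drop-in equivalent.

```csharp
using System;
using System.Collections.Generic;

public static class ListDedupSketch
{
    // Hypothetical helper: removes from target every string that also
    // appears in reference, ignoring case. Returns the number removed.
    public static int RemoveAllFoundIn(List<string> target, IEnumerable<string> reference)
    {
        // Hash the reference values once so each lookup is O(1) on average.
        var lookup = new HashSet<string>(reference, StringComparer.OrdinalIgnoreCase);

        // List<T>.RemoveAll makes a single pass over target.
        return target.RemoveAll(item => lookup.Contains(item));
    }
}
```

Called with a target of [A, B, C, D] and a reference of [A, B, C], this leaves only D in the target list.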

If you prefer lambda functions, it should look something like this:

var alreadyInSet2 = new HashSet<string>(CollectionIn2.Rows.Cast<DataRow>()
                    .Select(x => x[0].ToString().ToLower()));

CollectionIn.Rows.Cast<DataRow>()
                    .Where(y => alreadyInSet2.Contains(y[0].ToString().ToLower()) == false)
                    .ToList() //ForEach is a List<T> method; ToList also snapshots the rows before any are deleted
                    .ForEach(y => y.Delete());

CollectionIn.AcceptChanges();
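
For anyone who wants to try this outside Blue Prism, here is a self-contained sketch built on the question's example data. The names `DuplicateRemovalSketch`, `RemoveMatchingRows`, and `SingleColumnTable` are hypothetical, and the tables are built in memory as an assumption about the shape of the real collections. It takes a snapshot of the matching rows before deleting anything. As far as I know, `DataRow.Delete()` removes a row that is still in the `Added` state from the table immediately, which would shift the indexes that an index-based loop depends on.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DuplicateRemovalSketch
{
    // Hypothetical helper: deletes every row of target whose first column
    // matches (ignoring case) a first-column value in reference.
    public static int RemoveMatchingRows(DataTable target, DataTable reference)
    {
        var referenceKeys = new HashSet<string>(
            reference.Rows.Cast<DataRow>().Select(r => Convert.ToString(r[0])),
            StringComparer.OrdinalIgnoreCase);

        // Snapshot the matches so the row collection is not modified
        // while it is being enumerated.
        List<DataRow> matches = target.Rows.Cast<DataRow>()
            .Where(r => referenceKeys.Contains(Convert.ToString(r[0])))
            .ToList();

        foreach (DataRow row in matches)
            row.Delete();

        target.AcceptChanges(); // commit once, after all deletions
        return matches.Count;
    }

    // Hypothetical helper that builds a one-column table for the demo.
    public static DataTable SingleColumnTable(params string[] values)
    {
        var table = new DataTable();
        table.Columns.Add("Value", typeof(string));
        foreach (string v in values)
            table.Rows.Add(v);
        return table;
    }
}
```

With `SingleColumnTable("A", "B", "C", "D")` as the target and `SingleColumnTable("A", "B", "C")` as the reference, only the D row remains in the target.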

See also: With two very large lists/collections - how to detect and/or remove duplicates efficiently, where more time and effort has gone into a broader range of answers and performance improvements.

Regarding c# - Removing duplicates from one data set relative to another in C#, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51015932/