c# - Iterating over a list with thousands of elements

case 15: {
    for (int i = 0; i < words.Count; i++) {
        if (words[i].Length == 8) {
            var tupled = words[i].ConcatCheck();
            for (int n = 0; n < word.Count; n++)
                if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
                    temp++;
        }
        if (temp >= 2)
            matches.Add(words[i]);
        temp = 0;
    }
    break;
}

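What it does:
The first for loop iterates over a List of words that is about 248,000 elements long and checks for words of length 8.
When it finds one, I create a Tuple of the word's first and second halves (4 letters each) by calling ConcatCheck(), an extension method I wrote for String. That part is fast and works fine.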
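The source of ConcatCheck() is not shown in the question, so the following is only a guess at what it might do. SplitHalves is a hypothetical name, written here as a plain helper; placing the same body in a static class with a `this string` parameter would turn it into an extension method.

```csharp
using System;

// Hypothetical stand-in for ConcatCheck(): splits a word into two halves.
// For odd lengths the second half gets the extra character.
Tuple<string, string> SplitHalves(string word)
{
    int half = word.Length / 2;
    return Tuple.Create(word.Substring(0, half), word.Substring(half));
}

var tupled = SplitHalves("backfire");
Console.WriteLine(tupled.Item1 + " + " + tupled.Item2); // back + fire
```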

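What really needs work is the second for loop. Every single 8-letter word triggers this loop, which iterates over a larger List of about 267,000 elements to check whether both items of the Tuple exist. If both are found, the original word is added to the matches list.

This part takes about 3 minutes to find all matches in my 248k-word dictionary. Is there any way to optimize it or speed it up?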

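Best answer

If you only want to check whether a word exists in a collection, use a HashSet instead of a List or an Array. The HashSet class is optimized for Contains checks.

Example

With the code below, I found all 8-letter words in the english dictionary (github version) that are made up of two 4-letter words, in less than 50 milliseconds.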

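using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net;

// Download the word list (one word per line).
WebClient client = new WebClient();
string dictionary = client.DownloadString(
    @"https://raw.githubusercontent.com/dwyl/english-words/master/words.txt");

// Start timing after the download, so only the processing is measured.
Stopwatch watch = new Stopwatch();
watch.Start();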

HashSet<string> fourLetterWords = new HashSet<string>();

using (StringReader reader = new StringReader(dictionary))
{
    while (true)
    {
        string line = reader.ReadLine();
        if (line == null) break;
        if (line.Length != 4) continue;

        fourLetterWords.Add(line);
    }
}

List<string> matches = new List<string>();

using (StringReader reader = new StringReader(dictionary))
{
    while (true)
    {
        string line = reader.ReadLine();
        if (line == null) break;
        if (line.Length != 8) continue;

        if (fourLetterWords.Contains(line.Substring(0, 4)) &&
            fourLetterWords.Contains(line.Substring(4, 4)))
            matches.Add(line);
    }
}

watch.Stop();
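The same idea can be mapped back onto the loop from the question. The sketch below is only an illustration: the list contents are made-up sample data, `lookup` is a name introduced here, and it assumes `word` is the larger list that the two halves are checked against.

```csharp
using System;
using System.Collections.Generic;

// Sample data standing in for the question's two lists (assumed contents).
List<string> words = new List<string> { "backfire", "notebook", "abcdefgh", "cat" };
List<string> word = new List<string> { "back", "fire", "note", "book", "abcd" };

// Build the lookup set once; each Contains call is then O(1) on average
// instead of a full scan of the list.
HashSet<string> lookup = new HashSet<string>(word);

List<string> matches = new List<string>();
foreach (string candidate in words)
{
    if (candidate.Length != 8) continue;

    string first = candidate.Substring(0, 4);
    string second = candidate.Substring(4, 4);

    // Both halves must be present for the word to count as a match.
    if (lookup.Contains(first) && lookup.Contains(second))
        matches.Add(candidate);
}

Console.WriteLine(string.Join(", ", matches)); // backfire, notebook
```

Building the set costs one pass over the larger list, after which the inner loop from the question disappears entirely.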

Why is your code so slow?

for (int n = 0; n < word.Count; n++)
    if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
        temp++;

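This is one of the culprits. Instead of checking "Are both parts contained in my array?", you are checking "Are 2 or more of my 2 words contained in an array?".

You can optimize this part by breaking out of the loop as soon as both words have been found.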

if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
    if (++temp >= 2) break;
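One caveat with a single counter: if both halves are the same string and the list holds it only once, the counter never reaches 2. A possible variant, sketched here with a hypothetical helper named ContainsBoth, tracks each half separately and still exits early.

```csharp
using System;
using System.Collections.Generic;

// Returns true when both halves occur in the list.
// Stops scanning as soon as both have been seen.
bool ContainsBoth(List<string> word, string first, string second)
{
    bool foundFirst = false, foundSecond = false;
    for (int n = 0; n < word.Count; n++)
    {
        if (word[n] == first) foundFirst = true;
        if (word[n] == second) foundSecond = true;
        if (foundFirst && foundSecond) return true; // early exit
    }
    return false;
}

var sample = new List<string> { "back", "cous", "fire" };
Console.WriteLine(ContainsBoth(sample, "back", "fire")); // True
Console.WriteLine(ContainsBoth(sample, "cous", "cous")); // True
Console.WriteLine(ContainsBoth(sample, "back", "book")); // False
```

This is still a linear scan per word, so it only reduces the average cost; the HashSet approach avoids the scan altogether.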

Further optimization is possible by pre-sorting the words by length or alphabetically (depending on how often you run this search).
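As a rough sketch of the alphabetical variant, the list could be sorted once and then queried with a binary search. The sample data and the helper name InSorted are assumptions made for this illustration; the same comparer has to be used for sorting and searching.

```csharp
using System;
using System.Collections.Generic;

// Sample data (assumed); sort once up front with an ordinal comparer.
List<string> word = new List<string> { "fire", "note", "back", "book" };
word.Sort(StringComparer.Ordinal);

// BinarySearch returns a non-negative index when the value is found.
bool InSorted(List<string> sorted, string value) =>
    sorted.BinarySearch(value, StringComparer.Ordinal) >= 0;

Console.WriteLine(InSorted(word, "back") && InSorted(word, "fire")); // True
Console.WriteLine(InSorted(word, "cats")); // False
```

Each lookup then costs O(log n) comparisons, which pays off only if the sorted list is reused across many searches.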

Regarding c# - iterating over a list with thousands of elements, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/37510883/