c - How do I populate a trie in C?

I am trying to write a program that takes words and builds a trie in which each node is a struct containing a single character.

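I have a function that parses a char * into words (assume the char * contains only lowercase letters). As each word is extracted from the char *, it is passed to the function addWordOccurrence(const char* word, const int wordLength, struct tNode root). addWordOccurrence() is supposed to check whether the first letter of the word is in root.branches[i], as I increment i in a loop over every possible index of root.branches (0-25, one per lowercase letter). If the first letter is not in root.branches, a new struct tNode containing that letter is created. It then moves on to the second letter of the word, comparing it against the branches of the newly created struct tNode, and so on...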

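The first word we try is "doctor". My trie takes the first letter 'd' and adds it to root.branches[0], then takes 'o' and adds it to root.branches[0].branches[0], which is correct. But then it adds the 'd' of "doctor" to the next 17 indices of that node's branches (that is, root.branches[0].branches[1] through [18]), which should not happen. Please help!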

struct tNode{
  char c;
  int occurrences;
  struct tNode *branches;
};

int addWordOccurrence(const char* word, const int wordLength, struct tNode root){
//declare fields
int counter, i,k,firstNull;
counter = 0;
while(1){
  if(counter >= wordLength){
    break;
  }
  //traverse through the word letter by letter
  for(i=0; i<wordLength; i++){
    //compare each letter to the branches of root until the letter is found or first null space
    for(k=0; k<26; k++){
    //if the letter is a branch already set root to the struct of that letter in branches
       if(root.branches[k].c == word[i]){
          root = root.branches[k];
          break;
       }
    }
    //the current letter of the word is not in branches
    //go through branches to find position to add the new tNode
    for(firstNull=0; firstNull<26; firstNull++){
       //set firstNull equal to the index of the first null value in branches
       if(root.branches[firstNull].c  <  'a' || root.branches[firstNull].c > 'z' ){
          break;
       }
    }
    //add a new node to branches
    root.branches[firstNull].c = word[i];
    root.branches[firstNull].occurrences = 0;
    root.branches[firstNull].branches = malloc(sizeof(struct tNode) * 26);
    if(counter != wordLength){
       root = root.branches[firstNull];
    }
    counter++;
    if(counter == wordLength-2){
       root.occurrences++;
    }
 }
}
 return 0;
}

Best answer

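There are a number of problems with your implementation:

This is an odd design for a trie, with the letters stored in arbitrary order. Having to search each level linearly for the letter you want defeats the purpose of using a trie in the first place.
When you do root = root.branches[k]; you are creating a copy of the variable. It may happen to work for you here because you are accessing things through pointers, but it is really just asking for trouble.
When you allocate nodes in your loop, you do not initialize them, which means each node is full of garbage/unknown data, and that causes problems.
Your implementation is needlessly complicated, as with your outer while (1) loop.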
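
To make the second and third points concrete, here is a minimal sketch. The names demoNode, bumpByValue, bumpByPointer and newBranches are hypothetical and not part of the question's code; the struct only mirrors the question's layout. It shows that a struct passed (or assigned) by value is a copy, so updates to its own fields are lost, and that calloc, unlike malloc, hands back zero-filled memory.

```c
#include <stdlib.h>

/* Hypothetical node type mirroring the layout used in the question. */
struct demoNode {
  char c;
  int occurrences;
  struct demoNode *branches;
};

/* Receives the node by value: the increment lands on a local copy
   and is lost when the function returns. */
void bumpByValue(struct demoNode node) {
  node.occurrences++;
}

/* Receives a pointer: the increment lands on the caller's node. */
void bumpByPointer(struct demoNode *node) {
  node->occurrences++;
}

/* calloc zero-fills the array, so every c starts as '\0' and every
   occurrences as 0. With malloc those fields would be indeterminate,
   and a test such as c < 'a' could not be relied on to find a free slot. */
struct demoNode *newBranches(void) {
  return calloc(26, sizeof(struct demoNode));
}
```

After calling bumpByValue on a node whose occurrences is 0, the caller still sees 0; after bumpByPointer it sees 1. That is the same reason root.occurrences++ on a by-value root in the question never reaches the root the caller owns.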

For a very simple trie, I would do the following:

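#include <ctype.h>    // tolower
#include <stdbool.h>  // bool, true
#include <stdlib.h>   // calloc

struct tNode {
  bool isWord;                 // true if a word ends at this node
  struct tNode *branches[26];  // branches[0] is 'a' ... branches[25] is 'z'
};

void addWordOccurrence (const char* word, const int wordLength, struct tNode* pRoot) {
   int i;
   int nodeIndex;
   struct tNode* pCurrentNode = pRoot;

   for (i = 0; i < wordLength; ++i)
   {
       // map the letter to 0..25; the cast keeps tolower() defined for negative char values
       nodeIndex = tolower((unsigned char)word[i]) - 'a';

       // characters that are not letters are skipped
       if (nodeIndex >= 0 && nodeIndex <= 25)
       {
            if (pCurrentNode->branches[nodeIndex] == NULL)
            {
                // calloc zero-fills the node: isWord is false and every branch is NULL
                pCurrentNode->branches[nodeIndex] = calloc(1, sizeof(struct tNode));
                if (pCurrentNode->branches[nodeIndex] == NULL)
                {
                    return;  // out of memory: stop rather than dereference NULL
                }
            }

            pCurrentNode = pCurrentNode->branches[nodeIndex];
       }
   }

   // mark the end of the word so that prefixes are not mistaken for words
   pCurrentNode->isWord = true;
}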

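You could use struct tNode *branches; instead, but that just adds another allocation step you do not really need. You use the ASCII value of the character to map 'a' to branches[0] and 'z' to branches[25]... there is no need to search for a "free" spot, which really kills the performance of a trie. Finally, you need a terminator such as isWord so that you know "doctor" is a word and "docto" is not.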
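
As a sketch of how the isWord terminator is used on lookup, the following self-contained example pairs a compact insert with a search and a cleanup routine. The names letterIndex, insertWord, containsWord and freeTrie are hypothetical and are not part of the answer above. The sketch assumes an ASCII-compatible character set in which 'a'..'z' are contiguous, and it skips non-letter characters in the same way on insert and on lookup.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

struct tNode {
  bool isWord;                 /* true if a word ends at this node */
  struct tNode *branches[26];  /* branches[0] is 'a' ... branches[25] is 'z' */
};

/* Map a character to 0..25, or -1 if it is not a letter. */
static int letterIndex(char ch) {
  int idx = tolower((unsigned char)ch) - 'a';
  return (idx >= 0 && idx <= 25) ? idx : -1;
}

/* Insert a NUL-terminated word. Returns false if an allocation fails. */
bool insertWord(struct tNode *pRoot, const char *word) {
  struct tNode *pNode = pRoot;
  for (; *word != '\0'; ++word) {
    int idx = letterIndex(*word);
    if (idx < 0) continue;  /* skip non-letters */
    if (pNode->branches[idx] == NULL) {
      pNode->branches[idx] = calloc(1, sizeof(struct tNode));
      if (pNode->branches[idx] == NULL) return false;
    }
    pNode = pNode->branches[idx];
  }
  pNode->isWord = true;
  return true;
}

/* Follow the word's letters; it is present only if the path exists
   AND the final node is marked as the end of a word. */
bool containsWord(const struct tNode *pRoot, const char *word) {
  const struct tNode *pNode = pRoot;
  for (; *word != '\0'; ++word) {
    int idx = letterIndex(*word);
    if (idx < 0) continue;  /* skip non-letters, mirroring insertWord */
    pNode = pNode->branches[idx];
    if (pNode == NULL) return false;  /* path breaks off: not stored */
  }
  return pNode->isWord;
}

/* Release a node and everything below it. */
void freeTrie(struct tNode *pNode) {
  int i;
  if (pNode == NULL) return;
  for (i = 0; i < 26; ++i) freeTrie(pNode->branches[i]);
  free(pNode);
}
```

With a root obtained from calloc(1, sizeof(struct tNode)) and "doctor" inserted, containsWord reports true for "doctor" but false for "docto": the path for "docto" exists, yet its last node has isWord still false. Each lookup step is a direct array index, so the cost depends only on the word's length, not on how many words are stored.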