python - Storing an inverted index in MySQL

I am building a very large inverted index of terms. Which approach would you suggest?
First

termId -> docId
  a        doc2[locations],doc5[locations],doc12[locations]
  b        doc5[locations],doc7[locations],doc4[locations]

Second

termId -> docId
  a        doc2[locations]
  a        doc5[locations]
  a        doc12[locations]
  b        doc5[locations]
  b        doc7[locations]
  b        doc4[locations]

P.S. Lucene is not an option.

Best answer

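The right table design depends on how you plan to use the data. If you plan to use strings like "doc2[locations],doc5[locations],doc12[locations]" as-is, without any further post-processing, then your First design is fine.
But if, as your question implies, you may sometimes want to treat doc2[locations], doc5[locations], etc. as separate entities, then you should definitely use the Second design.
Here are some use cases that show why the Second design is better: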
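If you use First and ask for all documents with termId = a, you get back a single string,
doc2[locations],doc5[locations],doc12[locations]
which you then have to split yourself.
If you use Second, you get each document back as a separate row. No splitting needed!
The Second structure is more convenient.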
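To make the difference concrete, here is a minimal sketch of the lookup under both designs. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table names index_first and index_second and the column name docIds are hypothetical and not part of the question.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the queries are plain SQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# First design: one row per term, all postings packed into one string.
cur.execute("CREATE TABLE index_first (termId TEXT PRIMARY KEY, docIds TEXT)")
cur.execute(
    "INSERT INTO index_first VALUES (?, ?)",
    ("a", "doc2[locations],doc5[locations],doc12[locations]"),
)

# Second design: one row per (term, document) pair.
cur.execute("CREATE TABLE index_second (termId TEXT, docId TEXT)")
cur.executemany(
    "INSERT INTO index_second VALUES (?, ?)",
    [
        ("a", "doc2[locations]"),
        ("a", "doc5[locations]"),
        ("a", "doc12[locations]"),
    ],
)

# First: fetch the packed string, then split it in application code.
(packed,) = cur.execute(
    "SELECT docIds FROM index_first WHERE termId = ?", ("a",)
).fetchone()
docs_first = packed.split(",")

# Second: every document already arrives as its own row.
docs_second = [
    row[0]
    for row in cur.execute(
        "SELECT docId FROM index_second WHERE termId = ? ORDER BY rowid", ("a",)
    )
]

print(docs_first)
print(docs_second)
```

Splitting on a comma only works here because the placeholder text [locations] contains no commas. If the real location lists are comma-separated as well, the First design would need a more careful parser, which is one more reason to prefer one row per document.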
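Or suppose that at some point doc5[locations] changes and you need to update the table. With the First design, you would have to use some relatively complicated MySQL string function to find and replace the substring in every row that contains it. (Note that MySQL had no built-in regex substitution until REGEXP_REPLACE was added in MySQL 8.0.)
With the Second design, the update is easy: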

UPDATE `table` SET docId = 'newdoc5[locations]' WHERE docId = 'doc5[locations]';
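The same update can be sketched from Python for comparison. This is only an illustration under stated assumptions: sqlite3 again stands in for MySQL, and the table names index_first and index_second and the column docIds are hypothetical. With the First design the rewrite is done here in application code rather than with a MySQL string function.

```python
import sqlite3

OLD, NEW = "doc5[locations]", "newdoc5[locations]"

# Assumption: sqlite3 stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Second design: one row per (term, document) pair.
cur.execute("CREATE TABLE index_second (termId TEXT, docId TEXT)")
cur.executemany(
    "INSERT INTO index_second VALUES (?, ?)",
    [
        ("a", "doc2[locations]"), ("a", "doc5[locations]"), ("a", "doc12[locations]"),
        ("b", "doc5[locations]"), ("b", "doc7[locations]"), ("b", "doc4[locations]"),
    ],
)

# A single parameterized statement updates every posting of the document.
cur.execute("UPDATE index_second SET docId = ? WHERE docId = ?", (NEW, OLD))
updated_rows = cur.rowcount

# First design: one packed string per term.
cur.execute("CREATE TABLE index_first (termId TEXT PRIMARY KEY, docIds TEXT)")
cur.executemany(
    "INSERT INTO index_first VALUES (?, ?)",
    [
        ("a", "doc2[locations],doc5[locations],doc12[locations]"),
        ("b", "doc5[locations],doc7[locations],doc4[locations]"),
    ],
)

# Every row has to be read, split, patched, re-joined, and written back.
for term_id, packed in cur.execute("SELECT termId, docIds FROM index_first").fetchall():
    parts = [NEW if part == OLD else part for part in packed.split(",")]
    cur.execute(
        "UPDATE index_first SET docIds = ? WHERE termId = ?",
        (",".join(parts), term_id),
    )
conn.commit()

first_after = dict(cur.execute("SELECT termId, docIds FROM index_first").fetchall())
terms_with_new_doc = [
    row[0]
    for row in cur.execute(
        "SELECT termId FROM index_second WHERE docId = ? ORDER BY termId", (NEW,)
    )
]
print(updated_rows, terms_with_new_doc, first_after)
```

Passing the values as parameters instead of pasting them into the SQL string also sidesteps quoting problems with the brackets and any quotes inside the document identifiers.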

Regarding python - storing an inverted index in MySQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13100166/