Solr DIH: How to handle deleted documents?



Question


I'm playing around with a Solr-powered search for my webapp, and I figured it'd be best to use the DataImportHandler to handle syncing with the app via the database. I like the elegance of just checking the last_updated_date field. Good stuff. However, I don't know how to handle deleting documents with this approach. The way I see it, I've got 2 choices. I could either send an explicit message to Solr from the client when a document is deleted, or I could add a "deleted" flag and leave the object in the database, so that Solr will notice that the document has changed and is now "deleted." I could add a query filter that would disregard results with the deleted flag, but it seems inefficient to include all the deleted documents in the Lucene index. What do other folks do?

Recommended answer

These are your options:

  • Use DIH special commands $deleteDocById or $deleteDocByQuery (requires Solr 1.4+)
  • Use the clean parameter of DIH to delete the whole index before importing.
  • Use preImportDeleteQuery to define what's going to be cleaned up before importing. (requires Solr 1.4+)
  • Use database triggers instead of DIH to manage updating the index.
  • If you're using some sort of ORM, use its interception capabilities instead of DIH. For example, you can use Hibernate events to update the index on update, insert or delete.
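
For the other route mentioned in the question (sending an explicit delete to Solr from the application when a row is removed), here is a minimal Python sketch. It assumes a Solr version whose `/update` handler accepts JSON commands, which is newer than the 1.4 mentioned above. The host, port and core name `mycore` are placeholders, and the helper names are made up for illustration.

```python
import json
import urllib.request

# Hypothetical host, port and core name; adjust to your own setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def delete_by_id_payload(doc_ids):
    """Build a JSON delete command for a list of unique-key values."""
    return json.dumps({"delete": [str(doc_id) for doc_id in doc_ids]})


def delete_by_query_payload(query):
    """Build a JSON delete command for every document matching a query."""
    return json.dumps({"delete": {"query": query}})


def build_request(payload, url=SOLR_UPDATE_URL):
    """Wrap a payload in an HTTP POST request aimed at the update handler."""
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(payload, url=SOLR_UPDATE_URL):
    """Send the command to Solr and return the HTTP status code."""
    with urllib.request.urlopen(build_request(payload, url)) as response:
        return response.status
```

Calling `send(delete_by_id_payload([42]))` from the code path that deletes row 42 would remove the matching document. If you keep a soft-delete flag instead, `send(delete_by_query_payload("deleted:true"))` would purge everything flagged as deleted, assuming the index has a `deleted` field.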


08-06 10:20