Question
I am new to Lucene.Net and am currently evaluating it for use in .NET applications. Lucene.Net is a general-purpose library that knows nothing about data sources such as SQL Server or SQLite; it only knows that you have a Lucene document you want indexed. So once we load data into Lucene.Net from some data source, how do we keep the Lucene.Net documents up to date with the data in, for example, a SQL database? One way to keep the two (Lucene.Net and SQL) in sync is to update the Lucene index on every database update. But someone could also change the SQL database manually; in that scenario, how can we update the Lucene index?
Recommended answer
I can provide a conceptual overview of how to do this. Fundamentally, you need three things:
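1. A way to know every time relevant data in the SQL database changes.
2. A place to capture information about each change, called a change log.
3. A routine that reads the change log, applies those changes to the Lucene.Net index, and marks the change log records as processed.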
There are, of course, many different ways to handle each of these.
The easiest way to handle #1 is available if your database supports insert, update, and delete triggers. If it does, you can add these three triggers to every table that supplies data to the Lucene.Net index. When a record in one of those tables changes, the trigger automatically writes a record to the change log indicating the table, the record ID, and the operation (insert, update, or delete). Because triggers fire inside the database, they also catch changes that someone makes manually. If your database does not support triggers, it is a bit harder. You could hook into a common API that your app uses to talk to the database when inserting, updating, and deleting, and have that hook record the same sort of information in the change log.
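As a rough illustration of the trigger approach, here is a minimal sketch using Python's built-in sqlite3 module, assuming a database that supports triggers. The `products` table, the `change_log` columns, and the trigger names are all hypothetical; trigger syntax on SQL Server or another database will differ.

```python
import sqlite3

# Hypothetical schema: one indexed table ("products") and a change_log table.
TRIGGER_SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE change_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    record_id  INTEGER NOT NULL,
    operation  TEXT NOT NULL,              -- 'insert', 'update' or 'delete'
    processed  INTEGER NOT NULL DEFAULT 0  -- 0 = not yet applied to the index
);
CREATE TRIGGER products_ai AFTER INSERT ON products BEGIN
    INSERT INTO change_log (table_name, record_id, operation)
    VALUES ('products', NEW.id, 'insert');
END;
CREATE TRIGGER products_au AFTER UPDATE ON products BEGIN
    INSERT INTO change_log (table_name, record_id, operation)
    VALUES ('products', NEW.id, 'update');
END;
CREATE TRIGGER products_ad AFTER DELETE ON products BEGIN
    INSERT INTO change_log (table_name, record_id, operation)
    VALUES ('products', OLD.id, 'delete');
END;
"""


def demo_trigger_log():
    """Run three plain SQL statements and return what the triggers logged."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TRIGGER_SCHEMA)
    # These statements stand in for edits made directly against the database,
    # for example by someone typing SQL into a console.
    conn.execute("INSERT INTO products (id, name) VALUES (1, 'lamp')")
    conn.execute("UPDATE products SET name = 'desk lamp' WHERE id = 1")
    conn.execute("DELETE FROM products WHERE id = 1")
    rows = conn.execute(
        "SELECT table_name, record_id, operation FROM change_log ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Calling `demo_trigger_log()` returns one log row per statement, even though none of the statements mention the change log themselves.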
The change log can take many forms, but the easiest is probably a table in the SQL database itself. That way, the insert, update, and delete triggers can record their observations directly by inserting a row into the changeLog table. A SQL table also works if you are writing to the change log from an API wrapper.
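For the API wrapper variant, a minimal sketch might look like the following. It again uses Python's sqlite3 module for illustration; `open_db`, `log_change`, and `rename_product` are hypothetical names, and the table layout is an assumption. The wrapper writes the data change and the log row in one transaction so the two cannot drift apart.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical products/change_log schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE change_log (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name TEXT NOT NULL,
            record_id  INTEGER NOT NULL,
            operation  TEXT NOT NULL,
            processed  INTEGER NOT NULL DEFAULT 0
        );
    """)
    return conn


def log_change(conn, table_name, record_id, operation):
    """Append one row to the change log; the caller owns the transaction."""
    conn.execute(
        "INSERT INTO change_log (table_name, record_id, operation) VALUES (?, ?, ?)",
        (table_name, record_id, operation),
    )


def rename_product(conn, product_id, new_name):
    """Example wrapper method: update the row and log the change together."""
    # "with conn" commits both statements, or rolls both back if either fails.
    with conn:
        conn.execute(
            "UPDATE products SET name = ? WHERE id = ?", (new_name, product_id)
        )
        log_change(conn, "products", product_id, "update")
```

A wrapper like this only sees changes that go through it, so manual edits made directly in the database would not be logged; that is the main reason to prefer triggers where they are available.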
There are many ways to implement the processing routine, but probably the most robust is to use a timer to kick off a background thread that checks for unprocessed changeLog records every few seconds. If it finds any, it reads them in and checks, for each one, whether it is an insert, update, or delete operation and which table and record ID it refers to. For an insert or update, it reads the record from the SQL database and inserts or updates the corresponding document in Lucene.Net. For a delete, it directly deletes the document in Lucene.Net. It then sets a boolean on the changeLog record to indicate that the record has been processed.
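The processing step can be sketched as follows, again in Python with sqlite3. A plain dictionary stands in for the Lucene.Net index here, and the `products` table, the `change_log` columns, and the function names are assumptions made for illustration. In a real Lucene.Net application the dictionary writes would presumably map to `IndexWriter.UpdateDocument` and `IndexWriter.DeleteDocuments` keyed on an ID term.

```python
import sqlite3


def process_change_log(conn, index):
    """Apply unprocessed change_log rows to `index` (a dict standing in for Lucene)."""
    pending = conn.execute(
        "SELECT id, table_name, record_id, operation FROM change_log "
        "WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for log_id, table_name, record_id, operation in pending:
        key = (table_name, record_id)
        row = None
        if operation in ("insert", "update"):
            # Re-read the current state of the record from the source table.
            row = conn.execute(
                "SELECT name FROM products WHERE id = ?", (record_id,)
            ).fetchone()
        if row is None:
            # A delete, or the row disappeared after the change was logged.
            index.pop(key, None)
        else:
            index[key] = {"name": row[0]}  # add or replace the document
        with conn:
            conn.execute("UPDATE change_log SET processed = 1 WHERE id = ?", (log_id,))
    return len(pending)


def demo_sync():
    """Build a tiny database, process two batches, return (index, unprocessed count)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE change_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT, table_name TEXT NOT NULL,
            record_id INTEGER NOT NULL, operation TEXT NOT NULL,
            processed INTEGER NOT NULL DEFAULT 0);
        INSERT INTO products VALUES (1, 'lamp'), (2, 'chair');
        INSERT INTO change_log (table_name, record_id, operation)
        VALUES ('products', 1, 'insert'), ('products', 2, 'insert');
    """)
    index = {}
    process_change_log(conn, index)
    with conn:  # simulate a later delete together with its log entry
        conn.execute("DELETE FROM products WHERE id = 2")
        conn.execute(
            "INSERT INTO change_log (table_name, record_id, operation) "
            "VALUES ('products', 2, 'delete')"
        )
    process_change_log(conn, index)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM change_log WHERE processed = 0"
    ).fetchone()[0]
    conn.close()
    return index, remaining
```

A timer or background loop would call `process_change_log` every few seconds. Marking each row as processed only after the index write means a crash in between replays the change rather than losing it, which is harmless because adding, replacing, and deleting by key are all safe to repeat.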
More bells and whistles can be added, but this should give you a pretty clear picture of how to keep the Lucene.Net index up to date in near real time.