Question
For learning purposes I'm trying to write a simple structured document store in Redis. In my example application I'm indexing millions of documents that look a little like the following.
<book id="1234">
<title>Quick Brown Fox</title>
<year>1999</year>
<isbn>309815</isbn>
<author>Fred</author>
</book>
I'm writing a little query language that allows me to say YEAR = 1999 AND TITLE="Quick Brown Fox" (again, just for my learning; I don't care that I'm reinventing the wheel!), and this should return the IDs of the matching documents (1234 in this case). AND and OR expressions can be arbitrarily nested.
For each document I generate keys as follows:
BOOK_TITLE.QUICK_BROWN_FOX = 1234
BOOK_YEAR.1999 = 1234
I'm using SADD to put these document IDs into a series of sets of the form KEYNAME.VALUE = { REFS }.
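The key scheme above could be sketched roughly as follows. This is only an illustration: the function names `index_keys` and `sadd_commands` and the normalisation rule (upper-case, runs of non-alphanumerics collapsed to `_`) are assumptions inferred from the example keys, and the commands are built as plain argument lists rather than sent to a Redis server.

```python
import re

def index_keys(doc_type, fields):
    """Return one set key per field, in the form TYPE_FIELD.VALUE."""
    keys = []
    for name, value in fields.items():
        # Assumed normalisation: upper-case, non-alphanumeric runs become "_".
        token = re.sub(r"[^A-Z0-9]+", "_", str(value).upper()).strip("_")
        keys.append(f"{doc_type.upper()}_{name.upper()}.{token}")
    return keys

def sadd_commands(doc_type, doc_id, fields):
    """Build one SADD command (as an argument list) per index key."""
    return [["SADD", key, str(doc_id)] for key in index_keys(doc_type, fields)]

book = {"title": "Quick Brown Fox", "year": 1999, "isbn": 309815, "author": "Fred"}
commands = sadd_commands("book", 1234, book)
for command in commands:
    print(" ".join(command))
```

Each argument list could then be passed to whatever Redis client is in use, ideally in a pipeline so that indexing one document costs a single round trip.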
When I do the querying, I parse the expression into an AST. A simple expression such as YEAR=1999 maps directly to an SMEMBERS command, which gets me the set of matching documents back. However, I'm not sure how to perform the AND and OR parts most efficiently.
Given a query such as:
(TITLE=Dental Surgery OR TITLE=DIY Appendectomy)
AND
(YEAR = 1999 AND AUTHOR = FOO)
I currently make the following requests to Redis to answer these queries.
-- Stage one generates the intermediate results and returns RANDOMLY_GENERATED_KEY3
SUNIONSTORE RANDOMLY_GENERATED_KEY1 BOOK_TITLE.DENTAL_SURGERY BOOK_TITLE.DIY_APPENDECTOMY
SINTERSTORE RANDOMLY_GENERATED_KEY2 BOOK_YEAR.1999 BOOK_AUTHOR.FOO
SINTERSTORE RANDOMLY_GENERATED_KEY3 RANDOMLY_GENERATED_KEY1 RANDOMLY_GENERATED_KEY2
-- Retrieving the top level results just requires the last key generated
SMEMBERS RANDOMLY_GENERATED_KEY3
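When I encounter an AND, I use SINTERSTORE on the two child keys (for an OR I use SUNIONSTORE). I randomly generate a key to store the result in (and set a short TTL so that I don't fill Redis up with cruft). At the end of this series of commands the return value is a key that I can use to retrieve the results with SMEMBERS. The reason I use the store functions is that I don't want to transfer all the matching document references back to the server, so I use temporary keys to hold the results on the Redis instance and only bring back the matching results at the end.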
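The bottom-up evaluation described here can be sketched as a small compiler from the AST to a list of Redis commands. The tuple-based AST shape (`("AND", left, right)`, `("OR", left, right)`, `("EQ", field, value)`), the `TMP_n` key names, the `BOOK_` prefix, and the 60 second TTL are all assumptions made for illustration; a real implementation would generate random keys and send the commands through a client.

```python
import itertools
import re

TTL_SECONDS = 60  # assumed lifetime for temporary result keys

def leaf_key(field, value):
    """Map FIELD = VALUE to its index set key (assumed BOOK_ prefix)."""
    token = re.sub(r"[^A-Z0-9]+", "_", str(value).upper()).strip("_")
    return f"BOOK_{field.upper()}.{token}"

def compile_query(node, commands, new_key):
    """Walk the AST bottom-up, append commands, return the key holding the result."""
    op = node[0]
    if op == "EQ":
        # A leaf is already an index set, so no command is needed.
        return leaf_key(node[1], node[2])
    if op not in ("AND", "OR"):
        raise ValueError(f"unknown operator: {op}")
    left = compile_query(node[1], commands, new_key)
    right = compile_query(node[2], commands, new_key)
    dest = new_key()
    store = "SINTERSTORE" if op == "AND" else "SUNIONSTORE"
    commands.append([store, dest, left, right])
    # Expire the temporary set so abandoned results do not accumulate.
    commands.append(["EXPIRE", dest, str(TTL_SECONDS)])
    return dest

# Deterministic key generator for the example; swap in random keys in practice.
counter = itertools.count(1)
def new_key():
    return f"TMP_{next(counter)}"

query = ("AND",
         ("OR", ("EQ", "title", "Dental Surgery"), ("EQ", "title", "DIY Appendectomy")),
         ("AND", ("EQ", "year", 1999), ("EQ", "author", "FOO")))

commands = []
result_key = compile_query(query, commands, new_key)
commands.append(["SMEMBERS", result_key])
for command in commands:
    print(" ".join(command))
```

Because every intermediate result stays on the Redis side, only the final SMEMBERS reply crosses the network. Sending the whole command list in one pipeline or transaction would also reduce the round trips to one.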
My question is simply, is this the best way to make use of Redis as a document store?
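Recommended answer
I'm using a similar approach with sorted sets to implement full-text indexing. The overall approach is good, though there are a couple of fairly simple improvements you could make.
- Rather than using randomly generated keys, you can use the query (or a short form of it) as the key. That lets you reuse sets that have already been calculated, which could significantly improve performance if you have queries across two large sets that are commonly combined in similar ways.
- Handling the title as a complete string will result in a very large number of single-member sets. It may be better to index the individual words of the title and filter the final results for an exact match if you really need it.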
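Both suggestions can be sketched in a few lines. Everything named here is hypothetical: the tuple AST shape, the `QUERY.` and `BOOK_TITLE_WORD.` prefixes, and the choice of a truncated SHA-1 digest as the "short form" of the query. The idea is only that equivalent queries should map to the same key, and that titles are split into words at index time.

```python
import hashlib
import re

def canonical(node):
    """Render an AST as a stable string so equivalent queries share a key."""
    op = node[0]
    if op == "EQ":
        return f"{node[1].upper()}={str(node[2]).upper()}"
    # AND/OR are commutative, so sort operands: A AND B equals B AND A.
    parts = sorted(canonical(child) for child in node[1:])
    return "(" + f" {op} ".join(parts) + ")"

def cache_key(node):
    """Short, deterministic result key derived from the query itself."""
    digest = hashlib.sha1(canonical(node).encode("utf-8")).hexdigest()[:16]
    return f"QUERY.{digest}"

def title_word_keys(title):
    """One index key per distinct word in the title."""
    words = sorted(set(re.findall(r"[A-Z0-9]+", title.upper())))
    return [f"BOOK_TITLE_WORD.{word}" for word in words]

def exact_title_match(candidate_title, wanted_title):
    """Final client-side filter: compare ignoring case and extra whitespace."""
    def norm(text):
        return " ".join(text.upper().split())
    return norm(candidate_title) == norm(wanted_title)

a = ("AND", ("EQ", "year", 1999), ("EQ", "author", "FOO"))
b = ("AND", ("EQ", "author", "FOO"), ("EQ", "year", 1999))
print(canonical(a))
print(cache_key(a) == cache_key(b))
print(title_word_keys("Quick Brown Fox"))
```

With keys like these, the query code could check whether the result key already exists before recomputing it. Cached sets go stale when new documents are indexed, so a TTL on the cached key is still worth keeping.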