I would like some suggestions on a database solution for the following problem:
1) 100 million XML documents are saved to the database per
day.
2) The database holds a maximum of 3 days of data.
3) There are 1 million query requests per day.
4) The values by which the documents are filtered are stored in
a separate table and mapped to the corresponding XML document ID.
5) The documents are requested by date range, by a list of matching
IDs, as the top 10 newest documents, or as the records that are new
since the previous request.
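To put these figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the load is spread evenly over the day, which is not stated above; real peaks are likely to be noticeably higher than these averages.

```python
# Rough sizing from the stated requirements.
# Assumption: traffic is uniform across 24 hours (peaks will be higher).
SECONDS_PER_DAY = 24 * 60 * 60

docs_per_day = 100_000_000
queries_per_day = 1_000_000
retention_days = 3

avg_writes_per_sec = docs_per_day / SECONDS_PER_DAY
avg_queries_per_sec = queries_per_day / SECONDS_PER_DAY
max_docs_retained = docs_per_day * retention_days

print(f"average writes/s:   {avg_writes_per_sec:,.0f}")   # 1,157
print(f"average queries/s:  {avg_queries_per_sec:,.1f}")  # 11.6
print(f"documents retained: {max_docs_retained:,}")       # 300,000,000
```

So the workload is write-heavy: roughly a hundred writes for every query, with about 300 million documents held at any time.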
This is what I have done so far:
1) Checked whether I can use Redis. It is limited to a few data types,
and I cannot use multiple WHERE-style conditions to filter a hash in
Redis. I need indexing on date and lots of other fields, and I am
unable to choose the right data structure to store this in a hash.
2) Investigated DynamoDB. It is again a key-value store, where all the
filter conditions would have to be stored as one value. I am not sure
whether querying a JSON document to find the right XML document
would be efficient.
3) Investigated Cassandra. It looks like it may fit my
requirements, but it has a limitation: read operations
might be slow. Cassandra has the advantage of faster write operations
on changing data. This looks like the best solution I have found
so far.
We are currently using SQL Server and are running into performance issues, so we are looking for a better solution.
Please advise. Thanks.
Best answer
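It is not that reads in Cassandra are necessarily slow; rather, it is hard to guarantee an SLA for reads (usually they will be fast, but some of them will be slow).
Cassandra does not have the search capabilities you may need in the future (sorting, searching by multiple fields, ranked search). You can achieve this with Cassandra, but it will obviously take more effort than using a database that is suited to search operations.
I suggest you take a look at Lucene / Elasticsearch. Let me quote Lucene's features from its main website: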
Scalable
Powerful, accurate and efficient search algorithms
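As a rough illustration of how the access patterns in the question (date range, list of IDs, top 10 newest, new since the previous request) might map onto Elasticsearch, here is a minimal sketch that only builds the JSON search body using the `bool`, `range` and `terms` queries from the Elasticsearch query DSL. The field names `created_at` and `doc_id` and the helper `build_search_body` are hypothetical; the real names depend on how the filter values are indexed. No client or server is involved here.

```python
import json
from typing import Optional, Sequence


def build_search_body(
    start: Optional[str] = None,        # inclusive lower bound (ISO-8601)
    end: Optional[str] = None,          # exclusive upper bound (ISO-8601)
    newer_than: Optional[str] = None,   # timestamp of the previous request
    doc_ids: Optional[Sequence[str]] = None,
    size: int = 10,
    date_field: str = "created_at",     # hypothetical field name
    id_field: str = "doc_id",           # hypothetical field name
) -> dict:
    """Build an Elasticsearch search body; newest documents come first."""
    bounds = {}
    if start is not None:
        bounds["gte"] = start
    if end is not None:
        bounds["lt"] = end
    if newer_than is not None:
        bounds["gt"] = newer_than

    filters = []
    if bounds:
        filters.append({"range": {date_field: bounds}})
    if doc_ids:
        filters.append({"terms": {id_field: list(doc_ids)}})

    # Filter clauses skip relevance scoring, which suits exact-match lookups.
    query = {"bool": {"filter": filters}} if filters else {"match_all": {}}
    return {
        "query": query,
        "sort": [{date_field: {"order": "desc"}}],
        "size": size,
    }


# Top 10 newest documents
print(json.dumps(build_search_body(), indent=2))

# Documents with given IDs inside a date range
print(json.dumps(
    build_search_body(start="2016-06-12", end="2016-06-14",
                      doc_ids=["a1", "b2"], size=100),
    indent=2,
))

# Everything that arrived after the previous request
print(json.dumps(build_search_body(newer_than="2016-06-13T10:00:00Z"), indent=2))
```

All four request types reduce to one filtered, date-sorted search, which is the kind of multi-condition query the question found awkward to express in a plain key-value store.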
Source: "mongodb - Database for filtering XML documents", a similar question on Stack Overflow: https://stackoverflow.com/questions/37799442/