What is the fastest way to find strings by substring in SQL?
Problem description
I have a huge table with two columns: Id and Title. Id is a bigint, and I'm free to choose the type of the Title column: varchar, char, text, whatever. The Title column contains random text strings such as "abcdefg", "q" and "allyourbasebelongtous", up to 255 characters long.
My task is to find the strings that contain a given substring. The substrings also have random lengths and can occur at the start, in the middle, or at the end of a string. The most obvious way to do it:
SELECT * FROM t WHERE Title LIKE '%abc%'
I don't care about INSERT performance; I only need fast SELECTs. What can I do to make this search as fast as possible?
I use MS SQL Server 2008 R2, and as far as I can see, full-text search would be useless here.
Answer
If you want to use less space than Randy's answer and your data contains considerable repetition, you can build an n-ary tree (a suffix trie) in which each edge is the next character, and insert every string in your data, together with each of its trailing substrings (suffixes), into it.
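A minimal in-memory sketch of such a tree, written in Python because the answer names no language. It assumes nested dicts as nodes, each keyed by the next character, and the names `build_suffix_trie` and `count_nodes` are hypothetical. It compares the number of trie nodes with the number of characters needed to store every suffix as a separate string, which is where repetition in the data pays off.

```python
def build_suffix_trie(titles):
    """Insert every title and each of its trailing substrings (suffixes).

    Hypothetical helper: each node is a dict mapping the next character
    (one edge) to a child node.
    """
    root = {}
    for title in titles:
        for start in range(len(title)):
            node = root
            for ch in title[start:]:
                node = node.setdefault(ch, {})
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.values())


titles = ["abcdefg", "q", "allyourbasebelongtous", "abcxyz"]
trie = build_suffix_trie(titles)

# Characters needed if every suffix were stored as its own string:
# a title of length n has suffixes of lengths n, n-1, ..., 1.
flat_chars = sum(len(t) * (len(t) + 1) // 2 for t in titles)
print(flat_chars, count_nodes(trie))
```

Suffixes that begin with the same characters share the nodes along that common prefix (here, for example, "abcdefg" and "abcxyz" share the path a, b, c), so the more repetition the data has, the larger the saving.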
Number the nodes in depth-first order. Then create a table with up to 255 rows per record (one per suffix), each holding the record's Id and the id of the tree node that matches that string or trailing substring. To search, find the node that represents the string you are searching for and do a range search on node ids: because the numbering is depth-first, that node and all of its descendants (every stored suffix that starts with the search string) have consecutive ids, so a single range query returns every record containing the substring.
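A sketch of the numbering and the range search, under these assumptions: the trie lives in application memory, SQLite (through Python's `sqlite3` module) stands in for SQL Server, and the table `suffix_index`, its columns, and the helper functions are hypothetical names. Each node stores its preorder number (`pre`) and the largest number in its subtree (`last`), so the descendants of a node are exactly the ids in `pre..last`.

```python
import sqlite3


class Node:
    """Trie node; each key in `children` is the character on that edge."""

    def __init__(self):
        self.children = {}
        self.pre = 0   # depth-first (preorder) number of this node
        self.last = 0  # largest preorder number in this node's subtree


def build(titles):
    """Insert every suffix of every title; remember where each suffix ends."""
    root = Node()
    endings = []  # (record_id, node where one suffix of that record ends)
    for record_id, title in titles.items():
        for start in range(len(title)):
            node = root
            for ch in title[start:]:
                child = node.children.get(ch)
                if child is None:
                    child = node.children[ch] = Node()
                node = child
            endings.append((record_id, node))
    return root, endings


def number(node, counter=0):
    """Number nodes depth first; return the next unused number."""
    node.pre = counter
    counter += 1
    for ch in sorted(node.children):
        counter = number(node.children[ch], counter)
    node.last = counter - 1
    return counter


def find(root, text):
    """Follow `text` from the root; None if no stored suffix starts with it."""
    node = root
    for ch in text:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


titles = {1: "abcdefg", 2: "q", 3: "allyourbasebelongtous", 4: "xxabc"}
root, endings = build(titles)
number(root)

# Hypothetical index table: one row per suffix (record Id, end-node id).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE suffix_index (record_id INTEGER, node_id INTEGER)")
con.execute("CREATE INDEX ix_suffix_node ON suffix_index (node_id)")
con.executemany("INSERT INTO suffix_index VALUES (?, ?)",
                [(rid, node.pre) for rid, node in endings])


def search(text):
    """Ids of records whose Title contains `text`, via one range query."""
    node = find(root, text)
    if node is None:
        return []
    rows = con.execute(
        "SELECT DISTINCT record_id FROM suffix_index "
        "WHERE node_id BETWEEN ? AND ? ORDER BY record_id",
        (node.pre, node.last))
    return [row[0] for row in rows]


print(search("abc"))   # [1, 4]
print(search("base"))  # [3]
print(search("zzz"))   # []
```

The query itself is plain SQL, so on SQL Server it should translate directly into a `BETWEEN` lookup against an index on `node_id`; the trie walk that produces `pre` and `last` happens before the query is sent.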