问题描述
说我有以下基本MySQL数据:
Say I have the following basic MySQL data:
CREATE TABLE my_words (my_word VARCHAR(255));
INSERT INTO my_words VALUES ('dog');
INSERT INTO my_words VALUES ('cat');
INSERT INTO my_words VALUES ('tree');
INSERT INTO my_words VALUES ('ball');
INSERT INTO my_words VALUES ('life');
INSERT INTO my_words VALUES ('complex');
INSERT INTO my_words VALUES ('digeridoo');
INSERT INTO my_words VALUES ('hamster');
INSERT INTO my_words VALUES ('it');
INSERT INTO my_words VALUES ('house');
INSERT INTO my_words VALUES ('love');
INSERT INTO my_words VALUES ('zealous');
INSERT INTO my_words VALUES ('nevis');
INSERT INTO my_words VALUES ('mountain');
INSERT INTO my_words VALUES ('call');
INSERT INTO my_words VALUES ('nail');
INSERT INTO my_words VALUES ('rat');
INSERT INTO my_words VALUES ('hat');
SELECT CONCAT(w1.my_word, w2.my_word) joined
FROM my_words w1, my_words w2
WHERE LENGTH(CONCAT(w1.my_word, w2.my_word)) = 8
ORDER BY RAND() LIMIT 5;
我可以在末尾编写SQL语句,以生成由2个单词组成的5个随机串联字符串的列表,其中字符串的总长度为8个字符.
I can write the SQL statement at the end to generate a list of 5 random concatenated strings made up of 2 words, where the total length of the string is 8 characters.
对于像我在样本中一样的简单数据表,这很好用.
That works fine for a simple table of data like I've got in the sample.
但是,我正在使用的真实"表包含大约6,200行.
However, the "real" table I am working with contains about 6,200 lines.
如果我尝试相同类型的语句,则需要10秒才能生成5个字符串.
If I try the same type of statement it takes 10 seconds to generate the 5 strings.
我想SQL效率很低,因为它每次都两次查询该表,并且这些表没有以任何方式联接.
I guess the SQL is very inefficient as it is searching through the table twice each time and those tables aren't joined in any way.
我想知道是否存在一种更简单的方法来从表中提取由2个单词组成的单词字符串,其中连接字符串的长度为8个字符长(尽管可以更改-我只是将8用作例如).
I wondered if there is a simpler way to extract strings of words made up of 2 words from the table, where the length of the concatenated string is 8 characters long (though that can change - I'm just using 8 as an example).
谢谢
更新1
说明计划:
EXPLAIN
SELECT CONCAT(w1.fld_un, w2.fld_un) joined
FROM j_un w1
JOIN j_un w2 ON w1.fld_len = 8 - w2.fld_len
WHERE w2.fld_len < 8
ORDER BY RAND()
LIMIT 5;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE w2 range un_len un_len 5 \N 2694 Using where; Using temporary; Using filesort
1 SIMPLE w1 ref un_len un_len 5 func 527 Using where
更新2
我不确定是否相关,但是"fld_un"表大约有6,200行.
I'm not sure if it's relevant, but the "fld_un" table has about 6,200 rows.
单词"保存在"fld_un"列中.
The "word" is held in the "fld_un" column.
表的结构为:
Field Type Null Key Default Extra
fld_id int(11) NO PRI NULL auto_increment
fld_un varchar(255) YES NULL
fld_cat_id int(11) YES MUL NULL
fld_len int(2) NO MUL NULL
这些索引存在于表中:
Keyname Type Cardinality Field
PRIMARY PRIMARY 6318 fld_id
cat INDEX 15 fld_cat_id
bob INDEX 11 fld_len
表上已经有一个主索引是否重要?从技术上讲,我不需要.
Does it matter that there is already a primary index on the table? I don't technically need that I suppose.
声明:
SELECT CONCAT(word1, word2) joined
FROM (
SELECT w1.fld_un word1, w2.fld_un word2
FROM j_un2 w1
JOIN j_un2 w2 ON w1.fld_len = 8 - w2.fld_len
WHERE w2.fld_len < 8
ORDER BY RAND()
LIMIT 5) x;
查询耗时23.6805秒
Query took 23.6805 sec
说明计划:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5
2 DERIVED w2 range bob bob 4 NULL 4627 Using where; Using temporary; Using filesort
2 DERIVED w1 ref bob bob 4 func 527 Using where
当我按照Thorsten Kettner的建议修改"bob"索引以包括2列时:
When I revise the "bob" index to include 2 columns as suggested by Thorsten Kettner:
Keyname Type Cardinality Field
bob INDEX 11 fld_len, fld_un
然后重新测试:
SELECT CONCAT(word1, word2) joined
FROM (
SELECT w1.fld_un word1, w2.fld_un word2
FROM j_un2 w1
JOIN j_un2 w2 ON w1.fld_len = 8 - w2.fld_len
WHERE w2.fld_len < 8
ORDER BY RAND()
LIMIT 5) x;
查询花了30.3394秒返回5行.
The query took 30.3394 sec to return 5 rows.
说明计划:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5
2 DERIVED w2 range bob bob 4 NULL 4211 Using where; Using temporary; Using filesort
2 DERIVED w1 ref bob bob 4 func 527 Using where
更新3
没有按rand()命令"运行,运行了0.0011秒!
Ran without "order by rand()" and it ran in 0.0011 sec!
推荐答案
您可以添加一列,例如包含单词长度的word_length
,并在word_length
列上添加索引.通常,包含可以从另一列派生的数据的设计会很糟糕,但是在这种情况下,为了性能起见,您需要破坏纯度.然后,您的查询可以在此列中使用JOIN
条件:
You can add a column, e.g. word_length
that contains the length of the word, and add an index on the word_length
column. Normally it would be poor design to include data that can be derived from another column, but in this case you need to break purity for performance sake. Then your query can use a JOIN
condition using this column:
SELECT CONCAT(w1.my_word, w2.my_word) joined
FROM my_words w1
JOIN my_words w2 ON w1.word_length = 8 - w2.word_length
WHERE w2.word_length < 8
ORDER BY RAND()
LIMIT 5
您可以使用INSERT
和UPDATE
触发器自动填充word_length
列.
You can use INSERT
and UPDATE
triggers to fill in the word_length
column automatically.
在过滤到5行之后进行连接可能也会有所帮助:
It might also be helpful to do the concatenation after filtering down to the 5 rows:
SELECT CONCAT(word1, word2) joined
FROM (
SELECT w1.my_word word1, w2.my_word word2
FROM my_words w1
JOIN my_words w2 ON w1.word_length = 8 - w2.word_length
WHERE w2.word_length < 8
ORDER BY RAND()
LIMIT 5) x
这篇关于从表中选择随机单词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!