MySQL tagging question: how to select items that have been tagged as X, Y, and Z?

Problem description

I'm dealing with a database where
items are "tagged" a certain number of times.

item (100k rows)
    id
    name
    other stuff

tag (10k rows)
    id
    name

item2tag (1,000,000 rows)
    item_id
    tag_id
    count

I'm looking for the fastest solution to:

Select items that have been tagged as X, Y, and Z (where X, Y, and Z correspond to (possibly) tag names)?

Here's what I have so far... I'd just like to make sure I'm doing it in the best way possible:

First get the tag_ids from the names:

    SELECT id FROM tag WHERE name IN ('X', 'Y', 'Z');

Then I group by item_id, restricted to those tag_ids, and use HAVING to filter the result:

    SELECT item_id, COUNT(tag_id)
    FROM item2tag
    WHERE tag_id = 1 OR tag_id = 2 OR tag_id = 3
    GROUP BY item_id
    HAVING COUNT(tag_id) = 3;

Then I can just select from item with those ids:

    SELECT * FROM item WHERE id IN ([results from prior query])

I have millions of rows in item2tag, with an index on (item_id, tag_id). Is this going to be the fastest solution?

Solution

The method you have suggested is probably the most common way to perform the query, but it might not be the fastest. Using joins can be faster:

    SELECT T1.item_id
    FROM item2tag T1
    JOIN item2tag T2 ON T1.item_id = T2.item_id
    JOIN item2tag T3 ON T2.item_id = T3.item_id
    WHERE T1.tag_id = 1 AND T2.tag_id = 2 AND T3.tag_id = 3

You should ensure that you have the following indexes:

    Primary key on (item_id, tag_id)
    Index on (tag_id)

I performance-tested this query against the original in a few different scenarios.

For the case where nearly every item in the table is tagged with at least one of the tags being searched for, the original query takes about 5 seconds and the JOIN version takes about 10 seconds, roughly twice as long.

For the case where two of the tags occur very frequently and one of the tags occurs only very rarely, the original query takes about 0.9 seconds, whereas the JOIN query takes just 0.003 seconds, a considerable performance improvement.

The SQL I used for the performance test is pasted below.
You can run this test yourself, or modify it slightly to test other queries or different scenarios.

Warning: Don't run this script on your production database, as it modifies the contents of the item2tag table. Running the script can take a few minutes because it creates a lot of data.

    -- Helper table holding the integers 1..cnt, used to generate test rows.
    CREATE TABLE filler (
        id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
    ) ENGINE=Memory;

    DELIMITER $$
    CREATE PROCEDURE prc_filler(cnt INT)
    BEGIN
        DECLARE _cnt INT;
        SET _cnt = 1;
        WHILE _cnt <= cnt DO
            INSERT INTO filler SELECT _cnt;
            SET _cnt = _cnt + 1;
        END WHILE;
    END$$
    -- Restore the default delimiter so the statements below terminate on ";".
    DELIMITER ;

    CALL prc_filler(1000000);

    CREATE TABLE item2tag (
        item_id INT NOT NULL,
        tag_id INT NOT NULL,
        count INT NOT NULL
    );

    INSERT INTO item2tag (item_id, tag_id, count)
    SELECT id % 150001, id % 10, 1
    FROM filler;

    ALTER TABLE item2tag ADD PRIMARY KEY (item_id, tag_id);
    ALTER TABLE item2tag ADD KEY (tag_id);

    -- Make tag 3 occur rarely.
    UPDATE item2tag SET tag_id = 10 WHERE tag_id = 3 AND item_id > 0;

    -- JOIN version.
    SELECT T1.item_id
    FROM item2tag T1
    JOIN item2tag T2 ON T1.item_id = T2.item_id
    JOIN item2tag T3 ON T2.item_id = T3.item_id
    WHERE T1.tag_id = 1 AND T2.tag_id = 2 AND T3.tag_id = 3;

    -- Original GROUP BY / HAVING version.
    SELECT item_id
    FROM item2tag
    WHERE tag_id = 1 OR tag_id = 2 OR tag_id = 3
    GROUP BY item_id
    HAVING COUNT(tag_id) = 3;
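For anyone who wants to check that the two query shapes discussed above return the same items before benchmarking them, here is a minimal sketch. It makes several assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the four sample items and the tag named W are made up, and the helper names items_group_by and items_self_join are hypothetical. The GROUP BY variant also folds the tag-name lookup into the same query through a join on tag, which is a small variation on the three-step approach in the question. It only compares results on a tiny data set and says nothing about MySQL performance.

```python
import sqlite3

# Hypothetical miniature data set; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE item2tag (
        item_id INTEGER NOT NULL,
        tag_id  INTEGER NOT NULL,
        PRIMARY KEY (item_id, tag_id)
    );
    CREATE INDEX item2tag_tag_id ON item2tag (tag_id);
    INSERT INTO tag (id, name) VALUES (1, 'X'), (2, 'Y'), (3, 'Z'), (4, 'W');
    INSERT INTO item2tag (item_id, tag_id) VALUES
        (10, 1), (10, 2), (10, 3),           -- item 10: X, Y, Z
        (11, 1), (11, 2),                    -- item 11: X, Y only
        (12, 1), (12, 2), (12, 3), (12, 4),  -- item 12: X, Y, Z, W
        (13, 3), (13, 4);                    -- item 13: Z, W
""")


def items_group_by(conn, names):
    """GROUP BY / HAVING form, resolving tag names in the same query.

    Assumes the names are distinct and (item_id, tag_id) is unique,
    otherwise the COUNT comparison would not mean "has every tag".
    """
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT it.item_id
        FROM item2tag AS it
        JOIN tag AS t ON t.id = it.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY it.item_id
        HAVING COUNT(it.tag_id) = ?
        ORDER BY it.item_id
    """
    return [row[0] for row in conn.execute(sql, (*names, len(names)))]


def items_self_join(conn, tag_ids):
    """Self-join form: one alias of item2tag per required tag id."""
    joins = " ".join(
        f"JOIN item2tag AS T{i} ON T{i - 1}.item_id = T{i}.item_id"
        for i in range(2, len(tag_ids) + 1)
    )
    where = " AND ".join(
        f"T{i}.tag_id = ?" for i in range(1, len(tag_ids) + 1)
    )
    sql = (
        f"SELECT T1.item_id FROM item2tag AS T1 {joins} "
        f"WHERE {where} ORDER BY T1.item_id"
    )
    return [row[0] for row in conn.execute(sql, tuple(tag_ids))]


by_group = items_group_by(conn, ["X", "Y", "Z"])
by_join = items_self_join(conn, [1, 2, 3])
print(by_group, by_join)  # prints: [10, 12] [10, 12]
```

Items 10 and 12 carry all three tags, so both forms return them, while items 11 and 13 are filtered out. The self-join grows by one alias per tag, which is why it is generated in a loop here; with many required tags the GROUP BY form is usually easier to write, and which one runs faster depends on the tag distribution, as the timings above suggest.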
09-18 22:24