Problem description
So maybe this is noob, but I'm messing with a couple tables.
I have TABLE A with roughly 45,000 records
I have TABLE B with roughly 1.5 million records
I have a query:
UPDATE
    schema1.tablea a
    INNER JOIN (
        SELECT DISTINCT
            ID, Lookup,
            IDpart1, IDpart2
        FROM
            schema1.tableb
        WHERE
            IDpart1 IS NOT NULL
            AND Lookup IS NOT NULL
        ORDER BY
            ID, Lookup
    ) b USING (ID, Lookup)
SET
    a.Elg_IDpart1 = b.IDpart1,
    a.Elg_IDpart2 = b.IDpart2
WHERE
    a.ID IS NOT NULL
    AND a.Elg_IDpart1 IS NULL
So I am forcing the index on ID, Lookup. Each table does have an index on those columns as well, but because of the sub-query I forced it.
It is taking FOR-EVER to run, and it really should take, I'd imagine, under 5 minutes...
My questions are in regards to the indexes, not the query.
I know that you can't use a hash index as an ordered index.
I currently have indexes on ID and Lookup separately, and on both as one index, and they are all B-tree indexes. Based on my WHERE clause, would a hash index fit as an optimization technique?
Can I have a single hash index, and the rest of the indexes be B-tree indexes?
This is not a primary key field.
I would post my EXPLAIN but I changed the names of these tables. Basically it is using the index only for ID instead of using ID, Lookup. I would like to force it to use both, or at least turn it into a different kind of index and see if that helps?
Now I know MySQL is smart enough to determine which index is most appropriate, so is that what it's doing? The Lookup field maps the first and second part of the ID...
Any help or insight on this is appreciated.
UPDATE
An EXPLAIN on the UPDATE after I took out the sub-query.
+----+-------------+-------+------+-----------------------------+--------------+---------+-------------------+-------+-------------+
| id | select_type | table | type | possible_keys               | key          | key_len | ref               | rows  | Extra       |
+----+-------------+-------+------+-----------------------------+--------------+---------+-------------------+-------+-------------+
|  1 | SIMPLE      | m     | ALL  | Lookup_Idx,ID_Idx,ID_Lookup |              |         |                   | 44023 | Using where |
|  1 | SIMPLE      | c     | ref  | ID_LookupIdx                | ID_LookupIdx | 5       | schema1.tableb.ID |     4 | Using where |
+----+-------------+-------+------+-----------------------------+--------------+---------+-------------------+-------+-------------+
tablea relevant indexes:
- ID_LookupIdx (ID, Lookup)
tableb relevant indexes:
- ID (ID)
- Lookup_Idx (Lookup)
- ID_Lookup_Idx (ID, Lookup)
All of the indexes are normal B-trees.
Solution
Firstly, to deal with the specific questions that you raise:
As documented under CREATE INDEX Syntax, the index types that are available depend on the storage engine.
Therefore, before even considering HASH indexing, one should be aware that it is only available in the MEMORY and NDB storage engines, so it may not even be an option for you.
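To see whether HASH is even on the table, one could first check which storage engine each table uses and which index types are currently in place. This is only a sketch; the schema and table names are the ones used in the question.

```sql
-- Which storage engine does each table use?
SELECT TABLE_NAME, ENGINE
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = 'schema1'
  AND  TABLE_NAME IN ('tablea', 'tableb');

-- The Index_type column reports the type (e.g. BTREE or HASH) of each existing index.
SHOW INDEX FROM schema1.tableb;
```

If the engine turns out to be neither MEMORY nor NDB, the question of hash indexing does not really arise.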
Furthermore, be aware that indexes on combinations of ID and Lookup alone may not be optimal, as your WHERE predicate also filters on tablea.Elg_IDpart1 and tableb.IDpart1—you may benefit from indexing on those columns too.
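As a rough illustration of indexing the filtered columns as well, something like the following could be tried. The index names are hypothetical, the column types are unknown (so key-length limits might apply), and whether either index actually helps depends on how selective the predicates are.

```sql
-- Hypothetical index names; the columns are those used in the query in the question.

-- Lets MySQL find the tablea rows that still need a value without scanning the whole table.
-- Note that Elg_IDpart1 is also the column being updated, so this index must be
-- maintained for every row the UPDATE changes.
CREATE INDEX Elg_IDpart1_Idx ON schema1.tablea (Elg_IDpart1);

-- Extends the join columns with the columns that are filtered on and read,
-- so the tableb side of the join might be satisfied from the index alone.
CREATE INDEX ID_Lookup_Parts_Idx ON schema1.tableb (ID, Lookup, IDpart1, IDpart2);
```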
Provided that the desired index types are supported by the storage engine, you can mix them as you see fit.
You could use an index hint to force MySQL to use different indexes from those that the optimiser would otherwise have selected.
The optimiser is usually smart enough to pick the most appropriate index, but not always. In this case, however, it has probably determined that the cardinality of the indexes is such that it is better to use those that it has chosen.
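An index hint of the kind mentioned above might look like this. It is a sketch only: it assumes the composite index on tableb is named ID_Lookup_Idx, as listed in the question, and it spells the join out with ON rather than USING purely for readability next to the hint.

```sql
UPDATE schema1.tablea a
  JOIN schema1.tableb b FORCE INDEX (ID_Lookup_Idx)   -- hint applies to tableb only
    ON b.ID = a.ID
   AND b.Lookup = a.Lookup
SET    a.Elg_IDpart1 = b.IDpart1,
       a.Elg_IDpart2 = b.IDpart2
WHERE  a.Elg_IDpart1 IS NULL
  AND  b.IDpart1 IS NOT NULL;
```

Comparing the EXPLAIN output with and without the hint would show whether overriding the optimiser's choice changes anything.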
Now, depending on the version of MySQL that you are using, tables derived from subqueries may not have any indexes upon them that can be used for further processing: consequently the join with b may require a full scan of that derived table (there's insufficient information in your question to determine exactly how much of a problem this might be, but schema1.tableb having 1.5 million records suggests it could be a significant factor).
See Subquery Optimization for more information.
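One way to check how a derived table is being handled is to EXPLAIN an equivalent SELECT. The query below is a sketch that reuses the derived table from the question; what the plan shows will depend on the MySQL version.

```sql
EXPLAIN
SELECT a.ID, a.Lookup, b.IDpart1, b.IDpart2
FROM   schema1.tablea a
  JOIN (
         SELECT DISTINCT ID, Lookup, IDpart1, IDpart2
         FROM   schema1.tableb
         WHERE  IDpart1 IS NOT NULL
           AND  Lookup IS NOT NULL
       ) b USING (ID, Lookup)
WHERE  a.Elg_IDpart1 IS NULL;
```

The derived table appears in the plan under a name such as <derived2>. If that row has type ALL and no key, the materialised result is being scanned in full for the join; newer versions may instead show an automatically generated key on it.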
One should therefore try to avoid using derived tables if at all possible. In this case, there does not appear to be any purpose to your derived table as one could simply join schema1.tablea and schema1.tableb directly:
UPDATE schema1.tablea a
  JOIN schema1.tableb b USING (ID, Lookup)
SET    a.Elg_IDpart1 = b.IDpart1,
       a.Elg_IDpart2 = b.IDpart2
WHERE  a.Elg_IDpart1 IS NULL
  AND  a.ID IS NOT NULL
  AND  b.IDpart1 IS NOT NULL
  AND  b.Lookup IS NOT NULL
ORDER BY ID, Lookup
The only thing that has been lost is the filter for DISTINCT records, but duplicate records will simply (attempt to) overwrite updated values with those same values again, which has no effect. Computing DISTINCT, on the other hand, may have proved very costly (especially with so many records in that table).
The use of ORDER BY in the derived table was pointless, as it could not be relied upon to achieve any particular order in the UPDATE. Note also that MySQL's multiple-table UPDATE syntax does not accept ORDER BY (or LIMIT), so the clause has to be removed from the joined statement in any case, which also saves on any sorting operation.
One should check the predicates in the WHERE clause: are they all necessary (the NOT NULL checks on a.ID and b.Lookup, for example, are superfluous given that any such NULL records will be eliminated by the JOIN predicate)?
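The reason the join already discards such rows is that an equality comparison involving NULL is never true. A quick check in the mysql client illustrates this (the column aliases are just for illustration):

```sql
SELECT NULL = NULL   AS equals_result,      -- NULL: never true, so such rows are not joined
       NULL <=> NULL AS null_safe_result;   -- 1: only the NULL-safe operator matches NULLs
```

Since USING (ID, Lookup) compares with ordinary equality, rows with a NULL ID or Lookup on either side can never match.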
Altogether, this leaves us with:
UPDATE schema1.tablea a
  JOIN schema1.tableb b USING (ID, Lookup)
SET    a.Elg_IDpart1 = b.IDpart1,
       a.Elg_IDpart2 = b.IDpart2
WHERE  a.Elg_IDpart1 IS NULL
  AND  b.IDpart1 IS NOT NULL
Only if performance is still unsatisfactory should one look further at the indexing. Are the relevant columns (i.e. those used in the JOIN and WHERE predicates) indexed? Are the indexes being selected for use by MySQL? Bear in mind that it will generally use only one index per table for lookups, and that index has to serve both the JOIN predicate and the filter predicates, so perhaps you need an appropriate composite index. Check the query execution plan by using EXPLAIN to investigate such issues further.
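As a sketch of that last step, the plan for the simplified statement can be inspected by explaining the equivalent SELECT, which works on any MySQL version (from 5.6.3 onwards, EXPLAIN can also be applied to the UPDATE itself):

```sql
EXPLAIN
SELECT a.ID, a.Lookup, b.IDpart1, b.IDpart2
FROM   schema1.tablea a
  JOIN schema1.tableb b USING (ID, Lookup)
WHERE  a.Elg_IDpart1 IS NULL
  AND  b.IDpart1 IS NOT NULL;
```

In the output, the key column shows which index was chosen for each table, key_len hints at how many columns of a composite index are actually being used, and rows gives the estimated number of rows examined per lookup.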