MySQL index cardinality - performance vs. storage efficiency

Question

Say you have a MySQL 5.0 MyISAM table with 100 million rows, with one index (other than primary key) on two integer columns.

From my admittedly poor understanding of B-tree structure, I believe that a lower cardinality means the storage efficiency of the index is better, because there are fewer parent nodes, whereas a higher cardinality means less efficient storage but faster read performance, because it has to navigate through fewer branches to get to whatever data it is looking for to narrow down the rows for the query.

(Note - by "low" vs "high", I don't mean e.g. 1 million vs 99 million for a 100 million row table. I mean more like 90 million vs 95 million)

Is my understanding correct?

Related question - How does cardinality affect write performance?

Recommended answer

Higher cardinality means better read performance because, by definition, there are fewer records to read for each key value.

To process a query like this:

SELECT  *
FROM    mytable
WHERE   indexed_col = @myvalue

The engine performs the following steps:


  1. Find the first entry satisfying the condition.

This is done by traversing the B-Tree, starting from the root entry.

Across the pages, the search is performed by following B-Tree links; within a page, the search is performed using binary search (unless your keys are compressed, in which case it's a linear search).

This algorithm is equally efficient for high-cardinality and low-cardinality columns. Finding the first 3 (as opposed to any 3) in these lists:

1  2  3  4  5  6  7  8  9  10

3  3  3  3  3  3  3  3  4  4

requires the same O(log(n)) steps.

  2. Traverse the index until the key value changes. This, of course, requires linear time: the more records share the key value, the more index entries you need to traverse.
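To make the cost of the two steps concrete, here is a toy model in Python. It assumes a single sorted list stands in for the leaf level of the index (no pages, no key compression, so this is not MyISAM's real layout), and `find_first` and `lookup` are hypothetical helper names. The binary search in step 1 needs at most about log2(n) comparisons whether the keys are all distinct or all equal, while the forward walk in step 2 visits one entry per matching row.

```python
def find_first(keys, target):
    """Step 1: binary search for the first position whose key is >= target.

    Returns (position, comparisons) so the cost of the search can be inspected.
    """
    lo, hi, comparisons = 0, len(keys), 0
    while lo < hi:
        comparisons += 1
        mid = (lo + hi) // 2
        if keys[mid] < target:
            lo = mid + 1  # the first match lies to the right of mid
        else:
            hi = mid      # keys[mid] >= target, so mid may itself be the first match
    return lo, comparisons


def lookup(keys, target):
    """Steps 1 and 2: return (matching positions, comparisons, entries visited)."""
    pos, comparisons = find_first(keys, target)
    matches, visited = [], 0
    # Step 2: walk forward until the key value changes; the entry that
    # ends the walk is counted as visited too.
    while pos < len(keys):
        visited += 1
        if keys[pos] != target:
            break
        matches.append(pos)
        pos += 1
    return matches, comparisons, visited


n = 1_000_000
high_card = list(range(n))  # every key distinct
low_card = [3] * n          # one key value shared by every row

for name, keys in (("high", high_card), ("low", low_card)):
    matches, comparisons, visited = lookup(keys, 3)
    print(name, len(matches), comparisons, visited)
```

With a million rows, both columns need at most 20 comparisons to locate the first 3, but the low-cardinality column then walks through a million entries instead of two (the single match plus the entry that ends the walk).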

If you only need the first record:

SELECT  *
FROM    mytable
WHERE   indexed_col = @myvalue
LIMIT 1

In that case, the column cardinality does not affect read performance.
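A minimal sketch of the `LIMIT 1` case, assuming a sorted Python list as a stand-in for the index and using the standard `bisect` module for step 1; `lookup_limit` is a hypothetical name. Once the limit is reached the walk stops, so the duplicates that follow the first match are never examined.

```python
from bisect import bisect_left


def lookup_limit(keys, target, limit=None):
    """Return up to `limit` positions whose key equals target (all of them if limit is None)."""
    pos = bisect_left(keys, target)  # step 1: binary search, O(log n) for any cardinality
    matches = []
    while pos < len(keys) and keys[pos] == target:
        if limit is not None and len(matches) >= limit:
            break  # limit reached: the remaining duplicates are never examined
        matches.append(pos)
        pos += 1
    return matches


low_card = [3] * 1_000_000  # a single key value shared by a million rows
print(len(lookup_limit(low_card, 3)))           # → 1000000 (walks every duplicate)
print(len(lookup_limit(low_card, 3, limit=1)))  # → 1 (stops after the first entry)
```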

Each index key has a hidden additional value: a record pointer. This is the whole point of having an index: you need to know which record it points to.

Since a record pointer, by definition, is unique, each index key is unique too. The index entries sharing the same key value are sorted by the record pointer.

This is what keeps the index maintainable: if you delete a record whose indexed value is shared by a million other records, the corresponding index entry has to be deleted too. The engine does not look through the whole million index entries to find it: instead, the record pointer is used as an additional search condition.
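The record-pointer trick can be sketched by modelling each index entry as a `(key, record_pointer)` tuple in a sorted Python list. This is an illustrative assumption, not MyISAM's on-disk format, and `delete_entry` is a hypothetical helper. Because Python compares tuples element by element, entries with equal keys are ordered by record pointer, so one binary search on the full tuple lands on the exact entry to delete even when a million entries share the key.

```python
from bisect import bisect_left, insort

# Each index entry is a (key, record_pointer) tuple. Python compares tuples
# element by element, so entries with the same key are ordered by record pointer.
# Hypothetical data: a million rows share key 7, plus one row each for keys 5 and 9.
index = [(7, rowptr) for rowptr in range(1_000_000)]
insort(index, (5, 2_000_000))
insort(index, (9, 2_000_001))


def delete_entry(index, key, record_pointer):
    """Delete exactly one entry, located by binary search on (key, record_pointer)."""
    pos = bisect_left(index, (key, record_pointer))
    if pos < len(index) and index[pos] == (key, record_pointer):
        del index[pos]  # list deletion shifts later items; a B-tree only touches one page
        return True
    return False  # no such entry


# Removing row 654321 does not scan the other entries with key 7:
# the record pointer pins the binary search to a single position.
print(delete_entry(index, 7, 654_321))  # → True
print(delete_entry(index, 7, 654_321))  # → False (already gone)
```

Removing an element from a Python list shifts everything after it, which a real B-tree avoids by rewriting only the affected page. The part of the sketch that carries over is the lookup, which never scans the other entries sharing key 7.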

Each index key is in fact unique (even if you don't define the index as unique) and hence has the maximum possible cardinality.

So the answer to your questions is: no, the column cardinality does not affect the index write performance.
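As a rough illustration of that conclusion, the sketch below counts the comparisons needed to find the insertion point for a new `(key, record_pointer)` entry in a high-cardinality and a low-cardinality index over the same million rows. It is a toy model under stated assumptions: a sorted Python list replaces the B-tree, and page splits, disk I/O and MyISAM key compression are ignored. `insert_position` is a hypothetical name.

```python
def insert_position(entries, entry):
    """Binary search for where `entry` belongs; returns (position, comparisons)."""
    lo, hi, comparisons = 0, len(entries), 0
    while lo < hi:
        comparisons += 1
        mid = (lo + hi) // 2
        if entries[mid] < entry:
            lo = mid + 1
        else:
            hi = mid
    return lo, comparisons


n = 1_000_000
# Hypothetical (key, record_pointer) indexes over the same n rows, both already sorted.
high_card = [(row, row) for row in range(n)]  # every row has its own key
low_card = ([(0, row) for row in range(0, n, 2)]
            + [(1, row) for row in range(1, n, 2)])  # only two distinct keys

new_entry = (1, n)      # a new row with key 1 and a fresh record pointer
bound = n.bit_length()  # the search loop runs at most this many times (20 for n = 1,000,000)

for name, entries in (("high", high_card), ("low", low_card)):
    pos, comparisons = insert_position(entries, new_entry)
    print(name, pos, comparisons, comparisons <= bound)
```

Both searches stay within the same logarithmic bound: the half a million duplicates of key 1 in the low-cardinality index do not lengthen the search, because the record pointer makes every entry distinct.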

08-10 22:39