MySQL: How to index a low-cardinality/low-selectivity column?

Question

I need to add indexes to my table (columns) and stumbled across this post:

Quote: "Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index."

Is an index really pointless if there are only two distinct values? Given a table as follows (MySQL database, InnoDB):

Id (BIGINT)
fullname (VARCHAR)
address (VARCHAR)
status (VARCHAR)

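Other conditions:

The database contains 300 million records

status can only be "enabled" or "disabled"

150 million records have status = enabled, and 150 million records have status = disabled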

My understanding is that, without an index on status, a select with where status = 'enabled' would result in a full table scan with 300 million records to process?
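One way to check this understanding is to ask MySQL for the query plan. The sketch below is only an illustration: the table name `people`, the VARCHAR lengths, and the choice of `id` as primary key are assumptions, since the question lists only column names and types.

```sql
-- Hypothetical table mirroring the columns listed in the question.
-- The name `people`, the VARCHAR lengths, and the primary key are assumptions.
CREATE TABLE IF NOT EXISTS people (
    id       BIGINT       NOT NULL PRIMARY KEY,
    fullname VARCHAR(255),
    address  VARCHAR(255),
    status   VARCHAR(16)  -- 'enabled' or 'disabled'
) ENGINE = InnoDB;

-- With no index on status, the only way to evaluate the predicate
-- is to read every row of the table.
EXPLAIN
SELECT *
FROM people
WHERE status = 'enabled';
```

With no index on `status`, the plan should report `type: ALL` and `key: NULL`, which is how EXPLAIN describes a full table scan.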

How efficient is the lookup when I use a BTREE index on status?

Should I index this column or not?

What alternatives (maybe any other indexes) does MySQL InnoDB provide to efficiently look records up by the where status = 'enabled' clause in the given example, where the values have very low cardinality/selectivity?

Recommended answer

The index that you describe is pretty much pointless. An index is best used when you need to select a small number of rows in comparison to the total rows.

The reason for this is related to how a database accesses a table. A table can be accessed either by a full table scan, where each block is read and processed in turn, or by a rowid or key lookup, where the database has a key/rowid and reads the exact row it requires.

In the case where you use a where clause based on the primary key or another unique index, e.g. where id = 1, the database can use the index to get an exact reference to where the row's data is stored. This is clearly more efficient than doing a full table scan and processing every block.
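The primary-key case can be inspected with EXPLAIN. This is a minimal sketch, assuming a table named `people` (a hypothetical name) with the columns from the question and with `id` declared as the primary key, which the question does not state explicitly.

```sql
-- Assumes a hypothetical table `people` whose `id` column is the PRIMARY KEY.
-- An equality match on the full primary key identifies at most one row.
EXPLAIN
SELECT *
FROM people
WHERE id = 1;
```

For a lookup like this, the plan should show `type: const` and `key: PRIMARY`, meaning the row is located directly through the index rather than by scanning.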

Now back to your example: with a where clause of where status = 'enabled', the index will return 150 million rows, and the database will have to read each of those rows in turn using separate small reads. Accessing the table with a full table scan, by contrast, allows the database to use larger, more efficient reads.
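To see how little the index narrows things down, one could create it and look at what MySQL reports. The names `people` and `idx_status` below are hypothetical, and building an index on a 300-million-row table is itself an expensive operation, so treat this as a sketch to try on a copy or a sample rather than on production data.

```sql
-- Assumes a hypothetical table `people` with the columns from the question.
-- `idx_status` is an illustrative index name.
CREATE INDEX idx_status ON people (status);

-- The Cardinality column is MySQL's estimate of the number of distinct
-- values in the index; for this column it should be very small.
SHOW INDEX FROM people;

-- Compare the `type`, `key`, and `rows` columns with the plan obtained
-- before the index existed.
EXPLAIN
SELECT *
FROM people
WHERE status = 'enabled';
```

Which plan the optimizer picks depends on the MySQL version and the table statistics. Either way, the `rows` estimate stays at roughly half the table, so the index does not reduce the amount of data that has to be fetched.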

There is a point at which it is better to just do a full table scan rather than use the index. With MySQL you can use FORCE INDEX (idx_name) as part of your query to compare the two table access methods.
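A comparison along these lines might look as follows. It assumes a hypothetical table `people` with the question's columns and a secondary index on `status` named `idx_status`; both names are illustrative. IGNORE INDEX is used here as the counterpart to FORCE INDEX so that the second query cannot use the index.

```sql
-- Assumes: CREATE INDEX idx_status ON people (status);  (hypothetical names)

-- Plan that is pushed toward the secondary index.
EXPLAIN
SELECT id, fullname, address
FROM people FORCE INDEX (idx_status)
WHERE status = 'enabled';

-- Plan that is not allowed to use the index, leaving a full table scan.
EXPLAIN
SELECT id, fullname, address
FROM people IGNORE INDEX (idx_status)
WHERE status = 'enabled';
```

EXPLAIN only shows the chosen plan. To compare actual cost, run both SELECT statements without EXPLAIN and time them, or, on MySQL 8.0.18 and later, use EXPLAIN ANALYZE, which executes the query and reports measured timings.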
