MySQL: low cardinality/selectivity columns = how to index?

Question

I need to add indexes to my table (columns) and stumbled across this post:

Quote: "Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index."

Is an index really pointless if there are only two distinct values? Given a table as follows (MySQL database, InnoDB):

Id (BIGINT)
fullname (VARCHAR)
address (VARCHAR)
status (VARCHAR)

Other conditions:


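  • The database contains 300 million records

  • status can only be "enabled" or "disabled"

  • 150 million records have status = enabled and 150 million records
    have status = disabled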

My understanding is that, without an index on status, a select with where status = 'enabled' would result in a full table scan with 300 million records to process?

How efficient is the lookup when I use a BTREE index on status?

Should I index this column or not?

What alternatives (maybe other index types) does MySQL InnoDB provide to efficiently look records up by the where status = 'enabled' clause in the given example, where the values have very low cardinality/selectivity?

Recommended answer

The index that you describe is pretty much pointless. An index is best used when you need to select a small number of rows in comparison to the total rows.
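To put a number on "a small number of rows in comparison to the total rows", here is a minimal Python sketch using the figures from the question. The function names are hypothetical, and the two ratios are common rules of thumb rather than anything MySQL computes under these names.

```python
def selectivity(distinct_values: int, total_rows: int) -> float:
    """Distinct values per row: 1.0 for a unique column, close to 0 for a flag column."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return distinct_values / total_rows


def fraction_matched(matching_rows: int, total_rows: int) -> float:
    """Share of the table that a given where clause returns."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return matching_rows / total_rows


TOTAL_ROWS = 300_000_000    # from the question
ENABLED_ROWS = 150_000_000  # from the question

# A unique id column versus the two-valued status column.
print("id selectivity:    ", selectivity(TOTAL_ROWS, TOTAL_ROWS))
print("status selectivity:", selectivity(2, TOTAL_ROWS))

# where status = 'enabled' touches half of the table.
print("share of rows matched by status = 'enabled':",
      fraction_matched(ENABLED_ROWS, TOTAL_ROWS))
```

A predicate that returns half of the table is far from the "small number of rows" case in which an index pays off.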

The reason for this is related to how a database accesses a table. A table can be accessed either by a full table scan, where each block is read and processed in turn, or by a rowid or key lookup, where the database has a key/rowid and reads the exact row it requires.

In the case where you use a where clause based on the primary key or another unique index, e.g. where id = 1, the database can use the index to get an exact reference to where the row's data is stored. This is clearly more efficient than doing a full table scan and processing every block.

Now back to your example: with a where clause of where status = 'enabled', the index will return 150 million rows, and the database will have to read each of those rows in turn using separate small reads. Accessing the table with a full table scan instead allows the database to make use of larger, more efficient reads.
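The trade-off between many small reads and fewer large reads can be sketched as a back-of-envelope cost model. Every constant below is an illustrative assumption (rows per page, cost per read), not a measured MySQL or InnoDB figure, and the model deliberately ignores caching. It only shows the shape of the argument.

```python
import math

# Illustrative assumptions only, not measured MySQL/InnoDB values.
ROWS_PER_PAGE = 100        # assumed number of rows stored in one data page
RANDOM_READ_MS = 0.1       # assumed cost of one small random read
SEQUENTIAL_READ_MS = 0.01  # assumed per-page cost when reading large sequential runs


def full_scan_cost_ms(total_rows: int) -> float:
    """Read every page of the table once, sequentially."""
    pages = math.ceil(total_rows / ROWS_PER_PAGE)
    return pages * SEQUENTIAL_READ_MS


def index_lookup_cost_ms(matching_rows: int) -> float:
    """Pessimistic case: one random read for every row the index returns."""
    return matching_rows * RANDOM_READ_MS


def cheaper_access(total_rows: int, matching_rows: int) -> str:
    """Name the access path with the lower estimated cost."""
    if index_lookup_cost_ms(matching_rows) < full_scan_cost_ms(total_rows):
        return "index"
    return "full scan"


TOTAL_ROWS = 300_000_000  # from the question

# Half the table matches (status = 'enabled') versus a highly selective predicate.
for matching in (150_000_000, 1_000):
    print(f"{matching:>11,} matching rows ->", cheaper_access(TOTAL_ROWS, matching))
```

Under these assumed costs the scan wins easily when half of the table matches, and the index wins when only a handful of rows match. Where the crossover sits depends on the real hardware and data.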

There is a point at which it is better to do a full table scan rather than use the index. With MySQL you can add FORCE INDEX (idx_name) to your query, which lets you compare the two table access methods.
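Trying FORCE INDEX requires a running MySQL server, so as a stand-in the sketch below uses Python's built-in sqlite3 module, where INDEXED BY and NOT INDEXED play a roughly similar role. This is SQLite, not MySQL. The table name people and index name idx_status are hypothetical, and SQLite's planner and storage differ from InnoDB, so only the method of comparing the two access paths carries over, not any timings.

```python
import sqlite3


def build_demo_db(rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory table shaped like the one in the question (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "id INTEGER PRIMARY KEY, fullname TEXT, address TEXT, status TEXT)"
    )
    # Half of the rows are 'enabled', half 'disabled', as in the question.
    conn.executemany(
        "INSERT INTO people (fullname, address, status) VALUES (?, ?, ?)",
        (
            (f"name{i}", f"addr{i}", "enabled" if i % 2 == 0 else "disabled")
            for i in range(rows)
        ),
    )
    conn.execute("CREATE INDEX idx_status ON people (status)")
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# SQLite stand-ins for forcing and for forbidding the index.
WITH_INDEX = "SELECT fullname FROM people INDEXED BY idx_status WHERE status = 'enabled'"
WITHOUT_INDEX = "SELECT fullname FROM people NOT INDEXED WHERE status = 'enabled'"

conn = build_demo_db()
print("forced index:", query_plan(conn, WITH_INDEX))
print("no index:    ", query_plan(conn, WITHOUT_INDEX))
```

Against MySQL itself, the equivalent experiment would be to run the same select once with FORCE INDEX (idx_name) and once without it, then compare the execution times.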
