Problem description
I need to add indexes to columns in my table and stumbled across this post:
Quote: "Having said that, you can clearly add a lot of pointless indexes to a table that won't do anything. Adding B-Tree indexes to a column with 2 distinct values will be pointless since it doesn't add anything in terms of looking the data up. The more unique the values in a column, the more it will benefit from an index."
Is an index really pointless if there are only two distinct values? Given a table as follows (MySQL database, InnoDB):
Id (BIGINT)
fullname (VARCHAR)
address (VARCHAR)
status (VARCHAR)
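Further conditions:
- The database contains 300 million records
- status can only be "enabled" or "disabled"
- 150 million records have status = enabled and 150 million records have status = disabled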
My understanding is that, without an index on status, a select with where status = 'enabled' would result in a full table scan with 300 million records to process?
How efficient is the lookup when I use a BTREE index on status?
Should I index this column or not?
What alternatives (maybe other kinds of index) does MySQL InnoDB provide to efficiently look records up by the where status = 'enabled' clause in the given example, where the values have very low cardinality/selectivity?
Recommended answer
The index that you describe is pretty much pointless. An index is most useful when you need to select a small number of rows relative to the total number of rows in the table.
The reason for this is related to how a database accesses a table. A table can be accessed either by a full table scan, where each block is read and processed in turn, or by a rowid or key lookup, where the database has a key/rowid and reads exactly the row it requires.
When you use a where clause based on the primary key or another unique index, e.g. where id = 1, the database can use the index to get an exact reference to where the row's data is stored. This is clearly more efficient than doing a full table scan and processing every block.
Now back to your example: with a where clause of where status = 'enabled', the index will return 150 million rows, and the database will have to read each row in turn using separate small reads. Accessing the table with a full table scan, by contrast, allows the database to make use of larger, more efficient reads.
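To see how much of the table a given status value actually matches, a query along these lines could be used. This is only a sketch: the table name person, the index name idx_status and the VARCHAR lengths are assumptions, since the question lists only the column names and types.

```sql
-- Hypothetical table name, index name and VARCHAR lengths;
-- the question gives only the column names and types.
CREATE TABLE IF NOT EXISTS person (
    id       BIGINT NOT NULL PRIMARY KEY,
    fullname VARCHAR(255),
    address  VARCHAR(255),
    status   VARCHAR(16),
    INDEX idx_status (status)
) ENGINE = InnoDB;

-- Row count per status value and the fraction of the table it covers.
-- On a table with hundreds of millions of rows this query is itself expensive.
SELECT status,
       COUNT(*) AS row_count,
       COUNT(*) / (SELECT COUNT(*) FROM person) AS fraction_of_table
FROM person
GROUP BY status;
```

With the distribution given in the question (150 million of 300 million rows per value), each status would cover about half of the table, which is far from the "small number of rows" case where an index pays off.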
There is a point at which it is better to just do a full table scan rather than use the index. With MySQL you can add FORCE INDEX (idx_name) to your query to compare the two table access methods.
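A minimal sketch of such a comparison is shown here. The table name person and the index name idx_status are hypothetical, as are the VARCHAR lengths. IGNORE INDEX is the counterpart hint in MySQL that keeps the optimizer from using the named index.

```sql
-- Hypothetical table and index names; VARCHAR lengths are assumptions.
CREATE TABLE IF NOT EXISTS person (
    id       BIGINT NOT NULL PRIMARY KEY,
    fullname VARCHAR(255),
    address  VARCHAR(255),
    status   VARCHAR(16),
    INDEX idx_status (status)
) ENGINE = InnoDB;

-- Access path 1: make the optimizer use the index on status.
EXPLAIN
SELECT * FROM person FORCE INDEX (idx_status)
WHERE status = 'enabled';

-- Access path 2: keep the optimizer from using that index (full table scan).
EXPLAIN
SELECT * FROM person IGNORE INDEX (idx_status)
WHERE status = 'enabled';
```

EXPLAIN only shows the plan that would be chosen. To compare actual run times, the same two SELECT statements can be executed without EXPLAIN against a realistic data volume.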