在数据库中,我有一个包含订单项的表。该表包含大约3600万条记录。
像这样运行查询大约需要3分钟:
SELECT COUNT(DISTINCT DATE(created_on), product_id) FROM order_items;
像这样运行查询大约需要13秒钟:
SELECT COUNT(1) FROM order_items;
某事告诉我,3600万条记录并不多,而且两个查询的运行速度都很慢。
在这里开始研究性能问题的清单将是什么?
我们正在使用MySQL(实际上是它的Clustrix版本
MySQL 5.0.45-clustrix-6.0.1
)。编辑。添加更多信息:
/* SHOW CREATE TABLE order_items; */
CREATE TABLE `order_items` (
`id` int(10) unsigned not null AUTO_INCREMENT,
`state` enum('pending','sold_out','approved','declined','cancelled','processing','completed','expired') CHARACTER SET utf8 not null default 'pending',
`order_id` int(10) unsigned not null,
`product_id` int(10) unsigned not null,
`quantity` smallint(5) unsigned not null,
`price` decimal(10,2) unsigned not null,
`total` decimal(10,2) unsigned not null,
`created_on` datetime not null,
`updated_on` datetime not null,
`employee_id` int(11),
`customer_id` int(11) unsigned not null,
PRIMARY KEY (`id`) /*$ DISTRIBUTE=1 */,
KEY `updated_on` (`updated_on`) /*$ DISTRIBUTE=1 */,
KEY `state` (`state`,`quantity`) /*$ DISTRIBUTE=3 */,
KEY `product_id` (`product_id`,`state`) /*$ DISTRIBUTE=2 */,
KEY `product` (`product_id`) /*$ DISTRIBUTE=1 */,
KEY `order_items_quantity` (`quantity`) /*$ DISTRIBUTE=2 */,
KEY `order_id` (`order_id`,`state`,`created_on`) /*$ DISTRIBUTE=3 */,
KEY `order` (`order_id`) /*$ DISTRIBUTE=1 */,
KEY `index_order_items_on_employee_id` (`employee_id`) /*$ DISTRIBUTE=2 */,
KEY `customer_id` (`customer_id`) /*$ DISTRIBUTE=2 */,
KEY `created_at` (`created_on`) /*$ DISTRIBUTE=1 */,
) AUTO_INCREMENT=36943352 CHARACTER SET utf8 ENGINE=InnoDB /*$ REPLICAS=2 SLICES=12 */
和:
/* SHOW VARIABLES LIKE '%buffer%'; */
+----------------------------------------+-------+
| Variable_name | Value |
+----------------------------------------+-------+
| backup_compression_buffer_size_bytes | 8192 |
| backup_read_buffer_size_bytes | 8192 |
| backup_write_buffer_size_bytes | 8192 |
| mysql_master_trx_buffer_kb | 256 |
| mysql_slave_session_buffer_size_events | 100 |
| net_buffer_length | 16384 |
| replication_master_buffer_kb | 65536 |
+----------------------------------------+-------+
编辑2。这是两个查询的
EXPLAIN
语句:mysql> EXPLAIN SELECT COUNT(1) FROM order_items;
+----------------------------------------------------------+-------------+-------------+
| Operation | Est. Cost | Est. Rows |
+----------------------------------------------------------+-------------+-------------+
| row_count "expr1" | 29740566.81 | 1.00 |
| stream_combine | 26444732.70 | 32958341.10 |
| compute expr0 := param(0) | 1929074.80 | 2746528.43 |
| filter isnotnull(param(0)) | 1915342.16 | 2746528.43 |
| index_scan 1 := order_items.order_items_quantity | 1854308.19 | 3051698.25 |
+----------------------------------------------------------+-------------+-------------+
5 rows in set (0.13 sec)
和:
mysql> EXPLAIN SELECT COUNT(DISTINCT DATE(created_on), product_id) FROM order_items;
+----------------------------------------------------------------------------------+-------------+------------+
| Operation | Est. Cost | Est. Rows |
+----------------------------------------------------------------------------------+-------------+------------+
| hash_aggregate_combine expr1 := count(DISTINCT (0 . "expr0"),(1 . "product_id")) | 10115923.36 | 4577547.38 |
| hash_aggregate_partial GROUPBY((0 . "expr0"), (1 . "product_id")) | 3707357.04 | 4577547.38 |
| compute expr0 := cast(1.created_on, date) | 2166388.20 | 3051698.25 |
| index_scan 1 := order_items.__idx_order_items__PRIMARY | 2151129.71 | 3051698.25 |
+----------------------------------------------------------------------------------+-------------+------------+
4 rows in set (0.24 sec)
最佳答案
第一个查询必须遍历整个数据库,检查表中的每一行。在created_on和product_id上的索引可能会大大加快索引的速度。如果您不了解索引,http://use-the-index-luke.com是一个很好的起点。
在我看来,第二个查询应该是即时的,因为它只需要检查表元数据,而不需要检查任何行。
关于mysql - 数据库性能 list ,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/31329382/