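I have a fairly stable directed graph of order ~100k vertices and size ~1k edges. It is two-dimensional insofar as its vertices can be identified by a pair of integers (x, y) (with cardinalities ~100 and ~1000 respectively), and all edges are strictly increasing in x.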

Furthermore, there is a dictionary of ~1k (key, val) pairs associated with each vertex.

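I am currently storing the graph in a MySQL database across three (InnoDB) tables. The first is a table of vertices; I don't think it is relevant to my question, so I have omitted both it and the foreign key constraints that refer to it from my extracts below. The second holds the dictionaries. The third is a "closure table" of connected vertices, as so eloquently described by Bill Karwin.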
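A closure table stores one row for every (ancestor, descendant) pair rather than only the direct edges. Because every edge here strictly increases x, the graph is acyclic, so the pairs can be derived by sweeping the columns from right to left. The following Python sketch does that in memory. The function name `closure_rows` and the toy edges are hypothetical. It also omits the reflexive (vertex, itself) rows that Karwin's write-up also stores.

```python
from collections import defaultdict


def closure_rows(edges):
    """Return every (tail, head) pair where head is reachable from tail.

    Vertices are (x, y) tuples. Every edge must strictly increase x, which makes
    the graph acyclic, so visiting tails in decreasing x order guarantees that a
    successor's reachable set is complete before it is needed.
    Reflexive (v, v) rows are deliberately not produced.
    """
    succ = defaultdict(set)
    for tail, head in edges:
        if head[0] <= tail[0]:
            raise ValueError(f"edge {tail} -> {head} does not increase x")
        succ[tail].add(head)

    reach = {}
    for vertex in sorted(succ, key=lambda v: v[0], reverse=True):
        reachable = set()
        for head in succ[vertex]:
            reachable.add(head)
            reachable |= reach.get(head, set())  # head's own descendants
        reach[vertex] = reachable

    return sorted((tail, head) for tail, heads in reach.items() for head in heads)


# Hypothetical toy graph: (1,0) -> (2,0) -> (3,5) and (1,1) -> (3,5).
edges = [((1, 0), (2, 0)), ((2, 0), (3, 5)), ((1, 1), (3, 5))]
for (tx, ty), (hx, hy) in closure_rows(edges):
    print(tx, ty, hx, hy)  # one ConnectedVertices row per line
```

Each printed line corresponds to one (tail_x, tail_y, head_x, head_y) row. Note that (1, 0) reaches (3, 5) only through (2, 0), which is exactly the kind of indirect pair a closure table materialises.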

The table of vertex dictionaries is defined as follows:

CREATE TABLE `VertexDictionary` (
  `x`   smallint(6) unsigned NOT NULL,
  `y`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  `val` smallint(1) DEFAULT NULL,
  PRIMARY KEY (`x`, `y`  , `key`),
  KEY  `dict` (`x`, `key`, `val`)
);

The closure table of connected vertices is:
CREATE TABLE `ConnectedVertices` (
  `tail_x` smallint(6) unsigned NOT NULL,
  `tail_y` smallint(6) unsigned NOT NULL,
  `head_x` smallint(6) unsigned NOT NULL,
  `head_y` smallint(6) unsigned NOT NULL,
  PRIMARY KEY   (`tail_x`, `tail_y`, `head_x`),
  KEY `reverse` (`head_x`, `head_y`, `tail_x`),
  KEY `fx` (`tail_x`, `head_x`),
  KEY `rx` (`head_x`, `tail_x`)
);

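There is also a dictionary of (x, key) pairs such that, for each such pair, all vertices identified with that x have a value for that key in their dictionaries. This dictionary is stored in a fourth table: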
CREATE TABLE `SpecialKeys` (
  `x`   smallint(6) unsigned NOT NULL,
  `key` varchar(50) NOT NULL DEFAULT '',
  PRIMARY KEY (`x`),
  KEY `xkey`  (`x`, `key`)
);
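The invariant just described (every vertex in column x has a value for that column's special key) can be checked with an anti-join. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so the column types are simplified. Because the vertex table is omitted here, it also assumes that every vertex appears at least once in VertexDictionary. The function name and toy rows are hypothetical.

```python
import sqlite3


def special_key_violations(conn):
    """Return (x, key, y) for every vertex (x, y) missing its column's special key.

    Anti-join: pair each SpecialKeys row with every vertex in that column, then
    keep the pairs for which no matching VertexDictionary entry exists.
    """
    return conn.execute("""
        SELECT k.x, k."key", d.y
        FROM SpecialKeys AS k
        JOIN (SELECT DISTINCT x, y FROM VertexDictionary) AS d ON d.x = k.x
        WHERE NOT EXISTS (
            SELECT 1 FROM VertexDictionary AS v
            WHERE v.x = k.x AND v.y = d.y AND v."key" = k."key")
        ORDER BY k.x, d.y
    """).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VertexDictionary (x INTEGER NOT NULL, y INTEGER NOT NULL,
        "key" TEXT NOT NULL DEFAULT '', val INTEGER, PRIMARY KEY (x, y, "key"));
    CREATE TABLE SpecialKeys (x INTEGER NOT NULL PRIMARY KEY,
        "key" TEXT NOT NULL DEFAULT '');
""")
conn.executemany("INSERT INTO VertexDictionary VALUES (?, ?, ?, ?)", [
    (1, 0, "colour", 3), (1, 1, "colour", 4),  # both x=1 vertices have 'colour'
    (2, 0, "size", 7), (2, 1, "weight", 2),    # vertex (2, 1) lacks 'size'
])
conn.executemany("INSERT INTO SpecialKeys VALUES (?, ?)", [(1, "colour"), (2, "size")])
violations = special_key_violations(conn)
print(violations)  # [(2, 'size', 1)]
```

An empty result means the invariant holds for the data that is present.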

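I often wish to extract the set of keys used in the dictionaries of all vertices having a particular x=X, together with the associated value of any SpecialKeys connected to the left: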
SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
WHERE
  `v`.`x` = X
;
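To make the query's semantics concrete, here is a sketch that runs it against a tiny in-memory SQLite copy of the three tables. The query is unchanged apart from dialect details, and SQLite 3.15 or later is needed for the row-value comparisons. The helper `build_toy_db` and its rows are hypothetical. `X` becomes a bound parameter, and ORDER BY was added only to make the output deterministic.

```python
import sqlite3


def build_toy_db():
    """In-memory SQLite stand-in for the MySQL tables, with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE VertexDictionary (x INTEGER NOT NULL, y INTEGER NOT NULL,
            "key" TEXT NOT NULL DEFAULT '', val INTEGER, PRIMARY KEY (x, y, "key"));
        CREATE TABLE ConnectedVertices (tail_x INTEGER NOT NULL, tail_y INTEGER NOT NULL,
            head_x INTEGER NOT NULL, head_y INTEGER NOT NULL,
            PRIMARY KEY (tail_x, tail_y, head_x));
        CREATE TABLE SpecialKeys (x INTEGER NOT NULL PRIMARY KEY,
            "key" TEXT NOT NULL DEFAULT '');
    """)
    conn.executemany("INSERT INTO VertexDictionary VALUES (?, ?, ?, ?)", [
        (1, 0, "a", 10), (1, 0, "b", 11), (1, 1, "a", 20),
        (2, 0, "p", None), (2, 0, "q", None), (2, 1, "p", None)])
    conn.executemany("INSERT INTO ConnectedVertices VALUES (?, ?, ?, ?)",
                     [(1, 0, 2, 0), (1, 1, 2, 1)])
    conn.executemany("INSERT INTO SpecialKeys VALUES (?, ?)", [(1, "a")])
    return conn


# The question's query, with X as a bound parameter and ORDER BY added.
QUESTION_QUERY = """
    SELECT DISTINCT v."key", u.val
    FROM ConnectedVertices AS c
    JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
    JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
    JOIN SpecialKeys AS k ON (k.x, k."key") = (u.x, u."key")
    WHERE v.x = ?
    ORDER BY v."key", u.val
"""

rows = build_toy_db().execute(QUESTION_QUERY, (2,)).fetchall()
print(rows)  # [('p', 10), ('p', 20), ('q', 10)]
```

Vertex (2, 0) is reached from (1, 0), whose special key `a` has value 10, so both of its keys p and q pair with 10. Vertex (2, 1) is reached from (1, 1), so its key p pairs with 20.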

The EXPLAIN output is:

id  select_type  table  type    possible_keys          key      key_len  ref                               rows  Extra
1   SIMPLE       k      index   PRIMARY,xkey           xkey     154      NULL                              40    Using index; Using temporary
1   SIMPLE       c      ref     PRIMARY,reverse,fx,rx  PRIMARY  2        db.k.x                            1     Using where
1   SIMPLE       v      ref     PRIMARY,dict           PRIMARY  4        const,db.c.head_y                 136   Using index
1   SIMPLE       u      eq_ref  PRIMARY,dict           PRIMARY  156      db.c.tail_x,db.c.tail_y,db.k.key  1     Using where
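The key_len column shows how many bytes of each index MySQL actually uses for its lookup. The figures can be reproduced by hand, assuming `key` uses a 3-byte-per-character charset such as utf8 (the charset is not shown in the question). SMALLINT contributes 2 bytes. VARCHAR(50) contributes 50 × 3 + 2 = 152 bytes, where the extra 2 bytes are the length prefix. None of the indexed columns is nullable, so no NULL-flag byte is added. In particular, key_len = 2 on c means only the tail_x part of its primary key is used for the lookup, and the remaining conditions on c are applied afterwards as a filter ("Using where").

```python
# Reproduce the key_len column of the EXPLAIN output by hand.
SMALLINT_BYTES = 2           # SMALLINT is stored in 2 bytes
BYTES_PER_CHAR = 3           # assumption: `key` uses utf8 (3 bytes per character)
VARCHAR_LENGTH_PREFIX = 2    # key_len counts 2 extra bytes for a VARCHAR key part


def varchar_bytes(max_chars):
    """Bytes a NOT NULL VARCHAR(max_chars) column contributes to key_len."""
    return max_chars * BYTES_PER_CHAR + VARCHAR_LENGTH_PREFIX


KEY_BYTES = varchar_bytes(50)  # `key` varchar(50) -> 152

# Index parts actually used by each row of the plan (all columns NOT NULL).
key_len = {
    "k: xkey (x, key)": SMALLINT_BYTES + KEY_BYTES,
    "c: PRIMARY, tail_x only": SMALLINT_BYTES,
    "v: PRIMARY, (x, y) only": 2 * SMALLINT_BYTES,
    "u: PRIMARY (x, y, key)": 2 * SMALLINT_BYTES + KEY_BYTES,
}
for index, length in key_len.items():
    print(f"{index:<26} key_len = {length}")
```

Under a 4-byte charset such as utf8mb4, the VARCHAR part would instead be 50 × 4 + 2 = 202 bytes, so a mismatch with EXPLAIN would hint at a different charset.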

But this query takes ~10s to complete. I've been banging my head against a brick wall trying to improve matters, but to no avail.

Can the query be improved, or should I consider a different data structure? Extremely grateful for your thoughts!

UPDATE

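I'm still getting nowhere with this. I did rebuild the tables and found the EXPLAIN output to be slightly different (as now shown above, the number of rows fetched from v has increased from 1 to 136!), but the query still takes ~10s to execute.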

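I really don't understand what's going on here. Queries to obtain all (x, y, SpecialValue) tuples and all (x, y, key) tuples are both very fast (~30ms and ~150ms respectively), yet essentially joining the two takes more than fifty times their combined time. How can I improve the time taken to perform that join?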
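One workaround is to run the two fast queries separately and perform the join in application code with hash maps, probing both maps with the closure-table rows whose head is in column X. This approach is not suggested anywhere in the thread and has not been benchmarked. The sketch below does this in Python against an in-memory SQLite copy of the tables. `build_toy_db`, `keys_with_special_vals` and the data are hypothetical, and the two queries only roughly correspond to the ones described above. It should return the same set as the SQL query, but whether it is faster on the real ~100k-vertex data is untested.

```python
import sqlite3
from collections import defaultdict


def build_toy_db():
    """In-memory SQLite stand-in for the MySQL tables, with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE VertexDictionary (x INTEGER NOT NULL, y INTEGER NOT NULL,
            "key" TEXT NOT NULL DEFAULT '', val INTEGER, PRIMARY KEY (x, y, "key"));
        CREATE TABLE ConnectedVertices (tail_x INTEGER NOT NULL, tail_y INTEGER NOT NULL,
            head_x INTEGER NOT NULL, head_y INTEGER NOT NULL,
            PRIMARY KEY (tail_x, tail_y, head_x));
        CREATE TABLE SpecialKeys (x INTEGER NOT NULL PRIMARY KEY,
            "key" TEXT NOT NULL DEFAULT '');
    """)
    conn.executemany("INSERT INTO VertexDictionary VALUES (?, ?, ?, ?)", [
        (1, 0, "a", 10), (1, 0, "b", 11), (1, 1, "a", 20),
        (2, 0, "p", None), (2, 0, "q", None), (2, 1, "p", None)])
    conn.executemany("INSERT INTO ConnectedVertices VALUES (?, ?, ?, ?)",
                     [(1, 0, 2, 0), (1, 1, 2, 1)])
    conn.executemany("INSERT INTO SpecialKeys VALUES (?, ?)", [(1, "a")])
    return conn


def keys_with_special_vals(conn, X):
    """Client-side hash join returning the same set as the SELECT DISTINCT query."""
    # Fast query 1: (x, y) -> value of that column's special key.
    special = {
        (x, y): val
        for x, y, val in conn.execute("""
            SELECT u.x, u.y, u.val
            FROM VertexDictionary AS u
            JOIN SpecialKeys AS k ON k.x = u.x AND k."key" = u."key"
        """)
    }
    # Fast query 2, restricted to column X: y -> set of keys of vertex (X, y).
    keys = defaultdict(set)
    for y, key in conn.execute(
            'SELECT y, "key" FROM VertexDictionary WHERE x = ?', (X,)):
        keys[y].add(key)
    # Probe both maps with every closure-table row whose head lies in column X.
    result = set()
    for tail_x, tail_y, head_y in conn.execute(
            "SELECT tail_x, tail_y, head_y FROM ConnectedVertices WHERE head_x = ?",
            (X,)):
        if (tail_x, tail_y) in special:
            val = special[(tail_x, tail_y)]
            result.update((key, val) for key in keys.get(head_y, ()))
    return result


print(sorted(keys_with_special_vals(build_toy_db(), 2)))  # [('p', 10), ('p', 20), ('q', 10)]
```

The design choice here is to let the database do only cheap, index-friendly scans and to move the three-way match into dictionaries, trading memory for predictable lookup cost.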

Output of SHOW VARIABLES LIKE '%innodb%'; is below:

Variable_name                    Value
-------------------------------  ----------------------
have_innodb                      YES
ignore_builtin_innodb            ON
innodb_adaptive_flushing         ON
innodb_adaptive_hash_index       ON
innodb_additional_mem_pool_size  2097152
innodb_autoextend_increment      8
innodb_autoinc_lock_mode         1
innodb_buffer_pool_size          1179648000
innodb_change_buffering          inserts
innodb_checksums                 ON
innodb_commit_concurrency        0
innodb_concurrency_tickets       500
innodb_data_file_path            ibdata1:10M:autoextend
innodb_data_home_dir             /rdsdbdata/db/innodb
innodb_doublewrite               ON
innodb_fast_shutdown             1
innodb_file_format               Antelope
innodb_file_format_check         Barracuda
innodb_file_per_table            ON
innodb_flush_log_at_trx_commit   1
innodb_flush_method              O_DIRECT
innodb_force_recovery            0
innodb_io_capacity               200
innodb_lock_wait_timeout         50
innodb_locks_unsafe_for_binlog   OFF
innodb_log_buffer_size           8388608
innodb_log_file_size             134217728
innodb_log_files_in_group        2
innodb_log_group_home_dir        /rdsdbdata/log/innodb
innodb_max_dirty_pages_pct       75
innodb_max_purge_lag             0
innodb_mirrored_log_groups       1
innodb_old_blocks_pct            37
innodb_old_blocks_time           0
innodb_open_files                300
innodb_read_ahead_threshold      56
innodb_read_io_threads           4
innodb_replication_delay         0
innodb_rollback_on_timeout       OFF
innodb_spin_wait_delay           6
innodb_stats_method              nulls_equal
innodb_stats_on_metadata         ON
innodb_stats_sample_pages        8
innodb_strict_mode               OFF
innodb_support_xa                ON
innodb_sync_spin_loops           30
innodb_table_locks               ON
innodb_thread_concurrency        0
innodb_thread_sleep_delay        10000
innodb_use_sys_malloc            ON
innodb_version                   1.0.16
innodb_write_io_threads          4

Best answer

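I haven't spent time testing this, and your example seems incomplete, but you should definitely try reordering the joined tables. The EXPLAIN output gives some hints: for instance, ordering by key_len should heuristically be fastest. I also believe that if the optimizer cannot work out which table to filter first, that table should be listed last.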

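So, let's assume the "c, v, k, u" order is the best (the tables below are therefore listed in the reverse of that order):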

SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `VertexDictionary`  AS `v`
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
           AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;

The "rows" column would suggest a "c/u, k, v" order, but that depends on the data:
SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
  `VertexDictionary`  AS `u`
  JOIN `VertexDictionary`  AS `v`
  JOIN `SpecialKeys`       AS `k` ON (`k`.`x`, `k`.`key`) = (`u`.`x`, `u`.`key`)
  JOIN `ConnectedVertices` AS `c` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
                                 AND (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  `v`.`x` = X
;
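Reordering inner joins changes only the execution plan, never the result. The sketch below checks that the question's query and both reorderings return the same set, using an in-memory SQLite database with hypothetical toy rows. Note that MySQL's optimizer is free to choose its own order for inner joins regardless of how they are written. To force the written order, MySQL offers STRAIGHT_JOIN.

```python
import sqlite3


def build_toy_db():
    """In-memory SQLite stand-in for the MySQL tables, with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE VertexDictionary (x INTEGER NOT NULL, y INTEGER NOT NULL,
            "key" TEXT NOT NULL DEFAULT '', val INTEGER, PRIMARY KEY (x, y, "key"));
        CREATE TABLE ConnectedVertices (tail_x INTEGER NOT NULL, tail_y INTEGER NOT NULL,
            head_x INTEGER NOT NULL, head_y INTEGER NOT NULL,
            PRIMARY KEY (tail_x, tail_y, head_x));
        CREATE TABLE SpecialKeys (x INTEGER NOT NULL PRIMARY KEY,
            "key" TEXT NOT NULL DEFAULT '');
    """)
    conn.executemany("INSERT INTO VertexDictionary VALUES (?, ?, ?, ?)", [
        (1, 0, "a", 10), (1, 0, "b", 11), (1, 1, "a", 20),
        (2, 0, "p", None), (2, 0, "q", None), (2, 1, "p", None)])
    conn.executemany("INSERT INTO ConnectedVertices VALUES (?, ?, ?, ?)",
                     [(1, 0, 2, 0), (1, 1, 2, 1)])
    conn.executemany("INSERT INTO SpecialKeys VALUES (?, ?)", [(1, "a")])
    return conn


SELECT = 'SELECT DISTINCT v."key", u.val FROM '
QUERIES = {
    "question (c, u, v, k)": SELECT + """
        ConnectedVertices AS c
        JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
        JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
        JOIN SpecialKeys AS k ON (k.x, k."key") = (u.x, u."key")
        WHERE v.x = ?""",
    "answer, first reordering": SELECT + """
        VertexDictionary AS u
        JOIN SpecialKeys AS k ON (k.x, k."key") = (u.x, u."key")
        JOIN VertexDictionary AS v
        JOIN ConnectedVertices AS c ON (u.x, u.y) = (c.tail_x, c.tail_y)
                                   AND (v.x, v.y) = (c.head_x, c.head_y)
        WHERE v.x = ?""",
    "answer, second reordering": SELECT + """
        VertexDictionary AS u
        JOIN VertexDictionary AS v
        JOIN SpecialKeys AS k ON (k.x, k."key") = (u.x, u."key")
        JOIN ConnectedVertices AS c ON (u.x, u.y) = (c.tail_x, c.tail_y)
                                   AND (v.x, v.y) = (c.head_x, c.head_y)
        WHERE v.x = ?""",
}

conn = build_toy_db()
results = {name: set(conn.execute(sql, (2,))) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(f"{name}: {sorted(rows)}")
```

Any speed difference between these forms therefore comes only from the plan the optimizer picks, which is why comparing their EXPLAIN output is the useful test.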

Hope this helps.

UPDATE (avoiding the varchar join):
SELECT DISTINCT
  `v`.`key`,
  `u`.`val`
FROM
       `ConnectedVertices` AS `c`
  JOIN `VertexDictionary`  AS `u` ON (`u`.`x`, `u`.`y`  ) = (`c`.`tail_x`, `c`.`tail_y`)
  JOIN `VertexDictionary`  AS `v` ON (`v`.`x`, `v`.`y`  ) = (`c`.`head_x`, `c`.`head_y`)
WHERE
  (`u`.`x`, `u`.`key`) IN (SELECT `k`.`x`, `k`.`key` FROM `SpecialKeys` AS `k`)
AND
  `v`.`x` = X
;
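This rewrite keeps the same result. SpecialKeys has x as its primary key, so each u row matches at most one SpecialKeys row and the join never multiplies rows. The outer DISTINCT would remove any duplicates anyway. The sketch checks this equivalence on hypothetical toy data in an in-memory SQLite database (row values in IN need SQLite 3.15 or later). Whether the IN form is faster on MySQL is a separate question: releases before 5.6 often execute IN (SELECT ...) as a dependent subquery, so it is worth comparing the EXPLAIN output of both forms.

```python
import sqlite3


def build_toy_db():
    """In-memory SQLite stand-in for the MySQL tables, with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE VertexDictionary (x INTEGER NOT NULL, y INTEGER NOT NULL,
            "key" TEXT NOT NULL DEFAULT '', val INTEGER, PRIMARY KEY (x, y, "key"));
        CREATE TABLE ConnectedVertices (tail_x INTEGER NOT NULL, tail_y INTEGER NOT NULL,
            head_x INTEGER NOT NULL, head_y INTEGER NOT NULL,
            PRIMARY KEY (tail_x, tail_y, head_x));
        CREATE TABLE SpecialKeys (x INTEGER NOT NULL PRIMARY KEY,
            "key" TEXT NOT NULL DEFAULT '');
    """)
    conn.executemany("INSERT INTO VertexDictionary VALUES (?, ?, ?, ?)", [
        (1, 0, "a", 10), (1, 0, "b", 11), (1, 1, "a", 20),
        (2, 0, "p", None), (2, 0, "q", None), (2, 1, "p", None)])
    conn.executemany("INSERT INTO ConnectedVertices VALUES (?, ?, ?, ?)",
                     [(1, 0, 2, 0), (1, 1, 2, 1)])
    conn.executemany("INSERT INTO SpecialKeys VALUES (?, ?)", [(1, "a")])
    return conn


JOIN_VERSION = """
    SELECT DISTINCT v."key", u.val
    FROM ConnectedVertices AS c
    JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
    JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
    JOIN SpecialKeys AS k ON (k.x, k."key") = (u.x, u."key")
    WHERE v.x = ?
"""
# The answer's update: SpecialKeys moves from a join into a row-value IN (semi-join).
IN_VERSION = """
    SELECT DISTINCT v."key", u.val
    FROM ConnectedVertices AS c
    JOIN VertexDictionary AS u ON (u.x, u.y) = (c.tail_x, c.tail_y)
    JOIN VertexDictionary AS v ON (v.x, v.y) = (c.head_x, c.head_y)
    WHERE (u.x, u."key") IN (SELECT k.x, k."key" FROM SpecialKeys AS k)
      AND v.x = ?
"""

conn = build_toy_db()
joined = set(conn.execute(JOIN_VERSION, (2,)))
semi = set(conn.execute(IN_VERSION, (2,)))
print(joined == semi, sorted(semi))  # True [('p', 10), ('p', 20), ('q', 10)]
```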

Regarding "mysql - Optimizing MySQL queries across hierarchical data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10211029/

10-11 18:35