我是SQL的新手,我有一个问题。
我有这个SQL代码:
DROP TABLE if exists students;
DROP TABLE if exists grades;
CREATE TABLE students(
s_id integer NOT NULL PRIMARY KEY,
s_name text,
s_last_name text,
curr_year integer
);
CREATE TABLE grades(
s_id integer NOT NULL PRIMARY KEY,
course text,
c_year integer,
grade integer,
FOREIGN KEY (s_id) REFERENCES students
);
INSERT INTO students (s_id, s_name, s_last_name, curr_year)
VALUES (1, 'A', 'S', 3);
INSERT INTO students (s_id, s_name, s_last_name, curr_year)
VALUES (2, 'A', 'A', 2);
INSERT INTO students (s_id, s_name, s_last_name, curr_year)
VALUES (3, 'V', 'B', 1);
INSERT INTO students (s_id, s_name, s_last_name, curr_year)
VALUES (4, 'K', 'N', 2);
INSERT INTO grades (s_id, course, c_year, grade)
VALUES (1, 'DB', 2, 98);
INSERT INTO grades (s_id, course, c_year, grade)
VALUES (2, 'OS', 3, 90);
INSERT INTO grades (s_id, course, c_year, grade)
VALUES (3, 'DB', 2, 94);
EXPLAIN ANALYZE
SELECT *
FROM students s JOIN grades gr
ON s.s_id = gr.s_id
WHERE curr_year > 0;
CREATE INDEX student_details ON students(s_id, s_name, s_last_name);
CREATE INDEX student_courses ON grades(s_id, course);
EXPLAIN ANALYZE
SELECT *
FROM students s JOIN grades gr
ON s.s_id = gr.s_id
WHERE curr_year > 0;
DROP INDEX student_details;
DROP INDEX student_courses;
DROP TABLE students CASCADE;
DROP TABLE grades CASCADE;
我尝试了解解释输出。
在创建INDEX之前,我进行了哈希合并。这是说明输出:
Hash Join (cost=23.50..51.74 rows=270 width=116) (actual time=0.039..0.050 rows=3 loops=1)
Hash Cond: (gr.s_id = s.s_id)
-> Seq Scan on grades gr (cost=0.00..21.30 rows=1130 width=44) (actual time=0.005..0.008 rows=3 loops=1)
-> Hash (cost=20.12..20.12 rows=270 width=72) (actual time=0.021..0.021 rows=4 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on students s (cost=0.00..20.12 rows=270 width=72) (actual time=0.006..0.011 rows=4 loops=1)
Filter: (curr_year > 0)
Planning time: 0.147 ms
Execution time: 0.089 ms
(9 rows)
在创建INDEX之后,我得到了嵌套循环:
Nested Loop (cost=0.00..2.12 rows=1 width=116) (actual time=0.031..0.116 rows=3 loops=1)
Join Filter: (s.s_id = gr.s_id)
Rows Removed by Join Filter: 9
-> Seq Scan on students s (cost=0.00..1.05 rows=1 width=72) (actual time=0.012..0.018 rows=4 loops=1)
Filter: (curr_year > 0)
-> Seq Scan on grades gr (cost=0.00..1.03 rows=3 width=44) (actual time=0.003..0.009 rows=3 loops=4)
Planning time: 0.396 ms
Execution time: 0.170 ms
但是我不知道为什么吗?在我的案例中,为什么索引使嵌套循环比哈希联接更受青睐?
我很乐意对此进行解释。
非常感谢!
最佳答案
“嵌套循环”连接有点用词不当。从技术上讲,它是指嵌套循环-顾名思义,它是:
for every row in table1
for every row in table2
compare values and execute join
实际上,这通常用作:
for every row in table1
for every matching row in table2
execute join
区别是细微的,但是“匹配”意味着嵌套循环联接可以使用索引。因此,嵌套循环联接的性能可能很差(如果表相对较大并且没有索引),或者性能可能非常好(如果它可以利用索引)。
关于sql - 哈希联接与嵌套循环,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/48265705/