I recently increased the level of normalization in my database, going from something like this:
+--------------------------------------+
|            state_changes             |
+----+-------+-----------+------+------+
| ID | Name  | Timestamp | Val1 | Val2 |
+----+-------+-----------+------+------+
|  0 | John  | 17:19:01  | A    | X    |
|  1 | Bob   | 17:19:02  | E    | W    |
|  2 | John  | 17:19:05  | E    | Y    |
|  3 | John  | 17:19:06  | B    | Y    |
|  4 | John  | 17:19:12  | C    | Z    |
|  5 | John  | 17:19:15  | A    | Z    |
+----+-------+-----------+------+------+
to something more like this:
+-------------------------------+  +-------------------------------+
|        state_changes_1        |  |        state_changes_2        |
+----+-------+-----------+------+  +----+-------+-----------+------+
| ID | Name  | Timestamp | Val1 |  | ID | Name  | Timestamp | Val2 |
+----+-------+-----------+------+  +----+-------+-----------+------+
|  0 | John  | 17:19:01  | A    |  |  0 | John  | 17:19:01  | X    |
|  1 | Bob   | 17:19:02  | E    |  |  1 | Bob   | 17:19:02  | W    |
|  2 | John  | 17:19:05  | E    |  |  2 | John  | 17:19:05  | Y    |
|  3 | John  | 17:19:06  | B    |  |  3 | John  | 17:19:06  | Y    |
|  4 | John  | 17:19:12  | C    |  |  4 | John  | 17:19:12  | Z    |
|  5 | John  | 17:19:15  | A    |  |  5 | John  | 17:19:15  | Z    |
+----+-------+-----------+------+  +----+-------+-----------+------+
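How can I now write a query to "compress" the two resulting tables where values are repeated?
I want to ignore the ID field when considering row uniqueness, and I want to ignore Timestamp as well. However, rows must be consecutive (under Name, Timestamp ordering) to be considered duplicates. For this example, the result should be: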
+-------------------------------+  +-------------------------------+
|        state_changes_1        |  |        state_changes_2        |
+----+-------+-----------+------+  +----+-------+-----------+------+
| ID | Name  | Timestamp | Val1 |  | ID | Name  | Timestamp | Val2 |
+----+-------+-----------+------+  +----+-------+-----------+------+
|  0 | John  | 17:19:01  | A    |  |  0 | John  | 17:19:01  | X    |
|  1 | Bob   | 17:19:02  | E    |  |  1 | Bob   | 17:19:02  | W    |
|  3 | John  | 17:19:06  | B    |  |  2 | John  | 17:19:05  | Y    |
|  4 | John  | 17:19:12  | C    |  |  4 | John  | 17:19:12  | Z    |
|  5 | John  | 17:19:15  | A    |  +----+-------+-----------+------+
+----+-------+-----------+------+
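To pin down the rule, here is a minimal Python sketch of the intended semantics, applied to the state_changes_2 rows from the example. It is only an illustration of the rule, not a proposal for how to process billions of rows; the function name compress and the tuple layout are made up for this sketch.

```python
# Rows of state_changes_2 from the example: (ID, Name, Timestamp, Val2)
rows = [
    (0, "John", "17:19:01", "X"),
    (1, "Bob",  "17:19:02", "W"),
    (2, "John", "17:19:05", "Y"),
    (3, "John", "17:19:06", "Y"),
    (4, "John", "17:19:12", "Z"),
    (5, "John", "17:19:15", "Z"),
]


def compress(rows):
    """Keep a row only if its value differs from the previous row of the same Name.

    "Previous" is defined under (Name, Timestamp) ordering; ID and Timestamp
    themselves are ignored when deciding whether a row is a duplicate.
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))  # Name, then Timestamp
    kept = []
    prev_name, prev_val = None, None
    for row in ordered:
        _id, name, _ts, val = row
        if name != prev_name or val != prev_val:
            kept.append(row)  # first row for this Name, or the value changed
        prev_name, prev_val = name, val
    return sorted(kept, key=lambda r: r[0])  # back to ID order for display


kept_ids = [r[0] for r in compress(rows)]
print(kept_ids)  # [0, 1, 2, 4]
```

The IDs that survive match the state_changes_2 result table above.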
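My tables have several billion rows, so I'm looking for something that takes efficiency into account. That said, I'm realistic, so I'm happy for the query to take an hour or two to run (including index rebuilds).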
Best answer
I tried this on MySQL 5.1.58, and it seems to work with your test data.
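-- Track the previous row's Name and Val1 in user variables.
SET @name = NULL;
SET @val1 = NULL;
-- Walk the rows in (Name, Timestamp) order and set Val1 to NULL on any row
-- that repeats the previous row's Name and Val1.
UPDATE state_changes_1
SET Val1 = IF(Name = @name AND Val1 = @val1, NULL, (@val1 := Val1)),
    Name = (@name := Name)
ORDER BY Name, `Timestamp`;
-- Remove the rows marked above (this also removes rows whose Val1 was already NULL).
DELETE FROM state_changes_1 WHERE Val1 IS NULL;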
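For comparison, here is a sketch of a different technique, not the one tested above: comparing each row with its predecessor using the LAG() window function, which MySQL supports from version 8.0. To keep the example self-contained it runs the SQL against an in-memory SQLite database from Python (this assumes SQLite 3.25 or later, which added window functions). It has not been run on MySQL or on a table anywhere near the size in the question, so treat it as an illustration of the idea rather than a tuned solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state_changes_2 (
    ID INTEGER PRIMARY KEY, Name TEXT, Timestamp TEXT, Val2 TEXT
);
INSERT INTO state_changes_2 VALUES
    (0, 'John', '17:19:01', 'X'),
    (1, 'Bob',  '17:19:02', 'W'),
    (2, 'John', '17:19:05', 'Y'),
    (3, 'John', '17:19:06', 'Y'),
    (4, 'John', '17:19:12', 'Z'),
    (5, 'John', '17:19:15', 'Z');
""")

# Delete every row whose Val2 equals the Val2 of the previous row for the
# same Name, where "previous" is defined by Timestamp order within that Name.
conn.execute("""
DELETE FROM state_changes_2
WHERE ID IN (
    SELECT t.ID
    FROM (
        SELECT ID, Val2,
               LAG(Val2) OVER (PARTITION BY Name ORDER BY Timestamp) AS prev_val
        FROM state_changes_2
    ) AS t
    WHERE t.Val2 = t.prev_val
)
""")

remaining = [row[0] for row in
             conn.execute("SELECT ID FROM state_changes_2 ORDER BY ID")]
print(remaining)  # [0, 1, 2, 4]
```

Unlike the UPDATE approach, this does not use NULL as a marker, so it leaves existing NULL values alone and also works if the value column is declared NOT NULL. It may also be worth knowing that newer MySQL 8.0 releases deprecate assigning user variables inside expressions, which the UPDATE approach relies on.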
On the topic of mysql - compressing a table after normalization, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/7474565/