Properly handling TIME WITH TIME ZONE in PostgreSQL

Question

We have a table that is filled with data from a legacy report of another system. The columns of that table mirror the structure of the report.

Here is an abbreviated structure of the table:

CREATE TABLE IF NOT EXISTS LEGACY_TABLE (
  REPORT_DATE DATE NOT NULL,
  EVENT_ID BIGINT PRIMARY KEY NOT NULL,
  START_HOUR TIMESTAMP WITHOUT TIME ZONE,
  END_HOUR TIME WITHOUT TIME ZONE,
  EXPECTED_HOUR TIME WITHOUT TIME ZONE
);

We are refactoring this table to deal with the different time zones of different clients. The new structure would be something like:

CREATE TABLE IF NOT EXISTS LEGACY_TABLE (
  REPORT_DATE DATE NOT NULL,
  EVENT_ID BIGINT PRIMARY KEY NOT NULL,
  START_HOUR TIMESTAMP WITH TIME ZONE,
  END_HOUR TIME WITH TIME ZONE,
  EXPECTED_HOUR TIME WITH TIME ZONE
);

These hour fields represent a specific point in time during the day given by the REPORT_DATE column. That is, every TIME column represents a moment during the day specified in REPORT_DATE.

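Some other points to consider:

In the report that we receive from the legacy system, we don't know why START_HOUR is in TIMESTAMP format.

The fields in the report are formatted according to the time zone of the client, so to refactor this table we need to combine them with the client's time zone (we have this information) in order to insert the timestamps/times correctly in UTC.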

But now to the problem. The values of these columns are used to compute other values in many places in our system, something like the following:

START_HOUR - END_HOUR (the result of this operation is currently being cast to TIME WITHOUT TIME ZONE)
START_HOUR < END_HOUR
START_HOUR + EXPECTED_HOUR
EXPECTED_HOUR - END_HOUR
EXPECTED_HOUR < '05:00'

After some research I found that using the type TIME WITH TIME ZONE is not recommended (see "Postgres time with time zone equality"), and now I'm a bit confused about the best way to refactor this table to deal with different time zones and handle the different column operations that we need.

Besides that, I already know that it is safe to subtract two columns of type TIMESTAMP WITH TIME ZONE; that subtraction takes DST changes into account (see "Subtracting two columns of type timestamp with time zone"). But how about the other operations? And what about subtracting a TIME from a TIMESTAMP?

As for the table refactoring, should we use TIME WITH TIME ZONE anyway? Should we continue using TIME WITHOUT TIME ZONE? Or is it better to forget the type TIME altogether, combine the DATE with the TIME, and change the columns to TIMESTAMP WITH TIME ZONE?

I think these questions are related, because the new column types we choose will define how we operate on the columns.

Recommended answer

You assert that every TIME column represents a moment during the day specified in REPORT_DATE.

So you never cross a date line within the same row. I suggest storing one date, three times, and the time zone (as a text or FK column):

CREATE TABLE legacy_table (
   event_id      bigint PRIMARY KEY NOT NULL
 , report_date   date NOT NULL
 , start_hour    time
 , end_hour      time
 , expected_hour time
 , tz            text  -- time zone
);

As you already found, timetz (time with time zone) should generally be avoided. It cannot deal with DST (daylight saving time) rules, because a time of day with only a fixed UTC offset and no date cannot tell whether DST is in effect.

So basically what you already had before. Just drop the date component from start_hour; that's dead freight. Simply cast the timestamp to time to cut off the date, like: (timestamp '2018-03-25 1:00')::time
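As a minimal sketch of that step: the first statement only shows what the cast does, and the second is a hypothetical one-off migration. It assumes the date stored in start_hour always equals report_date, as stated in the question, and the column name tz follows the table definition above.

```sql
-- What the cast does: the date part is discarded, the time of day is kept.
SELECT (timestamp '2018-03-25 1:00')::time AS start_hour;  -- 01:00:00

-- Hypothetical migration of the existing table (not part of the original answer).
-- Assumes start_hour never carries a date other than report_date.
ALTER TABLE legacy_table
   ALTER COLUMN start_hour TYPE time USING start_hour::time
 , ADD COLUMN tz text;  -- to be filled with each client's time zone name
```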

tz can be any string accepted by the AT TIME ZONE construct, but to deal with different time zones reliably, it's best to use time zone names exclusively: any name you find in the system view pg_timezone_names.
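To check which names are available, you can query the view directly. The two zone names here are just the ones used in the examples of this answer; the offsets returned reflect the moment the query runs.

```sql
-- List the candidate zone names with their current abbreviation and offset.
SELECT name, abbrev, utc_offset, is_dst
FROM   pg_timezone_names
WHERE  name IN ('Europe/Vienna', 'Europe/Moscow')
ORDER  BY name;
```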

To optimize storage, you could collect allowed time zone names in a small lookup table and replace tz text with tz_id int REFERENCES my_tz_table.
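A possible shape for that variant is sketched below. The table name legacy_table_compact and the column layout of my_tz_table are assumptions made for illustration; only tz_id int REFERENCES my_tz_table comes from the text above.

```sql
-- Hypothetical lookup table holding the allowed time zone names.
CREATE TABLE my_tz_table (
   tz_id int  PRIMARY KEY
 , tz    text NOT NULL UNIQUE  -- a name from pg_timezone_names
);

INSERT INTO my_tz_table VALUES
   (1, 'Europe/Vienna')
 , (2, 'Europe/Moscow');

-- Same layout as legacy_table, with tz text replaced by a small FK column.
CREATE TABLE legacy_table_compact (
   event_id      bigint PRIMARY KEY
 , report_date   date NOT NULL
 , start_hour    time
 , end_hour      time
 , expected_hour time
 , tz_id         int REFERENCES my_tz_table
);

-- The zone name is then resolved with a join before applying AT TIME ZONE.
SELECT l.event_id
     , (l.report_date + l.start_hour) AT TIME ZONE z.tz AS start_ts
FROM   legacy_table_compact l
JOIN   my_tz_table z USING (tz_id);
```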

Two example rows with and without DST:

INSERT INTO legacy_table VALUES
   (1, '2018-03-25', '1:00', '3:00', '2:00', 'Europe/Vienna')  -- sadly, with DST
 , (2, '2018-03-25', '1:00', '3:00', '2:00', 'Europe/Moscow'); -- Russians got rid of DST

For representation purposes or calculations you can do things like:

SELECT (report_date + start_hour)    AT TIME ZONE tz AT TIME ZONE 'UTC' AS start_utc
     , (report_date + end_hour)      AT TIME ZONE tz AT TIME ZONE 'UTC' AS end_utc
     , (report_date + expected_hour) AT TIME ZONE tz AT TIME ZONE 'UTC' AS expected_utc
-- START_HOUR - END_HOUR
     , (report_date + start_hour) AT TIME ZONE tz
     - (report_date + end_hour)   AT TIME ZONE tz AS start_minus_end
FROM   legacy_table;

You might create one or more views to readily display strings as needed. The table is for storing the information you need.
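Such a view could look roughly like this. The view name legacy_table_utc and the output format string are assumptions chosen for illustration; the expressions are the same ones used in the query above.

```sql
-- Hypothetical convenience view: UTC timestamps plus a display string.
CREATE VIEW legacy_table_utc AS
SELECT event_id
     , (report_date + start_hour)    AT TIME ZONE tz AT TIME ZONE 'UTC' AS start_utc
     , (report_date + end_hour)      AT TIME ZONE tz AT TIME ZONE 'UTC' AS end_utc
     , (report_date + expected_hour) AT TIME ZONE tz AT TIME ZONE 'UTC' AS expected_utc
       -- ready-made string for display purposes
     , to_char((report_date + start_hour) AT TIME ZONE tz AT TIME ZONE 'UTC'
             , 'YYYY-MM-DD HH24:MI') AS start_utc_text
FROM   legacy_table;
```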

Note the parentheses! Otherwise AT TIME ZONE would bind before the + operator, because AT TIME ZONE has higher operator precedence.

And observe the results (db<>fiddle).

Since the time is manipulated in Vienna (and anywhere else where silly DST rules apply), you get "surprising" results.
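To make the surprise concrete, here is a self-contained check that inlines the two example rows instead of reading from the table. It compares the plain wall-clock difference with the difference between the actual points in time. The column aliases are made up for this sketch.

```sql
SELECT tz
     , start_hour - end_hour AS wall_clock_diff      -- plain time arithmetic
     , (report_date + start_hour) AT TIME ZONE tz
     - (report_date + end_hour)   AT TIME ZONE tz AS elapsed_diff
FROM  (VALUES
         (date '2018-03-25', time '1:00', time '3:00', 'Europe/Vienna')
       , (date '2018-03-25', time '1:00', time '3:00', 'Europe/Moscow')
      ) AS t(report_date, start_hour, end_hour, tz);
-- Europe/Vienna: wall_clock_diff -02:00:00, elapsed_diff -01:00:00
-- Europe/Moscow: wall_clock_diff -02:00:00, elapsed_diff -02:00:00
```

In Vienna the clocks jumped from 02:00 to 03:00 on that date, so only one hour actually passed between 1:00 and 3:00 local time, while Moscow has no DST and shows the full two hours.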