Problem description
I'm trying to convert a column of GMT timestamp strings into a column of timestamps in the Eastern time zone, and I want to take daylight saving time into account.
My column of timestamp strings looks like this:
'2017-02-01T10:15:21+00:00'
I figured out how to convert the string column into timestamps in EST:
from pyspark.sql import functions as F
df2 = df1.withColumn('datetimeGMT', df1.myTimeColumnInGMT.cast('timestamp'))
df3 = df2.withColumn('datetimeEST', F.from_utc_timestamp(df2.datetimeGMT, "EST"))
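The reason the times never shift is that "EST" names a fixed-offset zone (always UTC-5), not the US Eastern rule set. This can be seen without Spark at all; a minimal sketch using Python's standard-library zoneinfo, with two sample UTC instants chosen for illustration:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Two UTC instants: one in winter, one in summer.
winter = datetime(2017, 2, 1, 10, 15, 21, tzinfo=timezone.utc)
summer = datetime(2017, 7, 1, 10, 15, 21, tzinfo=timezone.utc)

# "EST" is a fixed-offset zone ID: always UTC-5, even in July.
est = ZoneInfo("EST")
print(winter.astimezone(est).hour)  # 5
print(summer.astimezone(est).hour)  # 5 (no DST shift)
```

Both instants land on hour 5, confirming that "EST" never observes daylight saving time.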
But the times don't change with daylight saving time. Is there another function, or something else, that accounts for daylight saving time when converting the timestamps?
Edit: I think I've figured it out. In the from_utc_timestamp call above, I need to use "America/New_York" instead of "EST":
df3 = df2.withColumn('datetimeET', F.from_utc_timestamp(df2.datetimeGMT, "America/New_York"))
Accepted answer
I ended up figuring out the answer, so I figured I would add it here. I also think this question/answer is worthwhile because, while searching for this issue before posting the question, I couldn't find anything about daylight saving time in Spark. I probably should have realized I should search for the underlying Java functions.
The answer to the question ended up being to use the string "America/New_York" instead of "EST". This correctly applies daylight saving time.
from pyspark.sql import functions as F
df3 = df2.withColumn('datetimeET', F.from_utc_timestamp(df2.datetimeGMT, "America/New_York"))
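By contrast, "America/New_York" is a region zone ID that carries the full DST rules, so the UTC offset changes between winter and summer. The same standard-library sketch as above, with the region zone swapped in:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")
winter = datetime(2017, 2, 1, 10, 15, 21, tzinfo=timezone.utc)
summer = datetime(2017, 7, 1, 10, 15, 21, tzinfo=timezone.utc)

print(winter.astimezone(ny).hour)  # 5 -> EST, UTC-5
print(summer.astimezone(ny).hour)  # 6 -> EDT, UTC-4
```

This is exactly the behavior from_utc_timestamp picks up when given the region ID: the offset it applies depends on the date being converted.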
Edit: this link shows a list of available time zone strings that can be used in this way: https://garygregory.wordpress.com/2013/06/18/what-are-the-java-timezone-ids/
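Besides that link, the zone IDs available on your own system can be checked programmatically; a sketch using Python's zoneinfo (assumes Python 3.9+ with tz data installed):

```python
from zoneinfo import available_timezones

# Set of all IANA zone IDs known to this installation.
zones = available_timezones()
print("America/New_York" in zones)  # True on a standard tz database
```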