Spark Structured Streaming automatically converts timestamps to local time

Question

My timestamps are in UTC, in ISO 8601 format, but Structured Streaming automatically converts them to local time. Is there a way to stop this conversion? I would like to keep them in UTC.

I'm reading JSON data from Kafka and then parsing it with Spark's from_json function.

Input:

{"Timestamp":"2015-01-01T00:00:06.222Z"}

Flow:

SparkSession
  .builder()
  .master("local[*]")
  .appName("my-app")
  .getOrCreate()
  .readStream()
  .format("kafka")
  ... //some magic
  .writeStream()
  .format("console")
  .start()
  .awaitTermination();

Schema:

StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField("Timestamp", DataTypes.TimestampType, true),});

Output:

+--------------------+
|           Timestamp|
+--------------------+
|2015-01-01 01:00:...|
|2015-01-01 01:00:...|
+--------------------+

As you can see, the hour has been shifted forward by one: the input 00:00 is displayed as 01:00.
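The shift can be reproduced outside Spark with plain java.time, which may help confirm that the stored instant is unchanged and only its rendering depends on the time zone. This is a minimal sketch, not Spark's actual formatting code. The class name TimestampRendering, the helper render, and the zone Europe/Paris are assumptions chosen for illustration. The question does not state the machine's local zone, only that the output is one hour ahead.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimestampRendering {

    // Renders an instant as wall-clock time in the given zone.
    // The zone id is supplied by the caller; nothing here reads the JVM default.
    static String render(Instant instant, String zoneId) {
        DateTimeFormatter formatter = DateTimeFormatter
                .ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
                .withZone(ZoneId.of(zoneId));
        return formatter.format(instant);
    }

    public static void main(String[] args) {
        // Same value as the "Timestamp" field in the question's input.
        Instant parsed = Instant.parse("2015-01-01T00:00:06.222Z");

        // Rendered in UTC: 2015-01-01 00:00:06.222
        System.out.println(render(parsed, "UTC"));

        // Rendered in a hypothetical UTC+1 local zone: 2015-01-01 01:00:06.222
        System.out.println(render(parsed, "Europe/Paris"));
    }
}
```

Both lines describe the same instant. If the displayed hour matches the second line, the difference comes from the zone used for display rather than from the parsed data.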

PS: I experimented with Spark's from_utc_timestamp function, but had no luck.

Recommended answer

What worked for me was:

spark.conf.set("spark.sql.session.timeZone", "UTC")

This tells Spark SQL to use UTC as the default time zone for timestamps. For example, I used it with this Spark SQL query:

select *, cast('2017-01-01 10:10:10' as timestamp) from someTable

I know it does not work in Spark 2.0.1, but it does work in Spark 2.2. I also used it in SQLTransformer, and it worked there too.

I am not sure whether it applies to streaming, though.
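For the Java API used in the question, the same setting might be applied as in the configuration fragment below. It is a sketch under the answer's own assumptions: Spark 2.2 or later, and no confirmation that the streaming console sink honors the setting. The builder options other than the time zone are copied from the question.

```java
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("my-app")
  // Assumption: Spark 2.2+; asks Spark SQL to interpret and display timestamps in UTC.
  .config("spark.sql.session.timeZone", "UTC")
  .getOrCreate();

// Alternative for a session that already exists (Java form of the answer's one-liner).
spark.conf().set("spark.sql.session.timeZone", "UTC");
```

Either call alone should be enough. Setting the option on the builder keeps it next to the rest of the session configuration.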
