Problem Description
I've seen (here: How to convert Timestamp to Date format in DataFrame?) the way to convert a timestamp into a date type, but, at least for me, it doesn't work.
This is what I tried:
# Imports used in this snippet
from pyspark.sql import functions as func
from pyspark.sql import types as stypes

# Create dataframe
df_test = spark.createDataFrame([('20170809',), ('20171007',)], ['date'])
# Convert to timestamp
df_test2 = df_test.withColumn('timestamp', func.when(df_test.date.isNull() | (df_test.date == ''), '0')
                              .otherwise(func.unix_timestamp(df_test.date, 'yyyyMMdd')))
# Convert timestamp to date again
df_test2.withColumn('date_again', df_test2['timestamp'].cast(stypes.DateType())).show()
But this returns null in the date_again column:
+--------+----------+----------+
| date| timestamp|date_again|
+--------+----------+----------+
|20170809|1502229600| null|
|20171007|1507327200| null|
+--------+----------+----------+
Any idea what is going wrong?
Recommended Answer
The following:
func.when((df_test.date.isNull() | (df_test.date == '')) , '0')\
.otherwise(func.unix_timestamp(df_test.date,'yyyyMMdd'))
doesn't work because it is type inconsistent - the first clause returns a string while the second clause returns a bigint. As a result, it will always return NULL whenever date is NOT NULL and not empty.
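You can see this by inspecting the schema of the intermediate DataFrame from the question: with the mixed when/otherwise, the timestamp column is coerced to string (a quick check, not part of the original answer):

# Check the type Spark inferred for the mixed when/otherwise expression;
# 'timestamp' shows up as string, and casting an epoch-seconds string
# to DateType yields NULL.
df_test2.printSchema()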
It is also unnecessary - SQL functions are NULL-safe and safe against malformed input, so there is no need for additional checks:
In [1]: spark.sql("SELECT unix_timestamp(NULL, 'yyyyMMdd')").show()
+----------------------------------------------+
|unix_timestamp(CAST(NULL AS STRING), yyyyMMdd)|
+----------------------------------------------+
| null|
+----------------------------------------------+
In [2]: spark.sql("SELECT unix_timestamp('', 'yyyyMMdd')").show()
+--------------------------+
|unix_timestamp(, yyyyMMdd)|
+--------------------------+
| null|
+--------------------------+
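So on older Spark versions you can simply drop the conditional and keep the intermediate unix timestamp; a minimal sketch (using from_unixtime, which the original answer doesn't show) would be:

from pyspark.sql import functions as func

# unix_timestamp already returns NULL for NULL or malformed input,
# so no when/otherwise guard is needed.
df_test2 = df_test.withColumn('timestamp', func.unix_timestamp('date', 'yyyyMMdd'))
# from_unixtime turns the epoch seconds into a date string, which casts cleanly to DateType.
df_test2.withColumn('date_again', func.from_unixtime('timestamp').cast('date')).show()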
And in Spark 2.2 or later you don't need the intermediate step at all:
from pyspark.sql.functions import to_date
to_date("date", "yyyyMMdd")