PySpark: converting a yyyy-MMM-dd date string to yyyyMMdd

Problem description

I have a Spark DataFrame. One of its columns holds dates as strings in a format like 2018-Jan-12.

I need to change these values to the form 20180112.

How can I achieve this?

Recommended answer

For Spark 1.5+

Suppose you had the following DataFrame:

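# sqlCtx is an existing SQLContext; on Spark 2.0+ a SparkSession (spark.createDataFrame) works the same way
df = sqlCtx.createDataFrame([("2018-Jan-12",)], ["date_str"])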
df.show()
#+-----------+
#|   date_str|
#+-----------+
#|2018-Jan-12|
#+-----------+

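To avoid using UDFs, you can first parse the string into a timestamp. unix_timestamp reads the string with the pattern yyyy-MMM-dd and returns seconds since the epoch, and from_unixtime turns that back into a timestamp string of the form yyyy-MM-dd HH:mm:ss: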

from pyspark.sql.functions import from_unixtime, unix_timestamp
df = df.withColumn('date', from_unixtime(unix_timestamp('date_str', 'yyyy-MMM-dd')))
df.show()
#+-----------+-------------------+
#|   date_str|               date|
#+-----------+-------------------+
#|2018-Jan-12|2018-01-12 00:00:00|
#+-----------+-------------------+

Then format that value as a string in the format you want:

from pyspark.sql.functions import date_format, col
df = df.withColumn("new_date_str", date_format(col("date"), "yyyyMMdd"))
df.show()
#+-----------+-------------------+------------+
#|   date_str|               date|new_date_str|
#+-----------+-------------------+------------+
#|2018-Jan-12|2018-01-12 00:00:00|    20180112|
#+-----------+-------------------+------------+
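The parse-then-format idea in the two steps above can be sanity-checked on a single value without a Spark cluster. The following is a minimal sketch using only the Python standard library, not part of the original answer; the helper name reformat_date is hypothetical, and it assumes the default English (C) locale so that %b matches "Jan".

```python
from datetime import datetime


def reformat_date(date_str: str) -> str:
    """Parse a string like '2018-Jan-12' and return it as 'yyyyMMdd'."""
    # %b matches the abbreviated month name; this assumes an English (C) locale.
    parsed = datetime.strptime(date_str, "%Y-%b-%d")
    # %Y%m%d corresponds to the Spark pattern yyyyMMdd.
    return parsed.strftime("%Y%m%d")


print(reformat_date("2018-Jan-12"))  # 20180112
```

This is only useful for checking the expected output on a few sample values; for a whole column, the built-in Spark functions shown here avoid the overhead of a Python UDF.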

Or if you prefer, you can chain it all together and skip the intermediate steps:

import pyspark.sql.functions as f
df.select(
    f.date_format(
        f.from_unixtime(
            f.unix_timestamp(
                'date_str',
                'yyyy-MMM-dd')
        ),
        "yyyyMMdd"
    ).alias("new_date_str")
).show()
#+------------+
#|new_date_str|
#+------------+
#|    20180112|
#+------------+
