This article explains how to change a column type from String to Date in Spark DataFrames.

Question

I have a dataframe that has two columns (C, D) defined as string column type, but the data in the columns are actually dates. For example, column C has the date as "01-APR-2015" and column D as "20150401". I want to change these to date column type, but I didn't find a good way of doing that. I looked at Stack Overflow; I need to convert the string column type to a date column type in Spark SQL's DataFrame. The date format can be "01-APR-2015". I looked at this post but it didn't have info related to dates.

Answer

Spark >= 2.2

You can use to_date:

import org.apache.spark.sql.functions.{to_date, to_timestamp}

df.select(to_date($"ts", "dd-MMM-yyyy").alias("date"))

and to_timestamp:

df.select(to_date($"ts", "dd-MMM-yyyy").alias("timestamp"))

Both parse the string directly, so the intermediate unix_timestamp call shown below for older versions is no longer needed.
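
Applied to the two columns from the question, a minimal self-contained sketch (the "yyyyMMdd" pattern for column D is an assumption based on the sample value "20150401"):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_date

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._  // enables $"col" and toDF

// Sample rows mirroring the question's data
val df = Seq(("01-APR-2015", "20150401")).toDF("C", "D")

df.select(
  to_date($"C", "dd-MMM-yyyy").alias("C"),
  to_date($"D", "yyyyMMdd").alias("D")
).printSchema()  // both columns come back as DateType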

Spark < 2.2

Since Spark 1.5 you can use the unix_timestamp function to parse the string to a long, cast it to a timestamp, and truncate it with to_date:

import org.apache.spark.sql.functions.{unix_timestamp, to_date}

val df = Seq((1L, "01-APR-2015")).toDF("id", "ts")

df.select(to_date(unix_timestamp(
  $"ts", "dd-MMM-yyyy"
).cast("timestamp")).alias("timestamp"))

Note:

Depending on the Spark version, this may require some adjustments due to SPARK-11724:

Casting from integer types to timestamp treats the source int as being in millis. Casting from timestamp to integer types creates the result in seconds.

If you use an unpatched version, the unix_timestamp output requires multiplication by 1000.
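
On an affected build, a minimal sketch of that adjustment (scaling the seconds returned by unix_timestamp to milliseconds before the cast):

import org.apache.spark.sql.functions.{to_date, unix_timestamp}

// unix_timestamp yields seconds; the unpatched integer-to-timestamp cast
// treats the value as milliseconds, hence the * 1000
df.select(to_date((unix_timestamp($"ts", "dd-MMM-yyyy") * 1000).cast("timestamp")).alias("date"))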
