I have the following DataFrame in PySpark:
+--------------------+-------+
|                Name|Seconds|
+--------------------+-------+
|Enviar solicitud ...|   1415|
|Analizar mapa de ...|   1209|
|Modificar solicit...|    591|
|Entregar servicio...|  91049|
+--------------------+-------+
I want to convert the seconds column to a date or timestamp (hopefully todate), and I am trying to use the following function:

def to_date(seconds=0):
    dat = ''
    if seconds == 0:
        dat = '0'
    if (seconds / 86400) >= 1:
        day = (int(seconds / 86400))
        seconds = (seconds - 86400 * int(seconds / 86400))
        dat = f'{day}d '
    if (seconds / 3600) >= 1:
        hour = (int(seconds / 3600))
        seconds = (seconds - 3600 * int(seconds / 3600))
        dat = dat + f'{hour}hr '
    if (seconds / 60) >= 1:
        minutes = (int(seconds / 60))
        dat = dat + f'{minutes}min'
    else:
        return '0min'
    return dat
However, PySpark has no simple equivalent of pandas' .apply(to_date). Is there a way to achieve what I am trying to do? Expected output:
Analizar mapa de comparacion de presupuestos   1209    20min
Crear mapa de comparacion de presupuestos      12155   3hr 22min
Entregar servicios de bienes                   91049   1d 1hr 17min
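Best answer
I think this can be done without a UDF, which will be faster and scale better on large data. Try this, and let me know if there is a hole in my logic.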
from pyspark.sql import functions as F

# Derive helper columns (total minutes, total hours, leftover minutes, days),
# build the "Time" label from them, then drop the helpers.
df.withColumn("Minutes", F.round(F.col("Seconds") / 60, 2))\
  .withColumn("Hours", F.floor(F.col("Minutes") / 60))\
  .withColumn("hourmin", F.floor(F.col("Minutes") - F.col("Hours").cast("int") * 60))\
  .withColumn("Days", F.floor(F.col("Hours") / 24))\
  .withColumn("Days2", F.col("Days") * 24)\
  .withColumn("Time",
      F.when((F.col("Hours") == 0) & (F.col("Days") == 0),
             F.concat(F.col("hourmin"), F.lit("min")))
       .when((F.col("Hours") != 0) & (F.col("Days") == 0),
             F.concat(F.col("Hours"), F.lit("hr "), F.col("hourmin"), F.lit("min")))
       .when(F.col("Days") != 0,
             F.concat(F.col("Days"), F.lit("d "), F.col("Hours") - F.col("Days2"),
                      F.lit("hr "), F.col("hourmin"), F.lit("min"))))\
  .drop("Minutes", "Hours", "hourmin", "Days", "Days2")\
  .show()
+-----------------+-------+---------------+
| Name|Seconds| Time|
+-----------------+-------+---------------+
| Enviar solicitud| 1209| 20min|
| Analizar mapa de| 12155| 3hr 22min|
|Entregar servicio| 91049| 1d 1hr 17min|
| example1| 1900| 31min|
| example2| 2500| 41min|
| example3|9282398|107d 10hr 26min|
+-----------------+-------+---------------+
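To sanity-check the arithmetic behind the Time column, here is a minimal plain-Python sketch of the same breakdown. The name format_duration is hypothetical (it appears in neither the question nor the answer), and it assumes Seconds holds non-negative whole numbers; leftover seconds are dropped, as in the table above.

```python
def format_duration(seconds: int) -> str:
    """Format non-negative whole seconds as e.g. '1d 1hr 17min'."""
    total_minutes = seconds // 60                   # leftover seconds are dropped
    total_hours, minutes = divmod(total_minutes, 60)
    days, hours = divmod(total_hours, 24)

    # Same three cases as the chained when(...) expressions in the answer.
    if days:
        return f"{days}d {hours}hr {minutes}min"
    if hours:
        return f"{hours}hr {minutes}min"
    return f"{minutes}min"


# The Seconds values from the output table above.
for s in (1209, 12155, 91049, 1900, 2500, 9282398):
    print(s, format_duration(s))
```

If the UDF route from the question were still wanted, a function like this could be wrapped with pyspark.sql.functions.udf and a StringType return type, though that brings back the per-row Python overhead that the column-expression approach avoids.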
Regarding python - converting a column of seconds to a human-readable duration, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60510855/