How do I best go about fixing this issue? I would also like to find the RMSE/mean, mean absolute error, mean absolute error/mean, median absolute error, and median percent error, but once I figure out how to calculate one, I should be good on the others.

Recommended answer

I don't think you need a UDF in that case. I think it is possible using only pyspark.sql.functions.

I can propose the following untested option:

```python
import pyspark.sql.functions as psf

rmse = (old_df
        .withColumn("squarederror",
                    psf.pow(psf.col("col1") - psf.col("col2"), psf.lit(2)))
        .agg(psf.avg(psf.col("squarederror")).alias("mse"))
        .withColumn("rmse", psf.sqrt(psf.col("mse"))))
rmse.collect()
```

Using the same logic, you can get the other performance statistics.
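As a quick sanity check of the formulas before wiring them into Spark, here is a plain-Python sketch of the metrics the question asks for (RMSE, RMSE/mean, MAE, MAE/mean, median absolute error, median percent error). The sample values and the assumption that "col1" holds predictions and "col2" holds actuals are illustrative, not from the original question:

```python
import math
import statistics

# Hypothetical sample data: "col1" = predictions, "col2" = actual values.
preds = [2.0, 4.0, 6.0, 8.0]
actuals = [1.0, 4.0, 5.0, 10.0]

errors = [p - a for p, a in zip(preds, actuals)]
abs_errors = [abs(e) for e in errors]
mean_actual = statistics.mean(actuals)

# Root mean squared error and its mean-normalized form.
rmse = math.sqrt(statistics.mean(e ** 2 for e in errors))
rmse_over_mean = rmse / mean_actual

# Mean absolute error and its mean-normalized form.
mae = statistics.mean(abs_errors)
mae_over_mean = mae / mean_actual

# Median absolute error.
medae = statistics.median(abs_errors)

# Median percent error: median of |error| / actual, as a percentage.
median_pct_error = statistics.median(
    abs(e) / a * 100 for e, a in zip(errors, actuals))

print(rmse, mae, medae, median_pct_error)
```

Each of these maps back onto the answer's Spark pattern: swap `psf.pow(..., 2)` for `psf.abs(...)` to get the absolute-error columns, and use a median aggregate (e.g. `psf.percentile_approx`, available in recent Spark versions) instead of `psf.avg` for the median-based metrics.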