I am new to PySpark and have run into the following problem.
What I am trying to do:
I need to convert coordinates in UTM zone 10 to latitude and longitude. I am trying to do this on a DataFrame and have done the following to achieve it. The code below was created by referring to another post:
Converting latitude and longitude to UTM coordinates in pyspark
import utm
from pyspark.sql import SparkSession, functions
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.appName('reddit average df').getOrCreate()
a = spark.createDataFrame([{"X": 488769.792012, "Y": 5457280.44999}])
a.show()
+-------------+-------------+
|            X|            Y|
+-------------+-------------+
|488769.792012|5457280.44999|
+-------------+-------------+
utm_udf_x = functions.udf(lambda x, y: utm.to_latlon(x, y, 10, 'U')[0], DoubleType())
c = a.withColumn('Latitude', utm_udf_x(functions.col('X'), functions.col('Y')))
c.show()
However, when doing this I run into the following problem (the first few lines of the error are pasted here):
19/11/13 11:15:33 ERROR Executor: Exception in task 2.0 in stage 3.0 (TID 7)
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.dtype)
at net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:707)
at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:175)
at net.razorvine.pickle.Unpickler.load(Unpickler.java:99)
at net.razorvine.pickle.Unpickler.loads(Unpickler.java:112)
at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$evaluate$1.apply(BatchEvalPythonExec.scala:90)
at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$evaluate$1.apply(BatchEvalPythonExec.scala:89)
I have tried changing the data types, assuming that might be the problem, but as far as I can tell the types are the same. I would appreciate it if someone could help me.
Best Answer
utm.to_latlon returns a numpy object, which cannot be implicitly converted to PySpark's DoubleType:

type(utm.to_latlon(488769.792012, 5457280.44999, 10, 'U')[0])
# Output
# numpy.float64

Simply call .item() to get a plain Python float, which can be converted to PySpark's DoubleType:

utm_udf_x = functions.udf(lambda x, y: utm.to_latlon(x, y, 10, 'U')[0].item(), DoubleType())
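The core of the issue can be reproduced without Spark or utm at all: numpy scalar types are pickled as numpy.dtype objects, which Spark's Pyrolite unpickler cannot reconstruct, so they must first be converted to plain Python floats. A minimal sketch of the conversion (the value 49.28 is just an illustrative latitude, not an actual utm.to_latlon result):

```python
import numpy as np

# utm.to_latlon returns numpy.float64 scalars; a UDF declared as
# DoubleType needs a plain Python float instead.
lat = np.float64(49.28)   # stand-in for utm.to_latlon(...)[0]

plain = lat.item()        # .item() yields a plain Python float
also_plain = float(lat)   # float() works just as well

print(type(lat))          # <class 'numpy.float64'>
print(type(plain))        # <class 'float'>
```

Either .item() or float() inside the lambda resolves the PickleException, since the value handed back to Spark no longer carries a numpy dtype.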
Regarding python - Unable to convert utm to latlong with a pyspark dataframe, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58844233/