This article covers how to convert latitude and longitude to UTM coordinates in PySpark.

Question

I have a dataframe that contains longitude and latitude coordinates for each point. I want to convert the geographic coordinates of each point to UTM coordinates.

I tried to use the utm module (https://pypi.org/project/utm/):

import pyspark.sql.functions as fn
import utm
df=df.withColumn('UTM',utm.from_latlon(fn.col('lat'),fn.col('lon')))

But I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-39-8b21f98738ca> in <module>()
----> 1 df=df.withColumn('UTM',utm.from_latlon(fn.col('lat'),fn.col('lon')))

~\Anaconda3\lib\site-packages\utm\conversion.py in from_latlon(latitude, longitude, force_zone_number)
    152        .. _[1]: http://www.jaworski.ca/utmzones.htm
    153     """
--> 154     if not -80.0 <= latitude <= 84.0:
    155         raise OutOfRangeError('latitude out of range (must be between 80 deg S and 84 deg N)')
    156     if not -180.0 <= longitude <= 180.0:

F:\spark\spark\python\pyspark\sql\column.py in __nonzero__(self)
    633
    634     def __nonzero__(self):
--> 635         raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
    636                          "'~' for 'not' when building DataFrame boolean expressions.")
    637     __bool__ = __nonzero__

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
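
The error comes from how `utm.from_latlon` validates its input. As the traceback shows, it evaluates the chained comparison `-80.0 <= latitude <= 84.0`, which Python expands to `(-80.0 <= latitude) and (latitude <= 84.0)` and which therefore calls `bool()` on the first result. A Spark `Column` is a lazy expression rather than a number, so the comparison yields another `Column`, and converting that to a boolean raises. The sketch below reproduces the mechanism with a hypothetical stand-in class (`FakeColumn` is not part of PySpark; `in_latitude_range` and `check` are illustrative helpers) so that it runs without a Spark session.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    def __le__(self, other):
        # Comparisons build a new lazy expression instead of returning True/False.
        return FakeColumn(f"({self.expr} <= {other})")

    def __ge__(self, other):
        return FakeColumn(f"({self.expr} >= {other})")

    def __bool__(self):
        # Mirrors the behaviour shown in the traceback above.
        raise ValueError("Cannot convert column into bool")


def in_latitude_range(latitude):
    """Same style of range check that utm.from_latlon applies to its input."""
    return -80.0 <= latitude <= 84.0


def check(latitude):
    """Return the check result, or the error message if the check raises."""
    try:
        return in_latitude_range(latitude)
    except ValueError as exc:
        return str(exc)


print(check(45.0))               # plain float: the range check works
print(check(FakeColumn("lat")))  # column-like object: the check raises
```

Wrapping the conversion in a UDF avoids this, because Spark then calls the function once per row with plain Python values instead of `Column` objects.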

Update:

After creating a UDF that applies the utm or pyproj function, the result is:

+--------------------+
|                 UTM|
+--------------------+
|[Ljava.lang.Objec...|
|[Ljava.lang.Objec...|
|[Ljava.lang.Objec...|
|[Ljava.lang.Objec...|
|[Ljava.lang.Objec...|
+--------------------+
only showing top 5 rows

Recommended answer

Something like this:

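import pyspark.sql.functions as F
import utm
from pyspark.sql.types import DoubleType

# utm.from_latlon returns (easting, northing, zone_number, zone_letter).
# DoubleType keeps full precision (a 32-bit FloatType rounds large northings);
# float() guards against NumPy scalars, which Spark may fail to serialize.
utm_udf_x = F.udf(lambda lat, lon: float(utm.from_latlon(lat, lon)[0]), DoubleType())
utm_udf_y = F.udf(lambda lat, lon: float(utm.from_latlon(lat, lon)[1]), DoubleType())

df = df.withColumn('UTM_x', utm_udf_x(F.col('lat'), F.col('lon')))
df = df.withColumn('UTM_y', utm_udf_y(F.col('lat'), F.col('lon')))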

Although I am not sure why you wrote [1] at the end.
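
A possible variation, sketched under assumptions rather than taken from the original answer: return all four values from a single UDF as a struct. Easting and northing alone are ambiguous without the zone, so keeping the zone number and letter alongside them can be useful. Declaring the struct schema matters, because a UDF created without an explicit return type defaults to a string type, which is a likely cause of the `[Ljava.lang.Objec...` output shown in the update to the question. The helper names `latlon_to_utm` and `with_utm_column` and the struct field names are illustrative.

```python
def latlon_to_utm(lat, lon):
    """Return (easting, northing, zone_number, zone_letter), or None for nulls."""
    if lat is None or lon is None:
        return None  # pass SQL NULLs through instead of failing inside utm
    import utm  # imported inside the function so it is resolved on the executors

    easting, northing, zone_number, zone_letter = utm.from_latlon(lat, lon)
    # Cast to built-in types in case the library returns NumPy scalars.
    return float(easting), float(northing), int(zone_number), str(zone_letter)


def with_utm_column(df, lat_col="lat", lon_col="lon"):
    """Add a struct column named 'UTM' to df (hypothetical helper)."""
    # Imported lazily so latlon_to_utm stays usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        DoubleType, IntegerType, StringType, StructField, StructType,
    )

    schema = StructType([
        StructField("easting", DoubleType()),
        StructField("northing", DoubleType()),
        StructField("zone_number", IntegerType()),
        StructField("zone_letter", StringType()),
    ])
    to_utm = F.udf(latlon_to_utm, schema)
    return df.withColumn("UTM", to_utm(F.col(lat_col), F.col(lon_col)))
```

With this in place, `with_utm_column(df).select("UTM.easting", "UTM.northing")` would read the individual fields. Note that `utm.from_latlon` still raises for latitudes outside the range shown in the traceback (80 deg S to 84 deg N), so such rows would need to be filtered out first.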

