I have a DataFrame with a column of coordinates, grouped into lists, like this:

    name    OBJECTID    geometry
0    NaN           1    ['-80.304852,-3.489302,0.0','-80.303087,-3.490214,0.0',...]

1    NaN           2    ['-80.27494,-3.496571,0.0',...]

2    NaN           3    ['-80.267987,-3.500003,0.0',...]


I want to split the values and drop the '0.0', but keep them in lists so that I can add them to a key in a dictionary, like this:

    name    OBJECTID    geometry
0    NaN           1    [[-80.304852, -3.489302],[-80.303087, -3.490214],...]

1    NaN           2    [[-80.27494, -3.496571],...]

2    NaN           3    [[-80.267987, -3.500003],...]


This is the code I have so far; the for loop does not manage to separate the values:

import pandas as pd
import numpy as np

r = pd.read_csv('data.csv')
rloc = np.asarray(r['geometry'])

r['latitude'] = np.zeros(r.shape[0],dtype= r['geometry'].dtype)
r['longitude'] = np.zeros(r.shape[0],dtype= r['geometry'].dtype)

# Separating the latitude and longitude values from each string.
for i in range(0, len(rloc)):
    for j in range(0, len(rloc[i])):
        coord = rloc[i][j].split(',')
        r['longitude'] = coord[0]
        r['latitude'] = coord[1]

r = r[['OBJECTID', 'latitude', 'longitude', 'name']]


Edit: the result is wrong, because every row ends up with the same single value.

  OBJECTID  latitude    longitude   name
0        1  -3.465566   -80.151633  NaN
1        2  -3.465566   -80.151633  NaN
2        3  -3.465566   -80.151633  NaN


Bonus question: how can I put all these longitude and latitude values into tuples for use with geopy? Something like this:

r['location'] = (r['latitude'], r['longitude'])


So the 'geometry' column would look like this:

geometry
[(-80.304852, -3.489302),(-80.303087, -3.490214),...]

[(-80.27494, -3.496571),...]

[(-80.267987, -3.500003),...]


Edit:

The data originally looked like this (for each row):

<LineString><coordinates>-80.304852,-3.489302,0.0 -80.303087,-3.490214,0.0 ...</coordinates></LineString>


I stripped the tags with a regular expression, using the following code:

import re

geo = np.asarray(r['geometry'])
geo = [re.sub(r'<.*?>', '', string) for string in geo]


Then I placed it in an array:

rv = [g.split() for g in geo]
# rows have different lengths, so store the lists as an object Series
r['geometry'] = pd.Series(rv, index=r.index)


When I call r['geometry'], the output is:

0    [-80.304852,-3.489302,0.0, -80.303087,-3.49021...
1    [-80.27494,-3.496571,0.0, -80.271963,-3.49266,...
2    [-80.267987,-3.500003,0.0, -80.267845,-3.49789...
Name: geometry, dtype: object


r['geometry'][0] is:

 ['-80.304852,-3.489302,0.0',
 '-80.303087,-3.490214,0.0',
 '-80.302131,-3.491878,0.0',
 '-80.300763,-3.49213,0.0']

Best answer

A pandas solution, with input from a toy dataset:

import pandas as pd

df = pd.read_csv("test.txt")
   name  OBJECTID                                           geometry
0   NaN         1  ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...
1   NaN         2  ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...
2   NaN         3  ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...


Now convert to columns of longitude/latitude pairs:

#regex extraction of longitude latitude pairs
pairs = r"(-?\d+\.\d+,-?\d+\.\d+)"
s = df["geometry"].str.extractall(pairs)
#splitting string into two parts, creating two columns for longitude latitude
s = s[0].str.split(",", expand = True)
#converting strings into float numbers - is this even necessary?
s[[0, 1]] = s[[0, 1]].apply(pd.to_numeric)
#creating a tuple from longitude/latitude columns
s["lat_long"] = list(zip(s[0], s[1]))
#placing the tuples as columns in original dataframe
df = pd.concat([df, s["lat_long"].unstack(level = -1)], axis = 1)


Output for the toy dataset:

   name  OBJECTID                                           geometry  \
0   NaN         1  ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...
1   NaN         2  ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...
2   NaN         3  ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...

               0              1              2
0  (-80.3, -3.4)  (-80.3, -3.9)  (-80.3, -3.9)
1   (80.2, -4.4)   (-81.3, 2.9)  (-80.7, -3.2)
2  (-80.1, -3.2)  (-80.8, -2.9)  (-80.1, -1.9)
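
To see why `unstack` spreads the pairs across columns, it may help to look at the intermediate index that `str.extractall` produces. The following is a minimal sketch of the same steps on an inline two-row frame (the frame contents and the names `parts`, `tuples` and `wide` are assumptions made for illustration, not part of the original data):

```python
import pandas as pd

# Hypothetical toy frame: geometry holds the list as one string per row,
# which is how it arrives after read_csv.
df = pd.DataFrame({
    "OBJECTID": [1, 2],
    "geometry": [
        "['-80.3,-3.4,0.0','-80.3,-3.9,0.0']",
        "['80.2,-4.4,0.0','-81.3,2.9,0.0']",
    ],
})

pairs = r"(-?\d+\.\d+,-?\d+\.\d+)"

# extractall returns one row per match, indexed by
# (original row label, match number within that row).
s = df["geometry"].str.extractall(pairs)

# Split "lon,lat" into two float columns.
parts = s[0].str.split(",", expand=True).astype(float)

# Build (lon, lat) tuples that keep the two-level index.
tuples = pd.Series(list(zip(parts[0], parts[1])), index=parts.index)

# unstack moves the match number from the index into the columns,
# leaving one row per original row of df.
wide = tuples.unstack(level=-1)
print(wide)
```

If the rows contain different numbers of points, the shorter rows are padded with NaN in the wide layout, which is one reason to prefer a single list column.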


Alternatively, you can combine the tuples into a list in a single column:

s["lat_long"] = list(zip(s[0], s[1]))
#placing the tuples as a list into a column of the original dataframe
df["lat_long"] = s.groupby(level=[0])["lat_long"].apply(list)


Output now:

   name  OBJECTID                                           geometry  \
0   NaN         1  ['-80.3,-3.4,0.0','-80.3,-3.9,0.0','-80.3,-3.9...
1   NaN         2  ['80.2,-4.4,0.0','-81.3,2.9,0.0','-80.7,-3.2,0...
2   NaN         3  ['-80.1,-3.2,0.0','-80.8,-2.9,0.0','-80.1,-1.9...

                                        lat_long
0  [(-80.3, -3.4), (-80.3, -3.9), (-80.3, -3.9)]
1    [(80.2, -4.4), (-81.3, 2.9), (-80.7, -3.2)]
2  [(-80.1, -3.2), (-80.8, -2.9), (-80.1, -1.9)]
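
If the geometry column already holds real Python lists of strings, as in the `r['geometry'][0]` output shown in the question, a regex is not strictly needed. The loop in the question assigns a scalar to the whole `longitude` and `latitude` columns on every pass, so each row ends up with the last coordinate that was processed. Below is a rough sketch of a per-row alternative; the helper `to_lon_lat` and the column names `lon_lat` and `lat_lon` are hypothetical, and the sample values are copied from the question:

```python
import pandas as pd

# Sample rows copied from the question; each cell is a list of 'lon,lat,alt' strings.
r = pd.DataFrame({
    "OBJECTID": [1, 2],
    "geometry": [
        ["-80.304852,-3.489302,0.0", "-80.303087,-3.490214,0.0"],
        ["-80.27494,-3.496571,0.0"],
    ],
})

def to_lon_lat(points):
    """Turn ['lon,lat,alt', ...] into [(lon, lat), ...], dropping the altitude."""
    out = []
    for p in points:
        lon, lat = p.split(",")[:2]
        out.append((float(lon), float(lat)))
    return out

# One list of (lon, lat) tuples per row, same order as the source data.
r["lon_lat"] = r["geometry"].apply(to_lon_lat)

# Same points with the order swapped to (lat, lon).
r["lat_lon"] = r["lon_lat"].apply(lambda pts: [(lat, lon) for lon, lat in pts])
print(r[["OBJECTID", "lat_lon"]])
```

The swapped column is included because geopy's point and distance functions take coordinates as (latitude, longitude), whereas the source data lists longitude first.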

Source: "python - numpy array with coordinates" on Stack Overflow: https://stackoverflow.com/questions/48735367/
