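For a dataset of 200M vessel GPS (lat, lon) coordinates, I want to calculate an approximate distance to the nearest land or coastline, as a function called distance_to_shore that returns the distance to that shore and the country it belongs to.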

I'm using a shapefile of country boundaries and coastlines from http://www.naturalearthdata.com/

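One consideration is that the oceanic pole of inaccessibility is 2688 km from land. That is therefore the maximum possible distance from shore, and it could be used to create some kind of bounding box. I want the calculation to account for the Earth's curvature (not Euclidean), e.g. the Haversine or Vincenty method.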
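
For reference, the Haversine distance mentioned above can be sketched in plain Python as follows. This is a minimal illustration on a sphere; the function name `haversine_km` and the mean radius of 6371 km are choices made for this sketch, and it is not vectorized.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # min() guards against rounding pushing a slightly above 1
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, a)))
```

Any distance returned for a point at sea should then stay below the 2688 km bound quoted above, up to the error of the spherical approximation.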

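For this I started looking at scipy.spatial.cKDTree, but it does not support the Haversine distance metric. sklearn.neighbors.BallTree, on the other hand, does allow the Haversine metric, but I couldn't get it to work. Here is the code I have so far. N.B. ideally the function should be vectorized.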


Solution


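Thanks for all the input. This is how I solved it in Python, including a function to download the relevant shapefiles. It still needs some cleaning up.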

import os
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

import shapely as sp
import cartopy.io.shapereader as shpreader
import ssl
import urllib.request
import zipfile

from shutil import rmtree
from dbfread import DBF
from scipy import spatial
from sklearn.neighbors import NearestNeighbors, BallTree
from pyproj import Proj, transform

from math import *

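# The saved array has dtype=object (coordinate arrays plus country names),
# so recent NumPy versions need allow_pickle=True to load it.
coastline = np.load(os.path.join(os.path.dirname(__file__),
                    '../data/shape_files/coast_coords_10m.npy'),
                    allow_pickle=True)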

ports = np.load(os.path.join(os.path.dirname(__file__),
                '../data/shape_files/ports_coords.npy'))

def extract_geom_meta(country):
    '''
    extract from each geometry the name of the country
    and the geom_point data. The output will be a list
    of tuples and the country name as the last element.
    '''
    geoms = country.geometry
    coords = np.empty(shape=[0, 2])
    for geom in geoms:
        coords = np.append(coords, geom.exterior.coords, axis = 0)

    country_name = country.attributes["ADMIN"]
    return [coords, country_name]

def save_coastline_shape_file():
    '''
    store shp files locally, this functions will download
    shapefiles for the whole planet.
    '''
    ne_earth = shpreader.natural_earth(resolution = '10m',
                                       category = 'cultural',
                                       name='admin_0_countries')
    reader = shpreader.Reader(ne_earth)
    countries = reader.records()
    # extract and create separate objects
    world_geoms = [extract_geom_meta(country) for country in countries]
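    # One row per country: [coordinates with a leading axis, country name].
    # Built explicitly because np.vstack on ragged input fails in recent NumPy.
    coords_countries = np.empty((len(world_geoms), 2), dtype=object)
    for i, (coords, name) in enumerate(world_geoms):
        coords_countries[i, 0] = coords[np.newaxis]
        coords_countries[i, 1] = name
    # np.save returns None, so its result is not assigned.
    np.save(os.path.join(os.path.dirname(__file__),
                         '../data/shape_files/coast_coords_10m.npy'),
            coords_countries)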
    print('Saving coordinates (...)')

def distance_to_shore(lon, lat):
    '''
    Return a DataFrame with one row per input point holding
    the distance in km to the nearest coastline point
    (distance_to_shore) and the country that point belongs to
    (shore_country). lon and lat are pandas Series in degrees.
    '''
    coastline_coords = np.vstack([np.flip(x[0][0], axis=1) for x in coastline])
    countries = np.hstack([np.repeat(str(x[1]), len(x[0][0])) for x in coastline])
    tree = BallTree(np.radians(coastline_coords), metric='haversine')
    coords = pd.concat([np.radians(lat), np.radians(lon)], axis=1)
    dist, ind = tree.query(coords, k=1)
    df_distance_to_shore = pd.Series(dist.flatten()*6371, name='distance_to_shore')
    df_countries = pd.Series(countries[ind].flatten(), name='shore_country')
    return pd.concat([df_distance_to_shore, df_countries], axis=1)

Best answer

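The efficient way to solve this problem is to store all your coast
points in a vantage point tree, using the geodesic distance as
your metric (the metric must satisfy the
triangle inequality). Then, for each vessel, you can query the VP
tree to find the closest point.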
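
To make the idea concrete, here is a minimal pure-Python sketch of a vantage point tree with a nearest-neighbor query. It is an illustration only: it is not the GeographicLib or vptree implementation, the names `VPNode`, `nearest` and `haversine_km` are made up for this sketch, the vantage point is simply the first point of each subset, and the spherical Haversine distance stands in for the geodesic distance so that only the standard library is needed. Points are (lon, lat) pairs in degrees and at least one point is required.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Spherical distance in km between p and q, each (lon, lat) in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(min(1.0, a)))


class VPNode:
    """One node of a vantage point tree built over a non-empty list of points."""

    def __init__(self, points, dist):
        self.vp = points[0]    # vantage point: simply the first point
        self.radius = 0.0      # median distance from vp to the other points
        self.inside = None     # subtree of points with distance <= radius
        self.outside = None    # subtree of points with distance >= radius
        ranked = sorted(((dist(self.vp, p), p) for p in points[1:]),
                        key=lambda pair: pair[0])
        if ranked:
            mid = len(ranked) // 2
            self.radius = ranked[mid][0]
            inner = [p for _, p in ranked[:mid]]
            outer = [p for _, p in ranked[mid:]]
            self.inside = VPNode(inner, dist) if inner else None
            self.outside = VPNode(outer, dist)


def nearest(node, query, dist, best=(float("inf"), None)):
    """Return (distance, point) for the stored point closest to query."""
    if node is None:
        return best
    d = dist(query, node.vp)
    if d < best[0]:
        best = (d, node.vp)
    # Visit the side containing the query first so that best shrinks quickly.
    order = ("in", "out") if d < node.radius else ("out", "in")
    for side in order:
        # The triangle inequality bounds how close a point on each side can be,
        # which is why the metric must satisfy it.
        if side == "in" and d - best[0] <= node.radius:
            best = nearest(node.inside, query, dist, best)
        elif side == "out" and d + best[0] >= node.radius:
            best = nearest(node.outside, query, dist, best)
    return best
```

A tree would be built once with `tree = VPNode(coast_points, haversine_km)` and then queried per vessel with `nearest(tree, (lon, lat), haversine_km)`.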
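Suppose there are M coast points and N vessels. Then the time to
construct the VP tree requires M log M distance calculations. Each
query requires log M distance calculations. A distance calculation
on the ellipsoid takes about 2.5 μs. So the total time is
(M + N) log M × 2.5 μs.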
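
As a rough check of this estimate, the arithmetic can be written out as below. This is only a sketch: the function name `estimated_seconds` is made up, the logarithm is taken to base 2, and the second call assumes, purely for illustration, that 30000 coast points would also be used for the 200M positions in the question.

```python
from math import log2


def estimated_seconds(m_coast, n_vessels, seconds_per_distance=2.5e-6):
    """(M + N) * log2(M) distance calculations at a fixed cost each."""
    return (m_coast + n_vessels) * log2(m_coast) * seconds_per_distance


# 30000 coast points and 46217 vessels: about 2.8 s
small_case = estimated_seconds(30_000, 46_217)

# Hypothetical: the same coast with 200M vessel positions,
# about 7.4e3 s, i.e. roughly two hours on a single thread
large_case = estimated_seconds(30_000, 200_000_000)
```

The first figure is in line with the run time of 3 secs quoted in the C++ example for the same problem size.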
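Here is code using my library, GeographicLib (version 1.47 or later),
to carry out this calculation. It is just a stripped-down version of
the example given for the NearestNeighbor class.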

// Example of using the GeographicLib::NearestNeighbor class.  Read lon/lat
// points for coast from coast.txt and lon/lat for vessels from vessels.txt.
// For each vessel, print to standard output: the index for the closest point
// on coast and the distance to it.

// This requires GeographicLib version 1.47 or later.

// Compile/link with, e.g.,
// g++ -I/usr/local/include -lGeographic -L/usr/local/lib -Wl,-rpath=/usr/local/lib -o coast coast.cpp

// Run time for 30000 coast points and 46217 vessels is 3 secs.

#include <iostream>
#include <exception>
#include <vector>
#include <fstream>

#include <GeographicLib/NearestNeighbor.hpp>
#include <GeographicLib/Geodesic.hpp>

using namespace std;
using namespace GeographicLib;

// A structure to hold a geographic coordinate.
struct pos {
    double _lat, _lon;
    pos(double lat = 0, double lon = 0) : _lat(lat), _lon(lon) {}
};

// A class to compute the distance between 2 positions.
class DistanceCalculator {
private:
    Geodesic _geod;
public:
    explicit DistanceCalculator(const Geodesic& geod) : _geod(geod) {}
    double operator() (const pos& a, const pos& b) const {
        double d;
        _geod.Inverse(a._lat, a._lon, b._lat, b._lon, d);
        if ( !(d >= 0) )
            // Catch illegal positions which result in d = NaN
            throw GeographicErr("distance doesn't satisfy d >= 0");
        return d;
    }
};

int main() {
    try {
        // Read in coast
        vector<pos> coast;
        double lat, lon;
        {
            ifstream is("coast.txt");
            if (!is.good())
                throw GeographicErr("coast.txt not readable");
            while (is >> lon >> lat)
                coast.push_back(pos(lat, lon));
            if (coast.size() == 0)
                throw GeographicErr("need at least one location");
        }

        // Define a distance function object
        DistanceCalculator distance(Geodesic::WGS84());

        // Create NearestNeighbor object
        NearestNeighbor<double, pos, DistanceCalculator>
            coastset(coast, distance);

        ifstream is("vessels.txt");
        double d;
        int count = 0;
        vector<int> k;
        while (is >> lon >> lat) {
            ++count;
            d = coastset.Search(coast, distance, pos(lat, lon), k);
            if (k.size() != 1)
                    throw GeographicErr("unexpected number of results");
            cout << k[0] << " " << d << "\n";
        }
    }
    catch (const exception& e) {
        cerr << "Caught exception: " << e.what() << "\n";
        return 1;
    }
}
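This example is in C++. To use python, you'll need to find a python
implementation of VP trees, and then you can use the
python version of GeographicLib for the distance calculations.
P.S. GeographicLib uses an accurate algorithm for the geodesic distance
which satisfies the triangle inequality. The Vincenty method fails to
converge for nearly antipodal points and so does not satisfy the
triangle inequality.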
ADDENDUM: Here's the python implementation.
Install vptree and geographiclib:
pip install vptree geographiclib
Coast points (lon, lat) are in coast.txt; vessel positions (lon, lat)
are in vessels.txt. Run
import numpy
import vptree
from geographiclib.geodesic import Geodesic

def geoddist(p1, p2):
  # p1 = [lon1, lat1] in degrees
  # p2 = [lon2, lat2] in degrees
  return Geodesic.WGS84.Inverse(p1[1], p1[0], p2[1], p2[0])['s12']

coast = vptree.VPTree(numpy.loadtxt('coast.txt'), geoddist)
print('vessel closest-coast dist')
for v in numpy.loadtxt('vessels.txt'):
  c = coast.get_nearest_neighbor(v)
  print(list(v), list(c[1]), c[0])
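For 30000 coast points and 46217 vessels, this initially took 18 min 3 secs,
which was longer than I expected. The time to construct the tree was
1 min 16 secs, so the total time should have been about 3 min.
For 30000 coast points and 46217 vessels, this now takes 4 min (using
version 1.1.1 of vptree).
For comparison, the time using the GeographicLib C++ library is 3
secs.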
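LATER: I looked into why the python vptree is slow. The number of
distance calculations needed to set up the tree is the same for
GeographicLib's C++ implementation and the python vptree package:
387248, which is about M log M for M = 30000. (Here logs are base 2,
and I set the bucket size to 1 in both implementations to ease
comparison.)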
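The mean number of distance calculations per vessel lookup for the C++
implementation is 14.7, which is close to the expected value, log M =
14.9. However, the equivalent statistic for the python implementation
is 108.9, a factor of 7.4 larger.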
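
Statistics like these can be gathered by wrapping the distance function in a call counter and passing the wrapper to the tree in place of the bare metric. The sketch below shows one way to do that; the class name `CountingMetric` is made up for illustration, and this is not necessarily how the figures above were obtained.

```python
class CountingMetric:
    """Wrap a distance function and count how often it is called."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return self.dist(a, b)

    def reset(self):
        """Return the current count and start counting from zero again."""
        calls, self.calls = self.calls, 0
        return calls
```

Calling `reset()` after building the tree gives the construction count; dividing the count accumulated over all queries by the number of vessels gives the mean per lookup.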
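Various factors influence the efficiency of a VP tree: the choice of
vantage points, the order in which the search is carried out, etc. A
discussion of these considerations is given in the notes on the
GeographicLib implementation.
I will ping the author of the python package about this.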
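STILL LATER: I've submitted a pull request which cures the major
efficiency problems in the python package vptree. The CPU time for
my test is now about 4 min. The number of distance calculations per
query is 16.7 (close to the figure for GeographicLib::NearestNeighbor,
14.7).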

This Q&A on calculating the distance from a vessel to the shore or coastline in python comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/45440700/
