Problem description
I am trying to parse Well Known Binary (WKB), a binary encoding of geometry objects used in Geographic Information Systems (GIS). I am using this spec from ESRI (same results here from ESRI). My input data comes from Osmosis, a tool for parsing OpenStreetMap data; specifically, the pgsimp-dump format, which gives the hex representation of the binary.
The ESRI docs say that a Point should be only 21 bytes: 1 byte for the byte order, 4 bytes for the uint32 type ID, 8 bytes for the double x, and 8 bytes for the double y.
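To make the byte counts concrete, here is a minimal sketch of a parser that follows the 21-byte Point layout described above, using only Python's standard struct module. The function name parse_iso_wkb_point is hypothetical, and the sketch only handles 2D Points.

```python
import struct
from typing import Tuple


def parse_iso_wkb_point(data: bytes) -> Tuple[float, float]:
    """Parse a 2D WKB Point with the 21-byte layout from the ESRI spec."""
    if len(data) != 21:
        raise ValueError(f"expected 21 bytes for a WKB Point, got {len(data)}")
    # Byte 0 is the byte order: 0 = big-endian (XDR), 1 = little-endian (NDR).
    if data[0] not in (0, 1):
        raise ValueError(f"invalid byte-order marker: {data[0]}")
    order = "<" if data[0] == 1 else ">"
    # Bytes 1-4: uint32 geometry type; 1 means Point.
    (type_id,) = struct.unpack_from(order + "I", data, 1)
    if type_id != 1:
        raise ValueError(f"not a 2D Point: type id {type_id}")
    # Bytes 5-20: two doubles, x then y.
    x, y = struct.unpack_from(order + "dd", data, 5)
    return x, y


shapely_output = bytes.fromhex("0101000000DB81DF2B5F7822C0DFBB7262B4744A40")
print(parse_iso_wkb_point(shapely_output))

osmosis_input = bytes.fromhex("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40")
print(len(osmosis_input))  # 25, so this parser rejects it
```

Under this strict reading of the spec, the Osmosis string fails the length check before any field is decoded.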
Here is an example from Osmosis (hex): 0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40, which is 25 bytes long.
Shapely, a Python library for parsing WKB (among other formats) that is based on the popular C library GEOS, is able to parse this string:
>>> import shapely.wkb
>>> shapely.wkb.loads("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40", hex=True)
<shapely.geometry.point.Point object at 0x7f221f2581d0>
When I ask Shapely to parse it and then convert it back to WKB, I get 21 bytes:
>>> shapely.wkb.loads("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40", hex=True).wkb_hex
'0101000000DB81DF2B5F7822C0DFBB7262B4744A40'
The difference is the 4 bytes in the middle, which appear 3 bytes into the uint32 typeid:
01010000**20E61000**00DB81DF2B5F7822C0DFBB7262B4744A40
Why can Shapely/GEOS parse this WKB when it is invalid WKB? What do these bytes mean?
Recommended answer
GEOS / Shapely use an Extended variant of WKT/WKB called EWKT / EWKB, which is documented by PostGIS. If you have access to PostGIS, you can see what's going on here:
SELECT ST_AsEWKT('0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40'::geometry);
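This returns the EWKT SRID=4326;POINT(-9.2351011 52.9117549). So the extra data is the spatial reference identifier, or SRID: specifically EPSG:4326, for WGS 84. In the hex, the type uint32 is 01000020, which read little-endian is 0x20000001, a Point (1) with the EWKB SRID flag (0x20000000) set; it is followed by an extra uint32, E6100000, which is 0x10E6 = 4326.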
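A minimal sketch of reading that header by hand, using only the standard struct module. The function name parse_ewkb_point is hypothetical; the flag constants are the high bits of the type word as used by PostGIS EWKB, and the sketch deliberately handles only 2D Points.

```python
import struct
from typing import Optional, Tuple

# High bits of the geometry type word, as used by PostGIS EWKB.
EWKB_Z = 0x80000000
EWKB_M = 0x40000000
EWKB_SRID = 0x20000000


def parse_ewkb_point(data: bytes) -> Tuple[Optional[int], float, float]:
    """Return (srid, x, y) for a 2D Point in plain WKB or EWKB form."""
    if data[0] not in (0, 1):
        raise ValueError(f"invalid byte-order marker: {data[0]}")
    order = "<" if data[0] == 1 else ">"
    (type_word,) = struct.unpack_from(order + "I", data, 1)
    if type_word & (EWKB_Z | EWKB_M):
        raise ValueError("this sketch only handles 2D points")
    base_type = type_word & ~EWKB_SRID
    if base_type != 1:
        raise ValueError(f"not a Point: type {base_type}")
    offset = 5
    srid = None
    if type_word & EWKB_SRID:
        # The SRID flag means an extra 32-bit SRID follows the type word.
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    if len(data) != offset + 16:
        raise ValueError(f"unexpected length {len(data)} for a Point")
    x, y = struct.unpack_from(order + "dd", data, offset)
    return srid, x, y


ewkb = bytes.fromhex("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40")
(type_word,) = struct.unpack_from("<I", ewkb, 1)
print(hex(type_word))          # 0x20000001: Point with the SRID flag set
print(parse_ewkb_point(ewkb))  # SRID 4326, then the x/y pair
```

The same function also accepts the 21-byte form, in which case it reports no SRID.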
Shapely does not support SRIDs; however, there are a few hacks, e.g.:
from shapely import geos
geos.WKBWriter.defaults['include_srid'] = True
This should now make wkb or wkb_hex output EWKB, which includes the SRID. The default is False, which outputs ISO WKB for 2D geometries (but not for 3D).
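For comparison, the EWKB produced with the SRID included differs from the plain output only in the header. The sketch below does that rewrite by hand with the standard struct module; the function name add_srid is hypothetical, it is not how Shapely or GEOS implement the option, and it was checked only against the Point from the question.

```python
import struct

EWKB_SRID = 0x20000000  # PostGIS EWKB flag: an SRID follows the type word


def add_srid(iso_wkb: bytes, srid: int) -> bytes:
    """Turn a 2D plain WKB geometry into EWKB carrying the given SRID."""
    if iso_wkb[0] not in (0, 1):
        raise ValueError(f"invalid byte-order marker: {iso_wkb[0]}")
    order = "<" if iso_wkb[0] == 1 else ">"
    (type_word,) = struct.unpack_from(order + "I", iso_wkb, 1)
    if type_word >= 1000:
        # ISO Z/M codes (1000+) and EWKB flag bits are out of scope here.
        raise ValueError("only 2D geometries without flags are handled")
    # Keep the byte order, set the SRID flag, insert the SRID, copy the rest.
    header = iso_wkb[:1] + struct.pack(order + "II", type_word | EWKB_SRID, srid)
    return header + iso_wkb[5:]


iso = bytes.fromhex("0101000000DB81DF2B5F7822C0DFBB7262B4744A40")
print(add_srid(iso, 4326).hex().upper())
# 0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40
```

Adding SRID 4326 to Shapely's 21-byte output reproduces the 25-byte Osmosis string exactly.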
So it seems your objective is to convert EWKB to ISO WKB, which you can do with GEOS / Shapely for 2D geometries only. If you have 3D (Z or M) or 4D (ZM) geometries, then only PostGIS is able to do this conversion.
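For a 2D geometry, the conversion amounts to deleting the SRID from the header. A minimal standard-library sketch follows; the function name ewkb_to_iso_2d is hypothetical, it rewrites only the top-level header, it was checked only against the Point in the question, and it refuses Z/M input in line with the caveat above.

```python
import struct

EWKB_Z = 0x80000000
EWKB_M = 0x40000000
EWKB_SRID = 0x20000000


def ewkb_to_iso_2d(ewkb: bytes) -> bytes:
    """Strip the SRID from a 2D EWKB geometry, leaving plain (ISO) WKB."""
    if ewkb[0] not in (0, 1):
        raise ValueError(f"invalid byte-order marker: {ewkb[0]}")
    order = "<" if ewkb[0] == 1 else ">"
    (type_word,) = struct.unpack_from(order + "I", ewkb, 1)
    if type_word & (EWKB_Z | EWKB_M):
        # ISO WKB signals Z/M by adding 1000/2000/3000 to the type code
        # instead of setting flag bits, so this header rewrite is not enough.
        raise ValueError("Z/M geometries are not handled by this sketch")
    if not type_word & EWKB_SRID:
        return ewkb  # no SRID: already plain 2D WKB
    base_type = type_word & ~EWKB_SRID
    # Clear the flag and drop the 4 SRID bytes (offsets 5-8).
    return ewkb[:1] + struct.pack(order + "I", base_type) + ewkb[9:]


ewkb = bytes.fromhex("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40")
print(ewkb_to_iso_2d(ewkb).hex().upper())
# 0101000000DB81DF2B5F7822C0DFBB7262B4744A40
```

The result matches the 21-byte string Shapely printed in the question.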