Problem description
I have a very specific data import problem and I am fairly new to XML data sets, so my problems are probably due to my lack of understanding. I would like to read in the German track network from Deutsche Bahn, which is published publicly here: http://data.deutschebahn.com/dataset/data-streckennetz (unfortunately the page is in German)
This is the direct link: http://download-data.deutschebahn.com/static/datasets/streckennetz/INSPIRE_0618.zip
There is also a link to a 200-page document about the INSPIRE data set, but it does not really help me understand how to parse the XML document: https://inspire.ec.europa.eu/documents/Data_Specifications/INSPIRE_DataSpecification_TN_v3.0.pdf
I downloaded the file to my working directory (WD) and tried to read it using the XML package:
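library(XML)
# Folder and data.file.import hold the directory and the name of the downloaded XML file
data <- xmlParse(file.path(Folder, data.file.import), useInternalNodes = FALSE)
root <- xmlRoot(data)               # top-level element of the document
root_child <- xmlChildren(root)     # list of its direct child nodes
First_child <- root_child[[1]]
xmlName(First_child)
xmlSize(First_child)
xmlAttrs(First_child)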
Looking at the first child shows what I guess are the borders of the network:
<wfs:boundedBy>
  <gml:Envelope srsName="urn:ogc:def:crs:EPSG::4258" srsDimension="2">
    <gml:lowerCorner>47.397789564359 6.021325139431</gml:lowerCorner>
    <gml:upperCorner>54.907638367755 15.031955280103</gml:upperCorner>
  </gml:Envelope>
</wfs:boundedBy>
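If you want to stay with the XML package, XPath queries are usually easier than walking xmlChildren() by hand. The sketch below parses the envelope shown above from an inline string and pulls out the two corners. The namespace URIs (WFS 2.0 and GML 3.2) are assumptions made for this self-contained snippet; with the real file, check the xmlns declarations on the root element, for example with xmlNamespaceDefinitions(xmlRoot(data), simplify = TRUE).

```r
library(XML)

# Inline copy of the boundedBy element, wrapped in a root that declares
# the two prefixes. ASSUMPTION: these namespace URIs are placeholders;
# use the ones declared in the real file's root element.
snippet <- '<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs/2.0"
                       xmlns:gml="http://www.opengis.net/gml/3.2">
  <wfs:boundedBy>
    <gml:Envelope srsName="urn:ogc:def:crs:EPSG::4258" srsDimension="2">
      <gml:lowerCorner>47.397789564359 6.021325139431</gml:lowerCorner>
      <gml:upperCorner>54.907638367755 15.031955280103</gml:upperCorner>
    </gml:Envelope>
  </wfs:boundedBy>
</wfs:FeatureCollection>'

doc <- xmlParse(snippet, asText = TRUE)
ns <- c(wfs = "http://www.opengis.net/wfs/2.0",
        gml = "http://www.opengis.net/gml/3.2")

# Fetch one corner element via XPath and split "a b" into two numbers.
corner <- function(name) {
  txt <- xpathSApply(doc, paste0("//wfs:boundedBy/gml:Envelope/gml:", name),
                     xmlValue, namespaces = ns)
  as.numeric(strsplit(txt, " ")[[1]])
}

lower <- corner("lowerCorner")
upper <- corner("upperCorner")

# The urn form of EPSG:4258 uses latitude/longitude axis order, so the
# first value of each corner is latitude and the second is longitude.
bbox <- c(lon_min = lower[2], lat_min = lower[1],
          lon_max = upper[2], lat_max = upper[1])
print(bbox)
```

The lon/lat reading is consistent with the data itself: 47 to 55 are German latitudes and 6 to 15 are German longitudes.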
The other children do not help me much. The second is a list of names and the third a complex list.
Second_child <- root_child[[2]]
Third_child <- root_child[[3]]
Can anybody help me, or point me to a resource where I could learn how to parse this?
Recommended answer
This is a GML file, so it can be read by the OGR drivers embedded in the rgdal and sf packages. Hence:
> sf::st_layers("./DB-Netz_INSPIRE_20171116.xml")
Driver: GML
Available layers:
layer_name geometry_type features fields
1 Network NA 1 12
2 ConditionOfFacility NA 7072 15
3 MarkerPost Point 34325 11
4 TrafficFlowDirection NA 7072 15
5 VerticalPosition NA 1313 15
[etc]
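Because st_layers() lists every layer, one way to explore the file is to read all of them into a named list. This is a minimal sketch, not part of the original answer: the helper name read_all_layers is hypothetical, and it is demonstrated on the North Carolina shapefile that ships with sf so it runs without the download. Non-spatial layers should come back as data frames (with the warning shown further down), spatial layers as sf objects.

```r
library(sf)

# Hypothetical helper: read every layer of an OGR-readable source into a
# named list. Spatial layers become sf objects, non-spatial ones data frames.
read_all_layers <- function(path) {
  lyr <- st_layers(path)
  out <- lapply(lyr$name, function(l) st_read(path, layer = l, quiet = TRUE))
  names(out) <- lyr$name
  out
}

# Demonstrated on a file shipped with sf; for the DB Netz data you would
# pass "./DB-Netz_INSPIRE_20171116.xml" instead.
nc_path <- system.file("shape/nc.shp", package = "sf")
layers <- read_all_layers(nc_path)
print(names(layers))
```

On the full DB Netz file this reads every layer, which may take a while; the features column of st_layers() shows which layers are large before you commit to reading them.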
Individual layers can be read with sf::st_read:
> nodes = sf::st_read("./DB-Netz_INSPIRE_20171116.xml","RailwayNode")
Reading layer `RailwayNode' from data source `/home/rowlings/Downloads/SO/train/DB-Netz_INSPIRE_20171116.xml' using driver `GML'
Simple feature collection with 21457 features and 20 fields
geometry type: POINT
dimension: XY
bbox: xmin: 6.021325 ymin: 47.39779 xmax: 15.03196 ymax: 54.90462
epsg (SRID): 4258
proj4string: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
> plot(nodes$geom)
>
which produces a set of points that outline Germany quite nicely.
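If you need the node positions as plain numbers (for example to export them), st_coordinates() returns an X/Y matrix. The sketch below uses st_geometry(), which finds the geometry column whatever it is called ("geom", "geometry", ...). The three points are made-up stand-ins for the RailwayNode layer, not values from the data set.

```r
library(sf)

# Made-up stand-in for the RailwayNode layer: three points in EPSG:4258.
pts <- st_sf(
  id = c("a", "b", "c"),
  geometry = st_sfc(st_point(c(6.5, 50.9)), st_point(c(13.4, 52.5)),
                    st_point(c(11.6, 48.1)), crs = 4258)
)

# st_geometry() works regardless of the geometry column's name;
# st_coordinates() returns a matrix with columns X (longitude) and Y (latitude).
xy <- st_coordinates(st_geometry(pts))
coords <- data.frame(id = pts$id, lon = xy[, "X"], lat = xy[, "Y"])
print(coords)
```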
The non-spatial data can also be read with st_read, which returns a plain data frame when the layer has no geometry:
> ds = sf::st_read("./DB-Netz_INSPIRE_20171116.xml","DesignSpeed")
Reading layer `DesignSpeed' from data source `/home/rowlings/Downloads/SO/train/DB-Netz_INSPIRE_20171116.xml' using driver `GML'
Warning message:
no simple feature geometries present: returning a data.frame or tbl_df
>
I guess these are the speed limits for various sections of track; you'll have to look up the metadata to see how the IDs match up between tables like this one and the geographic data:
> head(ds)
gml_id identifier applicableDirection fromPosition
1 Spd-2046676 urn:x-dbnetze:oid:Spd-2046676 <NA> 0
2 Spd-2046677 urn:x-dbnetze:oid:Spd-2046677 <NA> 0
3 Spd-2046678 urn:x-dbnetze:oid:Spd-2046678 <NA> 0
4 Spd-2046679 urn:x-dbnetze:oid:Spd-2046679 <NA> 0
5 Spd-2046680 urn:x-dbnetze:oid:Spd-2046680 <NA> 0
6 Spd-2046681 urn:x-dbnetze:oid:Spd-2046681 <NA> 0
[etc etc etc etc]
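In the output above, identifier is "urn:x-dbnetze:oid:" followed by gml_id, so IDs in either form can be normalised by stripping that prefix. The sketch below shows how a join could look once you know which column links an attribute table to a spatial layer. Everything except the prefix is hypothetical: the link IDs, the element_ref column and the speed values are invented, and the real linking column has to come from the INSPIRE metadata.

```r
library(sf)

# Strip the URN prefix seen in the `identifier` column.
strip_oid <- function(x) sub("^urn:x-dbnetze:oid:", "", x)

# Hypothetical spatial layer with two track links.
links <- st_sf(
  gml_id = c("Lnk-1", "Lnk-2"),
  geometry = st_sfc(st_linestring(rbind(c(6.9, 50.9), c(7.1, 50.7))),
                    st_linestring(rbind(c(13.3, 52.5), c(13.5, 52.4))),
                    crs = 4258)
)

# Hypothetical attribute table; `element_ref` is an invented linking column.
speeds <- data.frame(
  identifier  = c("urn:x-dbnetze:oid:Spd-1", "urn:x-dbnetze:oid:Spd-2"),
  element_ref = c("urn:x-dbnetze:oid:Lnk-2", "urn:x-dbnetze:oid:Lnk-1"),
  speed_kmh   = c(160, 120),
  stringsAsFactors = FALSE
)
speeds$link_id <- strip_oid(speeds$element_ref)

# sf's merge() method keeps the geometry and returns an sf object.
joined <- merge(links, speeds, by.x = "gml_id", by.y = "link_id")
print(st_drop_geometry(joined)[, c("gml_id", "speed_kmh")])
```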