Problem description
I'm starting a project in R in which I have to parse XML. I'm using the XML library and functions such as xmlToDataFrame and XMLPARSE. I want to store the information in a data frame in a structured way, but I've run into a problem: I can't get the variables inside each node into separate columns. With the functions above, all the values end up in a single cell in a single row of the data frame.
The XML I'm using is as follows:
<?xml version="1.0" encoding="UTF-8"?>
-<rest-response>
<type>rest-response</type>
<time-stamp>1392217780000</time-stamp>
<status>OK</status>
<msg-version>1.0.0</msg-version>
<op>inventory</op>
-<response>
<inventorySize>3</inventorySize>
<inventoryMode>SYNCHRONOUS</inventoryMode>
<time>4952</time>
-<items>
-<item>
<epc>00000000000000000000A195</epc>
<ts>1392217779060</ts>
<location-id>adtr</location-id>
<location-pos>0,0,0</location-pos>
<device-id>adtr@1</device-id>
<device-reader>192.168.1.224</device-reader>
<device-readerPort>1</device-readerPort>
<device-readerMuxPort>0</device-readerMuxPort>
<device-readerMuxPort2>0</device-readerMuxPort2>
<tag-rssi>-49.0</tag-rssi>
<tag-readcount>36.0</tag-readcount>
<tag-phase>168.0</tag-phase>
</item>
-<item>
<epc>00000000000000000000A263</epc>
<ts>1392217779065</ts>
<location-id>adtr</location-id>
<location-pos>0,0,0</location-pos>
<device-id>adtr@1</device-id>
<device-reader>192.168.1.224</device-reader>
<device-readerPort>1</device-readerPort>
<device-readerMuxPort>0</device-readerMuxPort>
<device-readerMuxPort2>0</device-readerMuxPort2>
<tag-rssi>-49.0</tag-rssi>
<tag-readcount>36.0</tag-readcount>
<tag-phase>0.0</tag-phase>
</item>
-<item>
<epc>B00000000000001101080802</epc>
<ts>1392217779323</ts>
<location-id>adtr</location-id>
<location-pos>0,0,0</location-pos>
<device-id>adtr@1</device-id>
<device-reader>192.168.1.224</device-reader>
<device-readerPort>1</device-readerPort>
<device-readerMuxPort>0</device-readerMuxPort>
<device-readerMuxPort2>0</device-readerMuxPort2>
<tag-rssi>-72.0</tag-rssi>
<tag-readcount>27.0</tag-readcount>
<tag-phase>157.0</tag-phase>
</item>
</items>
</response>
</rest-response>
Everything inside each item is returned as a single value, and I want to split it into separate fields, one per concept.
Another important point is that the XML content may change. Its structure will always be the same, but there may be more items.
Any ideas?
Recommended answer
I assume you want the <items> in a data frame. Assuming your XML is in the variable xml.text, this will work:
library(XML)
xml <- xmlInternalTreeParse(xml.text) # assumes your xml in variable xml.text
items <- getNodeSet(xml,"//items/item")
df <- xmlToDataFrame(items)
df
# epc ts location-id location-pos device-id device-reader device-readerPort device-readerMuxPort device-readerMuxPort2 tag-rssi tag-readcount tag-phase
# 1 00000000000000000000A195 1392217779060 adtr 0,0,0 adtr@1 192.168.1.224 1 0 0 -49.0 36.0 168.0
# 2 00000000000000000000A263 1392217779065 adtr 0,0,0 adtr@1 192.168.1.224 1 0 0 -49.0 36.0 0.0
# 3 B00000000000001101080802 1392217779323 adtr 0,0,0 adtr@1 192.168.1.224 1 0 0 -72.0 27.0 157.0
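Every column returned this way holds text, so numeric fields such as tag-rssi usually need converting before use. The following is only a sketch under assumptions: the sample XML is a shortened, hypothetical version of the question's data, and asText = TRUE and stringsAsFactors = FALSE are optional arguments added here for explicitness, not part of the original answer.

```r
library(XML)

# Hypothetical, shortened sample with the same structure as the question's XML
xml.text <- '<?xml version="1.0" encoding="UTF-8"?>
<rest-response>
  <response>
    <items>
      <item><epc>00000000000000000000A195</epc><tag-rssi>-49.0</tag-rssi></item>
      <item><epc>B00000000000001101080802</epc><tag-rssi>-72.0</tag-rssi></item>
    </items>
  </response>
</rest-response>'

# asText = TRUE states explicitly that the input is XML content, not a file name
doc <- xmlInternalTreeParse(xml.text, asText = TRUE)

# The XPath matches every <item>, however many the response contains
items <- getNodeSet(doc, "//items/item")

# Keep values as character strings rather than factors
df <- xmlToDataFrame(items, stringsAsFactors = FALSE)

# Column names containing hyphens must be accessed with [[ ]] or backticks
df[["tag-rssi"]] <- as.numeric(df[["tag-rssi"]])
str(df)
```

Because the XPath selects all item nodes, the same code should keep working when the response contains more items, as long as the structure stays the same.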
I also assumed that you displayed this XML in a browser and cut and pasted it (which would explain the -<tag> markers). Otherwise, your XML is not well-formed.
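If the only copy of the XML available is one pasted from a browser, the leading - markers could be stripped before parsing. This is a rough sketch, not part of the original answer: the pasted vector is a hypothetical, shortened sample, and the pattern assumes each marker is the first non-blank character on its line and directly precedes an opening tag.

```r
library(XML)

# Hypothetical browser-pasted lines with "-" collapse markers, as in the question
pasted <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '-<items>',
  '-<item>',
  '<epc>00000000000000000000A195</epc>',
  '<tag-rssi>-49.0</tag-rssi>',
  '</item>',
  '</items>'
)

# Drop a "-" only when it starts the line and is immediately followed by "<",
# so values such as -49.0 inside elements are left untouched
cleaned <- sub("^\\s*-<", "<", pasted)

xml.text <- paste(cleaned, collapse = "\n")
doc <- xmlInternalTreeParse(xml.text, asText = TRUE)
df <- xmlToDataFrame(getNodeSet(doc, "//items/item"), stringsAsFactors = FALSE)
df
```

Fetching the raw response directly, rather than copying it from a browser view, avoids the problem altogether and is the safer option where possible.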