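For a homework assignment, I am trying to convert an XML file into a data frame in R. I have tried many different approaches and searched the Internet for ideas, without success. Here is my code so far: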
library(XML)
url <- 'http://www.ggobi.org/book/data/olive.xml'
doc <- xmlParse(url)
xmltop <- xmlRoot(doc)
dataFrame <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
data.frame(t(dataFrame),row.names=NULL)
The output I get looks like one huge numeric vector. I am trying to organize the data into a data frame, but I don't know how to adjust my code to get it.
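One likely cause of the single-vector result is how this file stores its rows. Judging from the answer below, each <record> element holds all of its values as one whitespace-separated text node. So xmlValue returns one long string per record, not one value per column.

The base R sketch below illustrates this. Its record string is hypothetical, modeled on row 2 of the output shown in the answer. It shows that trimming and then splitting is what produces the individual fields.

```r
# Hypothetical text of one <record>, modeled on row 2 of the answer's output
rec_text <- "\n 1 1 1088 73 224 7709 781 31 61 29\n"

# The whole record is a single character string, not ten values
length(rec_text)

# Without trimming, the leading whitespace yields an empty first field
untrimmed <- strsplit(rec_text, "[[:space:]]+")[[1]]

# Trim both ends first, then split on runs of whitespace
fields <- strsplit(trimws(rec_text), "[[:space:]]+")[[1]]
values <- as.numeric(fields)
print(values)
```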
Best answer
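It may not be as verbose as the XML package, but xml2 has no memory leaks and is focused on data extraction. I use trimws, which is a fairly recent addition to base R.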
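For readers new to trimws(): it strips leading and trailing whitespace (both ends by default, or one end via which = "left" / "right"). Internal spaces are left alone, which is why the answer can trim first and then split on the spaces in between. Here is a small illustration on made-up strings that resemble padded XML text nodes:

```r
# Made-up inputs resembling padded XML text nodes
raw <- c("  North-Apulia\n", "\t1 1 1088  ")

both <- trimws(raw)                  # -> "North-Apulia" "1 1 1088"
left <- trimws(raw, which = "left")  # -> "North-Apulia\n" "1 1 1088  "
print(both)
print(left)
```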
library(xml2)
pg <- read_xml("http://www.ggobi.org/book/data/olive.xml")
# get all the <record>s
recs <- xml_find_all(pg, "//record")
# extract and clean all the columns
vals <- trimws(xml_text(recs))
# extract and clean (if needed) the area names
labs <- trimws(xml_attr(recs, "label"))
# mine the column names from the two variable descriptions
# this XPath construct lets us grab either the <categ…> or <real…> tags
# and then grabs the 'name' attribute of them
cols <- xml_attr(xml_find_all(pg, "//data/variables/*[self::categoricalvariable or self::realvariable]"), "name")
# this converts each set of <record> columns to a data frame
# after first converting each row to numeric and assigning
# names to each column (making it easier to do the matrix to data frame conv)
dat <- do.call(rbind, lapply(strsplit(vals, "[[:space:]]+"),
                             function(x) {
                               data.frame(rbind(setNames(as.numeric(x), cols)))
                             }))
# then assign the area name column to the data frame
dat$area_name <- labs
head(dat)
## region area palmitic palmitoleic stearic oleic linoleic linolenic
## 1 1 1 1075 75 226 7823 672 NA
## 2 1 1 1088 73 224 7709 781 31
## 3 1 1 911 54 246 8113 549 31
## 4 1 1 966 57 240 7952 619 50
## 5 1 1 1051 67 259 7771 672 50
## 6 1 1 911 49 268 7924 678 51
## arachidic eicosenoic area_name
## 1 60 29 North-Apulia
## 2 61 29 North-Apulia
## 3 63 29 North-Apulia
## 4 78 35 North-Apulia
## 5 80 46 North-Apulia
## 6 70 44 North-Apulia
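Calling data.frame() once per record and then rbind-ing the pieces works, but it builds one small data frame per row. A swapped-in alternative (not the answer's method) is to split every record at once, fill a numeric matrix row by row, and convert it to a data frame in one step.

This sketch assumes every record has exactly length(cols) fields. Here vals, labs and cols are hypothetical stand-ins for the objects built above, with values copied from the output shown.

```r
# Hypothetical stand-ins for the objects built from the XML above
cols <- c("region", "area", "palmitic", "palmitoleic", "stearic",
          "oleic", "linoleic", "linolenic", "arachidic", "eicosenoic")
vals <- c("1 1 1088 73 224 7709 781 31 61 29",
          "1 1 911 54 246 8113 549 31 63 29")
labs <- c("North-Apulia", "North-Apulia")

# Split all records, flatten, and convert to numbers in one pass
nums <- as.numeric(unlist(strsplit(vals, "[[:space:]]+")))

# Assumption: every record has exactly length(cols) fields
stopifnot(length(nums) == length(vals) * length(cols))

# Fill row by row so each record becomes one row
m <- matrix(nums, ncol = length(cols), byrow = TRUE,
            dimnames = list(NULL, cols))

dat <- as.data.frame(m)
dat$area_name <- labs
str(dat)
```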
Update
Nowadays I would do that last bit like this:
library(tidyverse)
# split each record, convert its fields to numbers, and make a one-row tibble
strsplit(vals, "[[:space:]]+") %>%
  map_df(~as_tibble(as.list(setNames(as.numeric(.x), cols)))) %>%
  mutate(area_name = labs)
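Another base R option, again a substitution rather than what the answer uses, is to let read.table() parse the whitespace-separated records through its text argument. Each element of the vector is read as one line, column types are converted automatically, and na.strings maps a missing-value token to NA.

The "na" token and the sample strings below are assumptions for illustration. Check what the real file uses for missing values.

```r
# Hypothetical record strings; "na" is an assumed missing-value token
cols <- c("region", "area", "palmitic", "palmitoleic", "stearic",
          "oleic", "linoleic", "linolenic", "arachidic", "eicosenoic")
vals <- c("1 1 1075 75 226 7823 672 na 60 29",
          "1 1 1088 73 224 7709 781 31 61 29")
labs <- c("North-Apulia", "North-Apulia")

# Each element of vals becomes one line; fields are split on whitespace
dat <- read.table(text = vals, col.names = cols, na.strings = "na")
dat$area_name <- labs
str(dat)
```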
For "xml - R: converting XML data to a data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33446888/