Converting HTML character entity encoding in R
Question
I would like to convert HTML character entities, for example &amp; to & or &gt; to >.
Perl has the HTML::Entities package for this, but I couldn't find anything similar in R.
I also tried iconv() but couldn't get satisfactory results. There may also be a way to do this with the XML package, but I haven't figured it out yet.
Recommended answer
Update: this answer is outdated. Please see the answer based on the newer xml2 package.
Try the following:
# Load the XML package
library(XML)

# Convenience function: parse the string as HTML and return the first decoded text node
html2txt <- function(str) {
  xpathApply(htmlParse(str, asText = TRUE),
             "//body//text()",
             xmlValue)[[1]]
}

# HTML-encoded string (the paste() call assembles "isn&apos;t")
( x <- paste("i", "s", "n", "&", "a", "p", "o", "s", ";", "t", sep = "") )
[1] "isn&apos;t"

# Converted string
html2txt(x)
[1] "isn't"
Update: edited the html2txt() function so that it applies to more situations.
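Since the note above points to xml2 as the newer route, here is a minimal sketch of what such an approach might look like. It assumes the xml2 package is installed. The function name unescape_html and the <x> wrapper element are illustrative choices, not part of the original answer. The idea is the same as html2txt(): let an HTML parser decode the entities, then read the text back out.

```r
library(xml2)

# Hypothetical helper: decode HTML entities in each element of a character vector.
# Each string is wrapped in a dummy <x> element so the parser always sees markup,
# parsed as HTML, and the decoded text content is extracted.
unescape_html <- function(str) {
  vapply(
    str,
    function(s) xml_text(read_html(paste0("<x>", s, "</x>"))),
    character(1),
    USE.NAMES = FALSE
  )
}

unescape_html(c("a &amp; b", "1 &gt; 0"))
```

Parsing one small document per string is simple but not fast. For long vectors it may be worth pasting the strings into a single document and splitting the result afterwards.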