Problem description
I am trying to run a simple program that extracts tables from HTML code. However, readHTMLTable in the XML package seems to have a memory problem. Is there an easy way to work around it, for example by allocating dedicated memory for this call and then freeing it manually?
I have tried wrapping the call in a function, calling gc(), and using different versions of R and of the package, and nothing seems to work. I am getting desperate.
Example code. How can I run this without memory use exploding?
library(XML)
a = readLines("http://en.wikipedia.org/wiki/2014_FIFA_World_Cup")
while(TRUE) {
  b = readHTMLTable(a)
  # do something with b
}
Something like this still uses up all of my memory:
library(XML)
a = readLines("http://en.wikipedia.org/wiki/2014_FIFA_World_Cup")
f <- function(x) {
  b = readHTMLTable(x)
  rm(x)
  gc()
  return(b)
}
for(i in 1:100) {
  d = f(a)
  rm(d)
  gc()
}
rm(list=ls())
gc()
I am using Windows 7 and have tried both 32-bit and 64-bit R.
Recommended answer
As of XML 3.98-1.4 and R 3.1 on Windows 7, this problem can be solved by calling free() on the parsed document. However, free() does not work with readHTMLTable(), so the page is parsed with xmlParse() instead. The following code runs without the memory growth.
library(XML)
a = readLines("http://en.wikipedia.org/wiki/2014_FIFA_World_Cup")
while(TRUE) {
  b = xmlParse(paste(a, collapse = ""))
  # do something with b
  free(b)
}
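If the tables themselves are still needed, one possible variation is to parse the document explicitly, hand the parsed document to readHTMLTable(), and free it afterwards. This is only a sketch: it assumes the leaked memory belongs to the internal document that readHTMLTable() builds when it is given raw text, and the helper name read_tables_once and the toy HTML string are made up for illustration.

```r
library(XML)

# Toy HTML used instead of a live page so the sketch is self-contained.
html <- paste0(
  "<html><body><table>",
  "<tr><th>name</th><th>value</th></tr>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table></body></html>"
)

# Hypothetical helper: parse, extract the tables, then release the C-level document.
read_tables_once <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  # free() runs when the function exits, after the return value has been computed.
  on.exit(free(doc), add = TRUE)
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

for (i in 1:100) {
  tables <- read_tables_once(html)
  # do something with tables
}
```

The returned data frames are ordinary R objects, so they remain usable after the document has been freed.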
The xml2 package has similar issues; there the memory can be released by calling xml_remove() followed by gc().
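A minimal sketch of the xml2 variant follows. It assumes the function meant is xml2's xml_remove() with free = TRUE, and that the values are copied into plain R vectors before the nodes are released; the toy HTML string is made up for illustration. Freed nodes must not be touched again, so the references are dropped immediately.

```r
library(xml2)

# Toy HTML used instead of a live page so the sketch is self-contained.
html <- "<html><body><table><tr><td>1</td></tr></table></body></html>"

doc <- read_html(html)
cells <- xml_find_all(doc, "//td")

# Copy what is needed into an ordinary character vector first.
values <- xml_text(cells)

# Unlink the nodes and free their memory; they are invalid after this call.
xml_remove(cells, free = TRUE)

# Drop the remaining references and let the garbage collector reclaim the document.
rm(cells, doc)
invisible(gc())
```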