Problem description
The following sample resembles my actual code:
function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });
    .... code .....
    .... code .....
    casper.then(function(){
        setTimeout(runCode(), 1000);
    });
}
function startScript() {
    .... code ....
    .... code ....
    casper.then(function(){
        runCode();
    });
    casper.then(function(){
        setTimeout(startScript(),5000);
    });
}
startScript();
This code is running on a VPS, and it seems to fill up all 512 MB of RAM. It starts with around 50 MB of RAM in use and fills it up within a few hours. So I suspect that the way I'm implementing the infinite loop is creating new stack frames without destroying the old ones.
How I want to implement this: execution starts with startScript(), which in turn calls another function, runCode(). This runCode function has to run infinitely in a loop, and I'm trying to do that using the setTimeout function.
There is a condition upon which the whole script should start again, so I'm using return to go back to the startScript() function and then restart it with another setTimeout() call.
The specific condition I'm talking about has not been met in my script in the last few hours, so I suspect the memory usage comes from within the runCode() function. Please give me some suggestions for fixing this memory usage problem.
Update: I was passing the function's return value (which was null or undefined) as the argument to setTimeout(), and to produce that value the function had to run right away, which was causing the stack overflow. As suggested by Artjom B., I tried the following code, but the function passed as an argument to setTimeout is not being invoked.
function runCode() {
    console.log("inside runcode");
    casper.then(function(){
        ...
        ...
        // call to other functions
    });
    //setTimeout(runCode, 1000); --------------- [i]
    casper.then(function(){
        console.log("just before setTimeout");
        setTimeout(runCode, 1000);
    });
}
runCode();
I get the following output:
inside runcode
(console.log messages from the other functions and the code in between)
just before setTimeout
Then it exits.
If I use the commented-out line marked [i] and comment out the lines after it, I get an infinite loop like this:
inside runcode
inside runcode
inside runcode
........
I don't know what is wrong. Please suggest something.
Update 2: Thank you, Artjom B., for picking up another flaw in my code. There seems to be a problem with the setTimeout() function. When I run the code in this paste, http://pastebin.com/W9DD6YpB, it doesn't run indefinitely as it is supposed to.
Update 3: As explained by Artjom B., the asynchronous nature of JavaScript is causing casper to think there is no more code left to execute, so it exits before the function queued by setTimeout gets invoked. I'm wondering whether adding some code after the setTimeout call will keep casper from exiting. For example, the function queued by setTimeout() waits 1000 ms before it is invoked, so a casper.wait(2000) should do the job, but I don't know whether there will still be stack overflow problems: http://pastebin.com/ybKWH5KX
Recommended answer
After some discussion in the comments, it became clear that an approach based on setTimeout either doesn't work or is rather hard to read and maintain.
Your concern about uncollected stack frames from recursively calling runCode and startScript is unfounded, since CasperJS internally works with setTimeout. So you should use the functions that CasperJS provides.
You can do this recursively (by nesting steps), because CasperJS handles it well: it uses a queue and inserts new steps right after the currently executing step.
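To illustrate the queue behaviour described above, here is a toy step queue in plain JavaScript, runnable in Node.js. It is a simplified model, not the CasperJS source: the class ToyQueue and its fields (steps, next, inserted, maxDepth) are hypothetical. It shows that a "recursive" runCode only schedules more steps, so every step returns before the next one starts and the call stack never gets deeper than one step callback. Note that the executed steps stay in the array, which is relevant to the memory discussion further down.

```javascript
// Toy model of a step queue (NOT the CasperJS implementation).
// Steps added while a step is running are inserted right after the current step.
class ToyQueue {
  constructor() {
    this.steps = [];     // every step ever scheduled (never trimmed)
    this.next = 0;       // index of the next step to run
    this.inserted = 0;   // how many steps the current step has inserted so far
    this.running = false;
    this.depth = 0;      // step callbacks currently on the JS call stack
    this.maxDepth = 0;
  }

  then(fn) {
    if (!this.running) {
      this.steps.push(fn); // before run(): append to the end
    } else {
      // during run(): insert after the current step, keeping sibling order
      this.steps.splice(this.next + this.inserted, 0, fn);
      this.inserted += 1;
    }
    return this;
  }

  run() {
    this.running = true;
    // A plain loop, not recursion: each step returns before the next starts.
    while (this.next < this.steps.length) {
      const fn = this.steps[this.next];
      this.next += 1;
      this.inserted = 0;
      this.depth += 1;
      this.maxDepth = Math.max(this.maxDepth, this.depth);
      fn.call(this);
      this.depth -= 1;
    }
    this.running = false;
  }
}

const q = new ToyQueue();
const log = [];
let iterations = 0;

function runCode() {
  q.then(function () { log.push("work " + iterations); });
  q.then(function () {
    iterations += 1;
    if (iterations < 3) runCode(); // only schedules steps, does not nest calls
  });
}

q.then(function () { log.push("start"); });
q.then(function () { runCode(); });
q.then(function () { log.push("end"); });
q.run();

console.log(log.join(", ")); // start, work 0, work 1, work 2, end
console.log(q.maxDepth);     // 1
console.log(q.steps.length); // 9 (executed steps are kept)
```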
You would need to move the stop condition into the recursive call, because in asynchronous code like this
function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });
    //...
}
doesn't actually stop runCode from executing; it only returns from the function inside the then block.
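A minimal, Node-runnable illustration of why that early return does nothing useful. The function then() here is a hypothetical stand-in for casper.then() that only records callbacks, which is enough to show the ordering: runCode has already finished (and scheduled its later steps) before the callback containing return even runs.

```javascript
// Hypothetical stand-in for casper.then(): it only records the callback;
// nothing runs until the queue is processed later.
const queue = [];
function then(fn) { queue.push(fn); }

const trace = [];
const condition = true; // pretend the stop condition is already met

function runCode() {
  then(function () {
    if (condition) {
      return; // leaves only this callback, not runCode()
    }
  });
  trace.push("runCode kept going after then()");
  then(function () { trace.push("later step still runs"); });
}

runCode();
queue.forEach(function (fn) { fn(); }); // process the "steps" in order

trace.forEach(function (line) { console.log(line); });
// runCode kept going after then()
// later step still runs
```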
Then you would replace the setTimeout in
function runCode() {
    //...
    casper.then(function(){
        if (!condition){
            setTimeout(runCode, 1000);
        }
    });
}
with the proper casper functions:
function runCode() {
    //...
    casper.wait(1000);
    casper.then(function(){
        if (!condition){
            runCode();
        }
    });
}
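A toy model of this wait/then recursion, runnable in Node.js. The functions then, wait and run, the simulated clock now, and the stop condition now >= 5000 are all hypothetical stand-ins, not CasperJS functions; wait() only advances the simulated clock so the example finishes instantly. It shows that with the stop condition checked inside the last step, the loop ends cleanly after a fixed number of rounds.

```javascript
// Toy model (NOT CasperJS): then() and wait() both schedule steps; wait()
// advances a simulated clock instead of really sleeping.
const steps = [];
let next = 0;      // index of the next step to run
let inserted = 0;  // steps inserted by the currently running step
let running = false;
let now = 0;       // simulated milliseconds

function then(fn) {
  if (running) {
    steps.splice(next + inserted, 0, fn); // insert after the current step
    inserted += 1;
  } else {
    steps.push(fn);
  }
}

function wait(ms) {
  then(function () { now += ms; });
}

function run() {
  running = true;
  while (next < steps.length) {
    const fn = steps[next];
    next += 1;
    inserted = 0;
    fn();
  }
  running = false;
}

let rounds = 0;
function condition() { return now >= 5000; } // hypothetical stop condition

// The answer's pattern: do the work, wait, then decide whether to recurse.
function runCode() {
  then(function () { rounds += 1; }); // stands in for the real work
  wait(1000);
  then(function () {
    if (!condition()) {
      runCode(); // only schedules more steps; the stack does not grow
    }
  });
}

then(runCode);
run();
console.log(rounds, now); // 5 5000
```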
You would need to do the same replacement in startScript, from:
casper.then(function(){
    setTimeout(startScript,5000);
});
to:
casper.wait(5000);
casper.then(function(){
    startScript();
});
On keeping setTimeout
If you really want to keep setTimeout, you would need to do double bookkeeping. By calling a function with setTimeout, you break out of the controlled flow of casper steps.
For example, you may do something like this:
function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.run();
The function inside then is actually the last scheduled step. When it is executed, it creates a timer that will later start a function, which in turn adds more steps to the flow. That never happens, because casper has no way of knowing whether more steps will be scheduled, and since there currently aren't any (at the end of the then before run), it simply exits the complete script. On some platforms, though, the underlying PhantomJS might behave differently. setTimeout lets you break out of the control flow, which is not what you want in this case.
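The exit problem can be modelled with a toy runner, runnable in Node.js. Everything here is hypothetical and much simpler than CasperJS: run() quits as soon as its step queue is empty, and fakeSetTimeout only records the callback instead of firing it later. Because the step that calls fakeSetTimeout is the last one, someFunction never gets a chance to queue its steps before the runner exits. This only models the ordering problem, not real PhantomJS timers.

```javascript
// Toy model (NOT CasperJS): a step runner that exits as soon as its queue is
// empty, plus a fake timer list standing in for setTimeout.
const steps = [];
const timers = []; // callbacks "scheduled" with fakeSetTimeout
let exited = false;

function then(fn) { steps.push(fn); }
function fakeSetTimeout(fn, ms) { timers.push({ fn: fn, ms: ms }); }

function run() {
  while (steps.length > 0) {
    steps.shift()(); // run the oldest remaining step
  }
  exited = true; // nothing left in the queue: the runner quits
}

let somethingRan = false;
function someFunction() {
  then(function () { somethingRan = true; });
}

then(function () { /* stands in for casper.start(url) */ });
then(function () { fakeSetTimeout(someFunction, 5000); });
run();

// The timer exists, but the runner has already exited,
// so the steps someFunction would add are never executed.
console.log(exited, timers.length, somethingRan); // true 1 false
```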
To regain control, you may do the following, as indicated in your paste:
function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.wait(5100); // should be greater than the previous timeout
casper.run();
^ Do not do this. It is hard to read and error-prone. This can be simplified to:
casper.start(url);
casper.then(function(){
    // something
});
casper.wait(5000, someFunction); // added bonus because "this" now refers to casper
casper.run();
Proper callback invocation for setTimeout
You also have a syntactic problem with how the function is passed to setTimeout. The main problem is that you don't actually use setTimeout to delay the call. See for example the line
setTimeout(startScript(),5000);
Here you invoke the startScript function immediately, because of the (), and pass its return value to setTimeout. I don't think you actually return anything from startScript, so setTimeout receives undefined. It accepts that without issuing a warning or error, but can't execute it after the timeout, because it isn't actually a function. In JavaScript, functions are first-class citizens: you can pass the function object itself to other functions.
You can fix this by removing the () from the above line:
setTimeout(startScript,5000);
The same applies to
setTimeout(runCode, 1000);
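The difference between passing startScript() and startScript can be seen without real timers. The function fakeSetTimeout below is a hypothetical stand-in that only records the type of what it receives; it runs in Node.js. The first call executes startScript immediately and hands over undefined, the second hands over the function object. Inside runCode, the same mistake (setTimeout(runCode(), 1000)) means runCode calls itself synchronously without end, which is the stack overflow from the question. Note that Node.js's real setTimeout throws a TypeError for a non-function callback, unlike the silent behaviour described above for PhantomJS.

```javascript
// Hypothetical stand-in for setTimeout that only records what it was given,
// so the difference is visible without waiting for real timers.
const received = [];
function fakeSetTimeout(callback, ms) {
  received.push(typeof callback);
}

let startCalls = 0;
function startScript() {
  startCalls += 1; // returns nothing, i.e. undefined
}

fakeSetTimeout(startScript(), 5000); // startScript runs right now; undefined is passed
fakeSetTimeout(startScript, 5000);   // the function object itself is passed

console.log(startCalls);             // 1
console.log(received.join(", "));    // undefined, function
```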
(untested) Solution for removing previous casper steps
You really should run the script from cron without the recursion or anything like that. If you really don't want to, you may still be able to reduce the memory consumption.
The steps that are scheduled via then*, wait* and some other functions are managed in the internal casper.steps property. They are not cleared once they have been executed, so that may be the cause of your memory leak. You may try to clear them like this:
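casper.clearSomeSteps = function(min, keep){
    var len = this.steps.length;
    min = min || 1000; // only run when at least 1000 steps are scheduled
    keep = keep || 100; // keep at least the 100 newest steps
    if (len < min) return; // not yet needed
    // assuming this.step is the index of the next step to run, this.steps[this.step - 1]
    // is the current one: never drop it or anything scheduled after it
    var drop = Math.min(len - keep, this.step - 1);
    if (drop <= 0) return;
    this.step -= drop; // shift the index of the current step
    this.steps = this.steps.slice(drop); // do the slice
};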
Call casper.clearSomeSteps() at the beginning of startScript (or this.clearSomeSteps() from inside a step callback, where this refers to casper). This might not be the whole solution, though, as there are also casper.waiters.
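The trimming idea can be checked outside CasperJS. Below is a standalone sketch, runnable in Node.js, that applies it to a plain object with the same two fields (steps, step) instead of a real casper instance. It assumes step is the index of the next step to run, and it adds a guard (an assumption of this sketch) so that the current step and anything after it are never dropped. With the plain step -= len - keep formula from the untested snippet, this example would move step to 1200 - 1400 = -200.

```javascript
// Standalone sketch of the trimming idea on a plain object (NOT a casper
// instance). Assumes runner.step is the index of the NEXT step to run.
function clearSomeSteps(runner, min, keep) {
  const len = runner.steps.length;
  min = min || 1000;  // only trim once this many steps exist
  keep = keep || 100; // keep at least this many of the newest steps
  if (len < min) return;
  // never drop the current step (runner.step - 1) or anything after it
  const drop = Math.min(len - keep, runner.step - 1);
  if (drop <= 0) return;
  runner.step -= drop;
  runner.steps = runner.steps.slice(drop);
}

// 1500 scheduled steps; 1200 have started, so steps[1199] is the current one.
const runner = {
  steps: Array.from({ length: 1500 }, (_, i) => "step " + i),
  step: 1200,
};
const current = runner.steps[runner.step - 1];
const upcoming = runner.steps[runner.step];

clearSomeSteps(runner);

console.log(runner.steps.length);                       // 301
console.log(runner.steps[runner.step - 1] === current); // true
console.log(runner.steps[runner.step] === upcoming);    // true
```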