Problem description
I am trying to scrape links from a page that generates content dynamically as the user scrolls down to the bottom (infinite scrolling). I have tried different approaches with PhantomJS but cannot gather links beyond the first page. Let's say the element at the bottom that loads content has the class .has-more-items. It is available until the final content is loaded while scrolling and then becomes unavailable in the DOM (display:none). Here are the things I have tried:
- Setting viewportSize to a large height right after
var page = require('webpage').create();
- Using
page.scrollPosition = { top: 10000, left: 0 };
inside page.open, but it has no effect.
- Putting it inside a page.evaluate function, but that gives an error.
- Using jQuery and plain JS code inside page.evaluate and page.open, but to no avail, both as-is and inside document.ready. Similarly for the JS code
window.scrollBy(0, 10000);
both as-is and inside window.onload.
I have been stuck on this for 2 days now and cannot find a way. Any help or hint would be appreciated.
Update
I have found a helpful piece of code at https://groups.google.com/forum/?fromgroups=#!topic/phantomjs/8LrWRW8ZrA0
var hitRockBottom = false;
while (!hitRockBottom) {
    // Scroll the page (not sure if this is the best way to do so...)
    page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };
    // Check if we've hit the bottom
    hitRockBottom = page.evaluate(function() {
        return document.querySelector(".has-more-items") === null;
    });
}
Here, .has-more-items is the class of the element I want to access. It is initially available at the bottom of the page and, as we scroll down, it moves further down until all data is loaded and then becomes unavailable.
However, when I tested it, it clearly runs into an infinite loop without scrolling down (I render pictures to check). I have tried to replace
page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };
with each of the lines below as well (one at a time):
window.document.body.scrollTop = '1000';
location.href = ".has-more-items";
page.scrollPosition = { top: page.scrollPosition + 1000, left: 0 };
document.location.href=".has-more-items";
But nothing seems to work.
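A likely reason the page.scrollPosition line never moves the page: page.scrollPosition is described as an object with top and left fields, so adding 1000 to the whole object produces a string rather than a larger offset. The sketch below illustrates this with a plain object standing in for page.scrollPosition (an assumption about its shape). The nextScrollPosition helper is a hypothetical name, not part of PhantomJS.

```javascript
// Assumption: page.scrollPosition behaves like a plain { top, left } object.
var current = { top: 0, left: 0 };

// Adding a number to the object coerces the object to a string,
// so the result is not a usable pixel offset for `top`.
var broken = current + 1000;

// Hypothetical helper: advance only the `top` coordinate by `step` pixels.
function nextScrollPosition(position, step) {
    return { top: position.top + step, left: position.left };
}

var fixed = nextScrollPosition(current, 1000);
```

With a real page object, the same idea would read page.scrollPosition = nextScrollPosition(page.scrollPosition, 1000);. This only addresses the arithmetic. A synchronous while loop can still prevent the page from loading more content between scrolls.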
Solution
Found a way to do it and tried to adapt it to your situation. I didn't test the best way of finding the bottom of the page because I had a different context, but check it out. The problem is that you have to wait a little for the page to load, and JavaScript works asynchronously, so you have to use setInterval or setTimeout.
page.open('http://example.com/?q=houston', function () {
    // Checks for bottom div and scrolls down from time to time
    window.setInterval(function() {
        // Checks if there is a div with class=".has-more-items"
        // (not sure if this is the best way of doing it)
        var count = page.content.match(/class=".has-more-items"/g);
        if(count === null) { // Didn't find
            page.evaluate(function() {
                // Scrolls to the bottom of page
                window.document.body.scrollTop = document.body.scrollHeight;
            });
        }
        else { // Found
            // Do what you want
            ...
            phantom.exit();
        }
    }, 500); // Number of milliseconds to wait between scrolls
});
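For the setup described in the question, where .has-more-items exists until everything is loaded, the polling idea could be sketched roughly as follows. This is a minimal sketch and has not been run against a real site. scrollUntilLoaded and maxTries are hypothetical names. It assumes page is a PhantomJS WebPage, that page.evaluate passes extra arguments through to the page function, and that the marker element is either removed or gets display:none itself once loading finishes.

```javascript
// Hypothetical helper: scrolls every `intervalMs` until `selector` is gone
// or hidden, or until `maxTries` polls have run. Only `page.evaluate` is used.
function scrollUntilLoaded(page, selector, intervalMs, maxTries, done) {
    var tries = 0;
    var timer = setInterval(function () {
        // Runs inside the page: scroll to the bottom, then report whether
        // the "more items" marker is still present and visible.
        var hasMore = page.evaluate(function (sel) {
            window.scrollTo(0, document.body.scrollHeight);
            var el = document.querySelector(sel);
            return el !== null &&
                window.getComputedStyle(el).display !== 'none';
        }, selector);

        tries += 1;
        if (!hasMore || tries >= maxTries) {
            clearInterval(timer);
            // First argument tells the caller whether loading really finished
            // (true) or the try limit was reached first (false).
            done(!hasMore, tries);
        }
    }, intervalMs);
}
```

In a PhantomJS script this could be called from the page.open callback, for example scrollUntilLoaded(page, '.has-more-items', 500, 100, function (finished) { /* collect the links here */ phantom.exit(); });. The try limit is there so that a page which never hides the marker does not keep the script running forever.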