Selenium cannot connect to GhostDriver (but only in some cases)

Question

I've set up a simple web-scraping script in Python with Selenium and PhantomJS. I have about 200 URLs to scrape in total. The script runs fine at first, but after about 20-30 URLs I get the following error in Python. The exact number varies: the failure seems random and isn't tied to any particular URL.

selenium.common.exceptions.WebDriverException: Message: 'Can not connect to GhostDriver'

And my ghostdriver.log:

PhantomJS is launching GhostDriver...
[ERROR - 2014-07-04T17:27:37.519Z] GhostDriver - main.fail - {"message":"Could not start Ghost Driver","line":82,"sourceId":140692115795456,"sourceURL":":/ghostdriver/main.js","stack":"Error: Could not start Ghost Driver\n    at :/ghostdriver/main.js:82","stackArray":[{"sourceURL":":/ghostdriver/main.js","line":82}]}

I've searched, and most of the questions on SO seem to be from people who can't run even a single URL. The only other question I've found where the error occurs in the middle of the script has two answers. One says to upgrade phantomjs to the latest version, which I've done. The other simply says to try the URL again, which doesn't seem like a good solution since the URL could just fail again.

I am running phantomjs version 1.9.7 and selenium version 2.42.1 with Python 2.7.6 on Linux Mint 17.

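from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

urls = ['example.com/1/', 'example.com/2/', 'example.com/3/']  # ... and so on, about 200 in total

for url in urls:
    user_agent = 'Chrome'
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap['phantomjs.page.settings.userAgent'] = user_agent
    # A new PhantomJS driver is created on every iteration
    driver = webdriver.PhantomJS(executable_path='/usr/bin/phantomjs', desired_capabilities=dcap)
    driver.get(url)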
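
The loop above starts a new PhantomJS driver for every URL and never calls driver.quit(). Neither the question nor the answer confirms that this is what causes the failure, so the following is only a sketch of one thing worth trying: share a single driver across all URLs and always shut it down at the end. The helper names (make_phantomjs_driver, managed_driver, scrape_all) are hypothetical, and webdriver.PhantomJS is assumed to be available, as it is in the Selenium 2.x release used in the question.

```python
from contextlib import contextmanager


def make_phantomjs_driver(user_agent='Chrome', executable_path='/usr/bin/phantomjs'):
    """Start one PhantomJS driver with the same settings as the question's script."""
    # Imported lazily so the helpers below can be exercised without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap['phantomjs.page.settings.userAgent'] = user_agent
    return webdriver.PhantomJS(executable_path=executable_path, desired_capabilities=dcap)


@contextmanager
def managed_driver(factory):
    """Yield a driver built by `factory` and always call quit() afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # shuts down the PhantomJS process behind the driver


def scrape_all(urls, factory=make_phantomjs_driver):
    """Visit every URL with a single shared driver; return {url: page_source}."""
    pages = {}
    with managed_driver(factory) as driver:
        for url in urls:
            driver.get(url)
            pages[url] = driver.page_source
    return pages
```

Calling scrape_all(urls) with the list from the question would then launch PhantomJS once instead of roughly 200 times. Passing a different factory makes it possible to try the loop against a stub driver first.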

Recommended answer

I had the same problem. To fix it, I installed phantomjs from source.

For Linux (Debian):
sudo apt-get update
sudo apt-get install build-essential chrpath git-core libssl-dev libfontconfig1-dev libxft-dev
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh

For macOS:
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh

For other systems, check the following link: http://phantomjs.org/build.html

Optional (copy the built binary into /usr/bin):
cd bin
chmod +x phantomjs
sudo cp phantomjs /usr/bin/
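
After copying the binary, it can help to confirm that the path Selenium is given really points at the new build. The sketch below runs the binary with its --version flag and returns whatever it prints. The function name binary_version is hypothetical, and /usr/bin/phantomjs is simply the path used in the question.

```python
import subprocess


def binary_version(path, flag='--version'):
    """Return the text printed by `path flag`, or None if the binary cannot be run."""
    try:
        output = subprocess.check_output([path, flag], stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return None  # missing binary, not executable, or non-zero exit status
    return output.decode('utf-8', 'replace').strip()
```

For example, print(binary_version('/usr/bin/phantomjs')) shows which version the executable_path in the script will actually launch, or None if nothing runnable is installed there.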

I figured it out because when I read my ghostdriver.log file, it said:

[ERROR - 2014-09-04T19:33:30.842Z] GhostDriver - main.fail - {"message":"Could not start Ghost Driver","line":82,"sourceId":140145669488128,"sourceURL":":/ghostdriver/main.js","stack":"Error: Could not start Ghost Driver\n    at :/ghostdriver/main.js:82","stackArray":[{"sourceURL":":/ghostdriver/main.js","line":82}]}
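
Since the log was what pointed to the cause, here is a minimal sketch for pulling the error entries out of ghostdriver.log instead of reading it by eye. It assumes every error line has the shape shown above (a bracketed level and timestamp, a source label, then a JSON payload). The names LOG_LINE and find_errors are hypothetical.

```python
import json
import re

# Matches lines such as:
# [ERROR - 2014-09-04T19:33:30.842Z] GhostDriver - main.fail - {...}
LOG_LINE = re.compile(
    r'^\[(?P<level>[A-Z]+) - (?P<time>[^\]]+)\] (?P<source>.*?) - (?P<payload>\{.*\})\s*$'
)


def find_errors(lines):
    """Return (timestamp, source, message) for every ERROR line with a JSON payload."""
    errors = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match or match.group('level') != 'ERROR':
            continue  # startup banner, other log levels, or an unexpected format
        try:
            payload = json.loads(match.group('payload'))
        except ValueError:
            continue  # payload was not valid JSON, so skip the line
        errors.append((match.group('time'), match.group('source'), payload.get('message')))
    return errors
```

A file object can be passed directly, for example: with open('ghostdriver.log') as fh: print(find_errors(fh)).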

I was sure that some files must be missing, files that it needs in certain edge cases. So I decided to build from source, and it's working fine now.


09-05 09:26