Question

I'm trying to use Angular.js client-side with webapp2 on Google App Engine.

To solve the SEO issues, the idea was to use a headless browser to run the JavaScript server-side and serve the resulting HTML to the crawlers.

Is there any headless browser for Python that runs on Google App Engine?
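As a rough illustration of the idea in the question, a handler could inspect the User-Agent header and serve the headless-rendered HTML only to search crawlers, while normal browsers get the usual Angular app. This is a minimal sketch under stated assumptions: the CRAWLER_TOKENS list and both helper functions are hypothetical, and user-agent matching is only a heuristic.

```python
# Hypothetical list of lowercase substrings that identify common search crawlers.
CRAWLER_TOKENS = ("googlebot", "bingbot", "yandex", "baiduspider", "duckduckbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known search crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response(user_agent, render_snapshot, serve_app_shell):
    """Serve pre-rendered HTML to crawlers and the normal client-side app to browsers.

    render_snapshot and serve_app_shell are caller-supplied callables (hypothetical),
    e.g. a headless-browser render of the page and a loader for the static index.html.
    """
    if is_crawler(user_agent):
        return render_snapshot()
    return serve_app_shell()
```

Because rendering is passed in as a callable, the expensive headless-browser call only runs for crawler requests.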

Recommended answer

This can now be done on App Engine Flex with a custom runtime. I'm adding this answer because this question is the first thing that pops up in Google.

I based this custom runtime on my other GAE Flex microservice, which uses the pre-built Python runtime.

Project structure:

webdrivers/
- geckodriver
app.yaml
Dockerfile
main.py
requirements.txt

app.yaml:

service: my-app-engine-service-name
runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT main:app --timeout 180

Dockerfile:

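FROM gcr.io/google-appengine/python
# Install Xvfb (virtual X display) and Firefox in a single layer so the
# package index from apt-get update is never stale
RUN apt-get update && apt-get install -y xvfb firefox
LABEL python_version=python
RUN virtualenv --no-download /env -p python
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
# Install Python dependencies first so this layer is cached between code changes
ADD requirements.txt /app/
RUN pip install -r requirements.txt
# Copy the application, including webdrivers/geckodriver
ADD . /app/
CMD exec gunicorn -b :$PORT main:app --timeout 180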

requirements.txt:

Flask==0.12.2
gunicorn==19.7.1
selenium==3.13.0
pyvirtualdisplay==0.2.1

main.py:

import os
import traceback

from flask import Flask, jsonify
from selenium import webdriver
from pyvirtualdisplay import Display

app = Flask(__name__)

# Add the bundled webdrivers/ directory (which contains geckodriver) to PATH
# so that Selenium can find the geckodriver executable
os.environ['PATH'] += os.pathsep + os.path.join(
    os.path.dirname(os.path.realpath(__file__)), 'webdrivers')

@app.route('/')
def hello():
    return 'Hello!!'

@app.route('/test/', methods=['GET'])
def go_headless():
    display = None
    driver = None
    try:
        # Start a virtual X display (Xvfb) for Firefox to render into
        display = Display(visible=0, size=(1024, 768))
        display.start()
        driver = webdriver.Firefox()
        driver.get("http://www.python.org")
        page_source = driver.page_source
        return jsonify({'success': True, 'result': page_source[:500]})
    except Exception as e:
        print(traceback.format_exc())
        return jsonify({'success': False, 'msg': str(e)})
    finally:
        # Always shut down the browser process and the virtual display,
        # even when the request failed
        if driver is not None:
            driver.quit()
        if display is not None:
            display.stop()

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080, debug=True)

Download geckodriver (Linux 64-bit) from here and put it in the webdrivers/ directory:

https://github.com/mozilla/geckodriver/releases
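Selenium looks for the geckodriver executable on PATH, which is why main.py appends the webdrivers/ directory to it. Below is a minimal sketch for checking that the downloaded binary is actually discoverable, assuming a POSIX system and Python 3.3+ (for shutil.which). The helper names are hypothetical. shutil.which only reports files that have the executable bit set, so if it returns None, try chmod +x webdrivers/geckodriver.

```python
import os
import shutil


def add_webdrivers_to_path(base_dir):
    """Append <base_dir>/webdrivers to PATH, mirroring what main.py does."""
    webdrivers = os.path.join(base_dir, "webdrivers")
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + webdrivers
    return webdrivers


def find_geckodriver():
    """Return the first geckodriver found on PATH (the one Selenium would use), or None."""
    return shutil.which("geckodriver")
```

At startup you could call add_webdrivers_to_path(os.path.dirname(os.path.realpath(__file__))) and log the result of find_geckodriver(). Because the directory is appended rather than prepended, a geckodriver that appears earlier on PATH takes precedence.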

Other notes:

  • Be mindful of the versions of geckodriver, Firefox and Selenium you are using, as the combination can be finicky; a mismatch gives this error: WebDriverException: Message: Can't load the profile. Possible firefox version mismatch. You must use GeckoDriver instead for Firefox 48+. Profile Dir: /tmp/tmp 48P If you specified a log_file in the FirefoxBinary constructor, check it for details.
  • Unless you are using a legacy geckodriver/Firefox, do not set DesiredCapabilities().FIREFOX["marionette"] = False (see https://github.com/SeleniumHQ/selenium/issues/5106).
  • display = Display(visible=0, size=(1024, 768)) is needed to fix this error: WebDriverException: The browser appears to have exited before we could connect (see "How to fix Selenium WebDriverException: The browser appears to have exited before we could connect?").
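Since most of these errors come down to mismatched versions, it can help to log the versions actually installed in the container. A minimal sketch, assuming both binaries accept a --version flag and print a dotted version number (as firefox and geckodriver do). The function names are hypothetical, and the sketch only reports versions; it does not decide which combinations are compatible.

```python
import re
import subprocess

# First dotted number in the output, e.g. "Mozilla Firefox 61.0.1" -> "61.0.1"
_VERSION_RE = re.compile(r"(\d+(?:\.\d+)+)")


def parse_version(output):
    """Extract the first dotted version number from a --version output string."""
    match = _VERSION_RE.search(output or "")
    return match.group(1) if match else None


def binary_version(executable):
    """Run '<executable> --version' and return the parsed version, or None on failure."""
    try:
        out = subprocess.check_output([executable, "--version"], stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # Binary missing from PATH, not executable, or exited with an error
        return None
    return parse_version(out.decode("utf-8", "replace"))


def report_versions():
    """Collect the Firefox, geckodriver and Selenium versions that must work together."""
    versions = {
        "firefox": binary_version("firefox"),
        "geckodriver": binary_version("geckodriver"),
    }
    try:
        import selenium
        versions["selenium"] = selenium.__version__
    except ImportError:
        versions["selenium"] = None
    return versions
```

Printing report_versions() from the /test/ error branch, for example, would show which component is out of step when the "Can't load the profile" error appears.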

To test locally:

docker build . -t my-docker-image-tag
docker run -p 8080:8080 --name=my-docker-container-name my-docker-image-tag
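Once the container is running, the /test/ route can be exercised from a short Python 3 script. A minimal sketch, assuming the container is reachable on localhost:8080 as mapped by the docker run command above. The helper names are hypothetical, and the timeout simply mirrors gunicorn's --timeout 180.

```python
import json
from urllib.request import urlopen


def parse_result(body):
    """Decode the JSON returned by /test/ and return (success, detail)."""
    data = json.loads(body.decode("utf-8") if isinstance(body, bytes) else body)
    if data.get("success"):
        return True, data.get("result", "")
    return False, data.get("msg", "")


def smoke_test(base_url="http://localhost:8080", timeout=180):
    """Call the /test/ route of the running container and return (success, detail)."""
    with urlopen(base_url.rstrip("/") + "/test/", timeout=timeout) as resp:
        return parse_result(resp.read())
```

With the main.py handler shown earlier, a successful run returns (True, ...) with the first 500 characters of the page, and a caught exception returns (False, ...) with its message.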

To deploy to App Engine:

gcloud app deploy app.yaml --version dev --project my-app-engine-project-id

08-30 06:20