This article covers how to handle a problem with running Selenium on AWS Lambda.

Problem description


I am trying to implement a scraper that checks twice a day whether certain PDFs have changed names. Unfortunately, finding the PDFs requires interacting with the website, so the best solution I can think of is a combination of Selenium and AWS Lambda.

To begin, I followed this tutorial. I completed it but ran into this error from Lambda:

START RequestId: 18637c6d-ea75-40ee-8789-374654700b99 Version: $LATEST
Starting google.com
Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
: WebDriverException
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 46, in lambda_handler
    driver = webdriver.Chrome(chrome_options=chrome_options)
  File "/var/task/selenium/webdriver/chrome/webdriver.py", line 68, in __init__
    self.service.start()
  File "/var/task/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Others have hit this error, and the tutorial's author "resolved" it by linking to this Stack Overflow page. I have gone through it, but all the answers concern running headless Chromium on a desktop, not on AWS Lambda.

A couple of changes I've tried, to no avail:

1) Changing the chromedriver and headless-chromium files to .exe files
2) Changing this line of code to include the executable_path

driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=os.getcwd() + "/bin/chromedriver.exe")

Any help getting Selenium and AWS Lambda working together would be greatly appreciated.
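
One way to narrow down a "needs to be in PATH" error like the one above is to check whether the driver file exists at the expected path and carries an execute bit before handing it to Selenium. The following is a minimal sketch, not part of the original post: the helper name `describe_executable` and the `bin/chromedriver` path in the demo are assumptions for illustration. Lambda functions run on Linux, so a `.exe` suffix has no special meaning there.

```python
import os
import stat


def describe_executable(path):
    """Return 'missing', 'not executable' or 'ok' for the file at `path`.

    Hypothetical helper for illustration; it only inspects the file's
    metadata and never tries to launch it.
    """
    if not os.path.isfile(path):
        return "missing"
    mode = os.stat(path).st_mode
    # Look for an execute bit for owner, group or others.
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Assumed layout: the binaries are packaged in a bin/ folder next to
    # the handler, as in the question.
    candidate = os.path.join(os.getcwd(), "bin", "chromedriver")
    print(candidate, "->", describe_executable(candidate))
```

Logging this result at the start of the handler shows whether the file is absent from the deployment package or merely lacks execute permission.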

Recommended answer

I had the same issue. It was caused by the binaries being in a location they could not be executed from. Adding a function that copies them to /tmp/bin, and then pointing Selenium at that location, fixed it. See the example below, which I just got working while researching this error. (Apologies for the messy code.)

import os
import shutil
import time

from selenium import webdriver

BIN_DIR = "/tmp/bin"
CURR_BIN_DIR = os.getcwd() + "/bin"

def _init_bin(executable_name):
    """Copy a bundled binary into /tmp/bin and mark it executable."""
    # Time the copy so slow cold starts show up in the logs.
    start = time.perf_counter()
    if not os.path.exists(BIN_DIR):
        print("Creating bin folder")
        os.makedirs(BIN_DIR)
    print("Copying binaries for " + executable_name + " in /tmp/bin")
    currfile = os.path.join(CURR_BIN_DIR, executable_name)
    newfile = os.path.join(BIN_DIR, executable_name)
    shutil.copy2(currfile, newfile)
    print("Giving new binaries permissions for lambda")
    os.chmod(newfile, 0o775)
    elapsed = time.perf_counter() - start
    print(executable_name + " ready in " + str(elapsed) + "s.")

def handler(event, context):
    # Stage both binaries in /tmp/bin, where they can be executed.
    _init_bin("headless-chromium")
    _init_bin("chromedriver")

    chrome_options = webdriver.ChromeOptions()

    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--window-size=1280x1696')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--ignore-certificate-errors')

    # Point Selenium at the staged copies rather than the packaged ones.
    chrome_options.binary_location = "/tmp/bin/headless-chromium"
    # Selenium 3 style calls; Selenium 4 takes a Service object and options=,
    # and uses driver.find_element(By.CLASS_NAME, ...) instead.
    driver = webdriver.Chrome("/tmp/bin/chromedriver", chrome_options=chrome_options)
    try:
        driver.get('https://en.wikipedia.org/wiki/Special:Random')
        line = driver.find_element_by_class_name('firstHeading').text
        print(line)
    finally:
        # Always shut the browser down, even if the page lookup fails.
        driver.quit()

    return line
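
The copy-then-chmod step above can also be written so that it skips the copy when the file is already in place, which may save time when a warm Lambda execution environment reuses its /tmp directory between invocations. This is a sketch under that assumption, not the answer author's code: the function name `stage_executable` and its parameters are made up for illustration.

```python
import os
import shutil


def stage_executable(name, src_dir, dest_dir="/tmp/bin"):
    """Copy `name` from src_dir into dest_dir once and make it executable.

    Hypothetical helper for illustration. Returns the path of the staged
    copy so it can be passed on to Selenium.
    """
    src = os.path.join(src_dir, name)
    dest = os.path.join(dest_dir, name)
    os.makedirs(dest_dir, exist_ok=True)
    # Only copy when the file is not staged yet; an earlier invocation in
    # the same execution environment may already have done it.
    if not os.path.isfile(dest):
        shutil.copy2(src, dest)
    # copy2 preserves the source mode, so set the execute bits explicitly.
    os.chmod(dest, 0o755)
    return dest
```

In a handler this could be called as `stage_executable("chromedriver", os.path.join(os.getcwd(), "bin"))`, using the returned path in place of the hard-coded "/tmp/bin/chromedriver".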

