python - Selenium behaves differently between ipython and a script file

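I'm trying to scrape the New York State directory of trial judges. The site checks whether JavaScript is enabled and briefly shows a warning that JavaScript is required before it displays the page. So I've been trying Selenium.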

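However, when I run the code below line by line in ipython or python, it reaches the page without problems. When I run it from the command line (python scraper.py), the site shows the JavaScript warning, but only the first time I access the site. This happens: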

regardless of which browser I use,
whether or not I run it headless,
no matter what cookies I try to set

My code:

import string
import csv
from selenium import webdriver

# Start the browser
browser = webdriver.Firefox()
browser.get(
    "https://iapps.courts.state.ny.us/judicialdirectory/JudicialDirectory")
print(browser.title)

# The 4 lines above work when run directly in ipython,
# but when run from the command line that first attempt fails, so load the page again
browser.get(
    "https://iapps.courts.state.ny.us/judicialdirectory/JudicialDirectory")
print(browser.title)

In case it matters: I'm running this on Windows 10.

Does anyone have suggestions on how to debug this?

Best answer

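The difference is that when the code runs as a script, browser.title is read before the page's JS has had a chance to execute. When you type the lines one at a time, the pause between them gives the JS time to finish. You can avoid the problem by waiting briefly after fetching the page. Using time.sleep is the simplest option: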

import time
browser.get(...)
time.sleep(1.5)  # give the page's JS time to run

However, a fixed sleep may make you wait longer than necessary. It is better to use Selenium's expected condition support, so that you wait only as long as needed:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

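# Use the ID of an element that only exists once the JS has run
condition = EC.presence_of_element_located((By.ID, "some_element_id_present_after_JS_load"))
driver.get(url)  # driver is the WebDriver instance (named browser in the question)
WebDriverWait(driver, 10).until(condition)  # polls for up to 10 seconds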
print(driver.title)
# ...
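
If no suitable element ID is known, one option is to wait on the title itself. The sketch below shows the polling idea behind an explicit wait in plain Python; it is not Selenium's implementation. It assumes the warning page and the real page have different titles, which the question does not confirm. The names wait_for_title_change and FakeDriver and both title strings are hypothetical placeholders.

```python
import time


def wait_for_title_change(driver, old_title, timeout=10.0, poll=0.5):
    """Poll driver.title until it differs from old_title, then return it.

    Raises TimeoutError if the title is unchanged after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        title = driver.title
        if title != old_title:
            return title
        if time.monotonic() >= deadline:
            raise TimeoutError(f"title still {old_title!r} after {timeout} s")
        time.sleep(poll)


class FakeDriver:
    """Hypothetical stand-in for a WebDriver; the title changes on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def title(self):
        self.reads += 1
        if self.reads < 3:
            return "placeholder warning title"  # assumed title of the JS warning page
        return "placeholder real title"  # assumed title of the loaded page


if __name__ == "__main__":
    fake = FakeDriver()
    print(wait_for_title_change(fake, "placeholder warning title", timeout=5, poll=0.01))
```

With a real browser, the same wait can be written with Selenium directly as WebDriverWait(driver, 10).until(lambda d: d.title != old_title), because until accepts any callable that takes the driver.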

Regarding python - Selenium behaves differently between ipython and a script file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49823526/