I'm writing some tests with Selenium and noticed that the Referer is missing from the request headers. I wrote the following minimal example to test this against https://httpbin.org/headers:

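import selenium.webdriver
import selenium.webdriver.support.ui  # submodule must be imported explicitly for WebDriverWait

# Run Firefox without a visible window
options = selenium.webdriver.FirefoxOptions()
options.add_argument('--headless')

# Disable the built-in JSON viewer so the response is rendered as plain text
profile = selenium.webdriver.FirefoxProfile()
profile.set_preference('devtools.jsonview.enabled', False)
options.profile = profile

driver = selenium.webdriver.Firefox(options=options)
wait = selenium.webdriver.support.ui.WebDriverWait(driver, 10)

# Load a first page so that the next navigation has a referrer
driver.get('http://www.python.org')
assert 'Python' in driver.title

# Navigate from JavaScript and print the headers that httpbin echoes back
url = 'https://httpbin.org/headers'
driver.execute_script('window.location.href = "{}";'.format(url))
wait.until(lambda driver: driver.current_url == url)
print(driver.page_source)

driver.close()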

which prints:

<html><head><link rel="alternate stylesheet" type="text/css" href="resource://content-accessible/plaintext.css" title="Wrap Long Lines"></head><body><pre>{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "close",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}
</pre></body></html>

So there is no Referer. However, if I browse to any page and manually execute
window.location.href = "https://httpbin.org/headers"

in the Firefox console, the Referer does appear, as expected.
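
To compare runs without reading the output by eye, the echoed headers can be pulled out of `driver.page_source` and checked for a `Referer` key. The following is only a sketch: the helper names `extract_headers` and `has_referer` are hypothetical, and it assumes the page contains a single JSON object wrapped in HTML, as in the output above.

```python
import html
import json
import re


def extract_headers(page_source):
    """Return the headers dict echoed by httpbin.org/headers.

    Assumes the page source holds exactly one JSON object, possibly
    wrapped in HTML markup such as <pre>...</pre>.
    """
    # Greedy match from the first "{" to the last "}" in the page
    match = re.search(r"\{.*\}", page_source, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in page source")
    # The browser may have HTML-escaped characters such as "&"
    return json.loads(html.unescape(match.group(0)))["headers"]


def has_referer(page_source):
    """True if the echoed request headers include a Referer entry."""
    return "Referer" in extract_headers(page_source)
```

With a live session this could be used as `assert has_referer(driver.page_source)` after the navigation has completed.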

As pointed out in the comments below, when using
driver.get("javascript: window.location.href = '{}'".format(url))

instead of
driver.execute_script("window.location.href = '{}';".format(url))

the request does include the Referer. Also, when using Chrome instead of Firefox, both methods include the Referer.
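
For a side-by-side comparison, the two navigation methods can be wrapped in small helpers that take an existing driver. This is a sketch only: the function names are hypothetical, and `json.dumps` is used here (instead of the string formatting in the snippets above) to produce a properly quoted JavaScript string literal.

```python
import json


def navigate_with_execute_script(driver, url):
    """Navigate by assigning window.location.href via execute_script.

    In the Firefox runs described in this question, this variant
    produced a request without a Referer header.
    """
    driver.execute_script("window.location.href = {};".format(json.dumps(url)))


def navigate_with_javascript_url(driver, url):
    """Navigate by loading a javascript: URL through driver.get.

    In the Firefox runs described in this question, this variant
    produced a request that did include the Referer header.
    """
    driver.get("javascript: window.location.href = {}".format(json.dumps(url)))
```

Both helpers expect a driver that has already loaded a first page, so that there is a referring document.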

So the main question remains: why is the Referer missing from the request when it is sent with Firefox as described above?

Best answer

According to the MDN documentation on the Referer header:


Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer

However:



Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer

Privacy and security concerns
The Referer HTTP header carries some privacy and security risks:



Source: https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns#The_referrer_problem

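Addressing the security concerns

As far as the Referer header is concerned, most of the security risks can be mitigated with the following steps:



Sources: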

  • https://developer.mozilla.org/en-US/docs/Web/Security/Referer_header:_privacy_and_security_concerns#How_can_we_fix_this
  • https://geekthis.net/post/hide-http-referer-headers/#exit-page-redirect


  • This use case

    I executed your code with both the GeckoDriver/Firefox and the ChromeDriver/Chrome combination:

    Code block:
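    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # use webdriver.Chrome() for the Chrome run

    driver.get('http://www.python.org')
    assert 'Python' in driver.title

    # Navigate from JavaScript, then print the headers echoed by httpbin
    url = 'https://httpbin.org/headers'
    driver.execute_script('window.location.href = "{}";'.format(url))
    WebDriverWait(driver, 10).until(lambda driver: driver.current_url == url)
    print(driver.page_source)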
    

    Observations:
  • Using GeckoDriver/Firefox, the Referer: "https://www.python.org/" header was missing, as follows:
        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.5",
            "Host": "httpbin.org",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0"
          }
        }
    
  • Using ChromeDriver/Chrome, the Referer: "https://www.python.org/" header was present, as follows:
        {
          "headers": {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9",
            "Host": "httpbin.org",
            "Referer": "https://www.python.org/",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
          }
        }
    

  • Conclusion:

    This appears to be an issue with how GeckoDriver/Firefox handles the Referer header.

    Outro

    Referrer Policy
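
For orientation, a referrer policy determines which Referer value, if any, a browser attaches to a navigation. The sketch below models the standard policy names in simplified form; the function `referer_for` is hypothetical, origins are compared by scheme and host:port only, and userinfo stripping is ignored. It illustrates what a policy would normally yield and is not offered as an explanation of the Firefox behavior observed above.

```python
from urllib.parse import urlsplit, urlunsplit


def referer_for(policy, source_url, target_url):
    """Return the Referer value for a navigation, or None if none is sent.

    Simplified model of the standard referrer policies.
    """
    src, dst = urlsplit(source_url), urlsplit(target_url)
    # Full referrer: the source URL without its fragment
    full = urlunsplit((src.scheme, src.netloc, src.path or "/", src.query, ""))
    # Origin-only referrer: scheme and host, with a trailing slash
    origin = "{}://{}/".format(src.scheme, src.netloc)
    same_origin = (src.scheme, src.netloc) == (dst.scheme, dst.netloc)
    # A downgrade is a navigation from HTTPS to a non-HTTPS target
    downgrade = src.scheme == "https" and dst.scheme != "https"

    if policy == "no-referrer":
        return None
    if policy == "no-referrer-when-downgrade":
        return None if downgrade else full
    if policy == "origin":
        return origin
    if policy == "same-origin":
        return full if same_origin else None
    if policy == "strict-origin":
        return None if downgrade else origin
    if policy == "origin-when-cross-origin":
        return full if same_origin else origin
    if policy == "strict-origin-when-cross-origin":
        if same_origin:
            return full
        return None if downgrade else origin
    if policy == "unsafe-url":
        return full
    raise ValueError("unknown referrer policy: {}".format(policy))
```

For the navigation in this question, from https://www.python.org/ to https://httpbin.org/headers, both `no-referrer-when-downgrade` and `strict-origin-when-cross-origin` give "https://www.python.org/" in this model, which matches the value Chrome sent.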

    Regarding "python - Referer missing from the HTTP headers of Selenium requests", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54119674/
