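I have code that returns all the parts found on a particular website for a given search keyword.
With the search term "HL4RPV-50", all values are returned as expected.
With the search term "FSJ4-50B", a NoSuchElementException is raised on this line: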
---> 53 price = product.find_element_by_xpath(".//div[@class='price']").text.split('\n')[1]
The direct (absolute) XPath is:
//*[@id="search"]/div[3]/div[2]/div[2]/div[2]/div[6]/div[2]/div[1]/div[1]/div/div[4]/div/add-product-to-cart/div[1]
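The direct XPath differs between the two part IDs. In addition, each part ID has a slightly different XPath depending on where the part appears in the results.
My understanding was that a relative XPath would get around this.
The site I am trying to scrape is Tessco.com, and a generic username/password is specified in the code below.
Identifying the XPath:
To build a general XPath, my understanding was that a leading "." selects the current node, and "//" selects nodes below the current node that match the selection, no matter where they are. I then specified the element type, here div, followed by the attribute test @class='price'.
For "HL4RPV-50" this gives me what I want; for "FSJ4-50B" it does not.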
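The scoping described above can be tried without a browser. The following is only a minimal sketch using the standard library's xml.etree.ElementTree, which supports a small XPath subset; it is not Selenium, and the markup (product names, the "wrapper" div, the prices) is invented for illustration rather than taken from Tessco.com.

```python
import xml.etree.ElementTree as ET

# Invented markup: two results whose price divs sit at different depths.
PAGE = """
<div id="search">
  <div class="CoveoResult">
    <a class="productName">Cable A</a>
    <div class="wrapper"><div class="price">$10.00</div></div>
  </div>
  <div class="CoveoResult">
    <a class="productName">Cable B</a>
    <div class="price">$20.00</div>
  </div>
</div>
"""

root = ET.fromstring(PAGE)
prices = {}
for product in root.findall(".//div[@class='CoveoResult']"):
    # ".//" searches only below the current product, at any depth,
    # so the differing nesting of the two price divs does not matter.
    name = product.find(".//a[@class='productName']").text
    price = product.find(".//div[@class='price']").text
    prices[name] = price

print(prices)  # {'Cable A': '$10.00', 'Cable B': '$20.00'}
```

Because each lookup starts from the product element, the position of the result on the page plays no role, which is the property the absolute XPath lacks.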
I believe my XPath is wrong, but I am not sure how to generalize it.
Any suggestions?
Code:
import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.tessco.com/login")
userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"
#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID")))
elem.send_keys(userName)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "password")))
elem.send_keys(password)
#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
#Expand the search bar
searchIcon = wait10.until(EC.element_to_be_clickable((By.XPATH, "/html/body/header/div[2]/div/div/ul/li[2]/i")))
searchIcon.click()
searchBar = wait10.until(EC.element_to_be_clickable((By.XPATH, '/html/body/header/div[3]/input')))
searchBar.click()
#load in manufacture part number from a collection of components, via an Excel file
#Enter information into the search bar
searchBar.send_keys("FSJ4-50B" + '\n')
# wait for the products information to be loaded
products = wait30.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
# create a dictionary to store product and price
productInfo = {}
# iterate through all products in the search result and add details to dictionary
for product in products:
    # get product name
    productName = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
    # get price
    price = product.find_element_by_xpath(".//div[@class='price']").text.split('\n')[1]
    # add details to dictionary
    productInfo[productName] = price
# print products information
print(productInfo)
#time.sleep(5)
driver.close()
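Separately from locating the element, the expression .text.split('\n')[1] assumes the price block always has at least two lines of text. A small helper can make that step more forgiving. This is only a sketch: parse_price is a hypothetical name, and the sample strings (a label line followed by a dollar amount, or a "Call for pricing" message) are assumptions about what the block might contain, not text confirmed from the site.

```python
def parse_price(block_text):
    """Return the first line that contains a dollar sign, or None.

    Hypothetical helper: assumes prices are shown in dollars.
    """
    for line in block_text.splitlines():
        line = line.strip()
        if "$" in line:
            return line
    return None


print(parse_price("Your Price\n$147.55"))  # $147.55
print(parse_price("Call for pricing"))     # None
```

In the loop above it could be used as price = parse_price(element.text), so a result without a visible price yields None instead of an IndexError.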
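Best answer
Here is the working code.
I disabled images because my Internet connection is slow and the site takes a while to load its pages.
For the price I used a CSS selector instead of an XPath, and it works: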
import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
chrome_path = r".\web_driver\chromedriver.exe"
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chrome_path, chrome_options=chrome_options)
driver.maximize_window()
driver.get("https://www.tessco.com/login")
userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"
#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID")))
elem.send_keys(userName)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "password")))
elem.send_keys(password)
#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
#Expand the search bar
# searchIcon = wait10.until(EC.element_to_be_clickable((By.XPATH, "")))
# searchIcon.click()
searchBar = wait10.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#searchBar input")))
#Enter information into the search bar
searchBar.send_keys("FSJ4-50B")
driver.find_element_by_css_selector('a.inputButton').click()
time.sleep(5)
# wait for the products information to be loaded
products = driver.find_elements_by_xpath( "//div[@class='CoveoResult']")
# create a dictionary to store product and price
productInfo = {}
# iterate through all products in the search result and add details to dictionary
for product in products:
    # get product name (the leading "." keeps the search inside this product)
    productName = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
    # get price
    price = product.find_element_by_css_selector("div.price").text.split('\n')[1]
    # add details to dictionary
    productInfo[productName] = price
# print products information
print(productInfo)
#time.sleep(5)
driver.close()
Output:
{"8' Jumper-FSJ4-50B NM/NM": '$147.55'}
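One possible explanation for why div.price works where .//div[@class='price'] did not: the XPath test compares the whole class attribute as a string, while a CSS class selector matches a single class name among several. The sketch below illustrates the difference with the standard library's xml.etree.ElementTree rather than Selenium. The markup, and in particular the extra "sale" class, is invented; the actual Tessco.com markup has not been checked here.

```python
import xml.etree.ElementTree as ET

# Invented markup: the price div carries a second, hypothetical class.
SNIPPET = """
<div class="CoveoResult">
  <div class="price sale">$147.55</div>
</div>
"""


def has_class(element, name):
    """True if `name` is one of the space-separated class names."""
    return name in element.get("class", "").split()


result = ET.fromstring(SNIPPET)

# Exact string comparison, like @class='price' in XPath.
exact = result.findall(".//div[@class='price']")

# Token comparison, like the CSS selector div.price.
by_token = [d for d in result.iter("div") if has_class(d, "price")]

print(len(exact), len(by_token))  # 0 1
```

If this turns out to be the cause, the usual XPath 1.0 way to match one class name is .//div[contains(concat(' ', normalize-space(@class), ' '), ' price ')].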
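Edit:
How to choose the selectors
Hovering over the search bar in the browser's developer tools shows that it has an ID. An ID is meant to be unique within a page, so we can also use:
driver.find_element_by_id("searchBar")
But to reach the input field inside it, I prefer a CSS selector, and then send the keys.
To find the CSS selector:
For the a.inputButton CSS selector, inspect the search button and you will see the following HTML in the DOM:
<a class="CoveoSearchButton inputButton button"><span class="coveo-icon">Search</span><i class="fa fa-search" aria-hidden="true"></i></a>
We know that a is the anchor tag and, from the HTML above, that inputButton is one of its classes, so one possible CSS selector is: a.inputButton
Note
This selector happens to be unique here, but sometimes the same class name is used on several different elements of the same page. In that case you have to start from a higher-level node to reach the child element. For example, the same <a> can also be reached with another CSS selector for the search button:
div.divCoveoSearchbox > a.inputButton
because div.divCoveoSearchbox is the parent element of the inputButton anchor tag. I hope that makes sense?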
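The effect of qualifying a selector with its parent can be shown without a browser. This is a rough sketch using xml.etree.ElementTree instead of Selenium: select_child is a hypothetical helper that mimics "parent > child" matching, and the second "footer" anchor is invented to create the ambiguity described in the note.

```python
import xml.etree.ElementTree as ET

# Invented page: two anchors share the inputButton class.
PAGE = """
<body>
  <div class="divCoveoSearchbox">
    <a class="CoveoSearchButton inputButton button">Search</a>
  </div>
  <div class="footer">
    <a class="inputButton">Subscribe</a>
  </div>
</body>
"""


def has_class(element, name):
    """True if `name` is one of the space-separated class names."""
    return name in element.get("class", "").split()


def select_child(root, parent_tag, parent_class, child_tag, child_class):
    """Rough stand-in for 'parent_tag.parent_class > child_tag.child_class'."""
    matches = []
    for parent in root.iter(parent_tag):
        if not has_class(parent, parent_class):
            continue
        for child in parent:  # iterating an element yields direct children only
            if child.tag == child_tag and has_class(child, child_class):
                matches.append(child)
    return matches


root = ET.fromstring(PAGE)
loose = [a for a in root.iter("a") if has_class(a, "inputButton")]
strict = select_child(root, "div", "divCoveoSearchbox", "a", "inputButton")

print([a.text for a in loose])   # ['Search', 'Subscribe']
print([a.text for a in strict])  # ['Search']
```

The class-only match finds both anchors, while the parent-qualified match finds only the search button, which is why the longer selector is the safer choice when class names repeat.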