Web scraping with Selenium and Python: XPath that contains text

Question

I will try to keep this short. I am trying to click on a product returned by a search on a website. The search produces a list of matching products, and I want to click on the first one whose title contains the product name I searched for. Here is the link to the site so you can inspect its DOM structure: https://www.tonercartuccestampanti.it/#/dfclassic/query=CE285A&query_name=match_and In this case, many results contain my query string, and I simply want to click on the first one.

Here is the snippet of code I wrote for this:

def click_on_first_matching_product(self):
        first_product = WebDriverWait(self.driver, 6).until(
            EC.visibility_of_all_elements_located((By.XPATH, f"//a[@class='df-card__main']/div/div[@class=df-card__title] and contains(text(), '{self.product_code}')"))
        )[0]
        first_product.click()

The problem is that the 6 seconds elapse without any element satisfying the XPath condition I wrote, and I can't figure out how to make it work. I am trying to get the a element of a search result and check whether the title further down in its structure contains the query string I searched for. Can I have some help and an explanation, please? I am quite new to Selenium and XPath.

Could I also have a link to reliable Selenium documentation? I am having a hard time finding a good one, ideally one that also explains how to write conditions in XPath.

Recommended answer

You need to consider a couple of things. Your use case is either to click on the first search result or to click on a specific item identified by its card title. Since you click on a single, definite WebElement in both cases, inducing WebDriverWait for visibility_of_all_elements_located() is more expensive than necessary.

To click on the item identified by its card title, induce WebDriverWait for element_to_be_clickable() and use one of the following XPath-based locator strategies:

Using the text CE285A Toner Compatibile Per Hp LaserJet P1102 directly:

driver.get('https://www.tonercartuccestampanti.it/#/dfclassic/query=CE285A&query_name=match_and')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[text()='CE285A Toner Compatibile Per Hp LaserJet P1102']"))).click()

Using a variable for the text through format():

driver.get('https://www.tonercartuccestampanti.it/#/dfclassic/query=CE285A&query_name=match_and')
text = "CE285A Toner Compatibile Per Hp LaserJet P1102"
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[text()='{}']".format(text)))).click()

Using a variable for the text through %s:

driver.get('https://www.tonercartuccestampanti.it/#/dfclassic/query=CE285A&query_name=match_and')
text = "CE285A Toner Compatibile Per Hp LaserJet P1102"
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[text()='%s']"% str(text)))).click()
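The locators above match the full title exactly, whereas the question asks for a partial match on the product code. In the question's XPath, the class value df-card__title is unquoted and the `and contains(...)` part sits outside the square brackets, so the expression as a whole yields a boolean rather than a set of elements. The following is a minimal sketch of a partial-match locator, not a definitive fix: the class names come from the question's XPath, the nesting of the title below the link is an assumption about the page, and the helper names `xpath_literal`, `title_contains_xpath` and `click_first_matching_product` are hypothetical.

```python
def xpath_literal(value: str) -> str:
    """Quote a Python string as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # XPath 1.0 has no escape character, so mixed quotes need concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join(f"'{part}'" for part in parts) + ")"


def title_contains_xpath(product_code: str) -> str:
    """XPath for result links whose card title contains product_code.

    Class names are taken from the XPath in the question; the nesting
    (title somewhere below the link) is an assumption about the page.
    """
    code = xpath_literal(product_code)
    return (
        "//a[contains(@class, 'df-card__main')]"
        "[.//div[contains(@class, 'df-card__title')"
        f" and contains(normalize-space(.), {code})]]"
    )


def click_first_matching_product(driver, product_code: str, timeout: float = 20) -> None:
    """Wait until the first matching result is clickable, then click it."""
    # Imported here so the XPath helpers above work without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, title_contains_xpath(product_code))
    WebDriverWait(driver, timeout).until(EC.element_to_be_clickable(locator)).click()
```

The condition lives inside a predicate (the square brackets), which is how XPath filters elements. element_to_be_clickable() resolves the locator to the first matching element in document order, so no indexing with [0] is needed.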

To click on the first product in the search results, induce WebDriverWait for element_to_be_clickable() and use either of the following locator strategies:

CSS_SELECTOR :

driver.get('https://www.tonercartuccestampanti.it/#/dfclassic/query=CE285A&query_name=match_and')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.df-card>a"))).click()

XPATH :

driver.get('https://www.tonercartuccestampanti.it/#/dfclassic/query=CE285A&query_name=match_and')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='df-card']/a"))).click()
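The snippets above hard-code CE285A in the URL, while the question keeps the product code in a variable. As a small sketch, the search URL could be built from that variable; the helper name `search_url` is hypothetical, the URL pattern is copied from the link in the question, and whether the site accepts percent-encoded characters in the fragment is an assumption.

```python
from urllib.parse import quote

BASE_URL = "https://www.tonercartuccestampanti.it/"


def search_url(product_code: str) -> str:
    """Build the search URL for a product code, following the question's link."""
    # Percent-encode the code so spaces or '&' cannot break the fragment.
    query = quote(product_code, safe="")
    return f"{BASE_URL}#/dfclassic/query={query}&query_name=match_and"
```

It would then be used as `driver.get(search_url("CE285A"))` before waiting for the result to become clickable.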

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
