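This article explains how to handle the TypeError raised when scraping a table and writing it into a DataFrame. It may be a useful reference for anyone facing the same problem.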

Problem description

I am trying to scrape a table and write it into a DataFrame, but I get a TypeError. How can I resolve this error?

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium import webdriver
import pandas as pd
temp=[]
driver = webdriver.Chrome(r'C:\Program Files (x86)\chromedriver.exe')
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
headers=WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//thead"))).text
rows=WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='sites']//tbody"))).text
temp.append(rows)
df = pd.DataFrame(temp,columns=headers)
print(df)

headers ends up holding the text FAMI-QS Number ... Expiry date, while rows holds FAM-0694 ... 2022-09-04.
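The TypeError can be reproduced without a browser. The sketch below assumes the cause is that WebElement.text returns one plain string, whereas the columns argument of pd.DataFrame expects a list-like of column names. The two strings are shortened, hypothetical stand-ins for the scraped text, built from the values quoted in the question.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the strings returned by WebElement.text.
headers = "FAMI-QS Number Expiry date"
rows = "FAM-0694 2022-09-04"

# Passing a plain string as `columns` fails: pandas expects a list-like of names.
error = None
try:
    pd.DataFrame([rows], columns=headers)
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError

# With one value per column and a list of column names, construction succeeds.
df = pd.DataFrame(
    [["FAM-0694", "2022-09-04"]],
    columns=["FAMI-QS Number", "Expiry date"],
)
print(df)
```

Splitting the real header text on whitespace would not be reliable, since column names such as Site Name contain spaces. That is one reason to parse the table's HTML rather than its rendered text.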

Recommended answer

To scrape all the data from all the columns, you need to induce WebDriverWait for visibility_of_element_located() on the <table> element, extract its outerHTML, and read that outerHTML with pandas' read_html(). You can use the following locator strategy:

  • Code Block:

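# Assumes the imports and driver setup from the question.
driver.get("https://www.fami-qs.org/certified-companies-6-0.html")
# The table lives inside an iframe, so switch into it first.
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[title='Inline Frame Example']")))
# Wait until the table is visible, then grab its full HTML.
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#sites"))).get_attribute("outerHTML")
# read_html() returns a list of DataFrames, one per <table> it finds.
# (pandas 2.1+ warns on raw HTML strings; wrapping in io.StringIO(data) avoids that.)
df = pd.read_html(data)
print(df)
driver.quit()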

  • Console Output:

    [  FAMI-QS Number                             Site Name              City  ... Status Certified from Expiry date
    0       FAM-1293                    AmTech Ingredients        albert lea  ...  Valid     2020-10-08  2023-10-07
    1       FAM-0841                    3F FEED & FOOD S L       vizcolozano  ...  Valid     2020-04-17  2023-04-16
    2       FAM-1361                5N Plus Additives GmbH  eisenhüttenstadt  ...  Valid     2020-10-01  2023-09-30
    3    FAM-1301-01                   A & V Corp. Limited            xiamen  ...  Valid     2020-09-09  2023-09-08
    4       FAM-1146  A. + E. Fischer-Chemie GmbH & Co. KG         wiesbaden  ...  Valid     2020-06-05  2023-06-04
    5       FAM-1589          A.M FOOD CHEMICAL CO LIMITED             jinan  ...  Valid     2020-01-07  2023-01-06
    6    FAM-0613-01                          A.W.P. S.r.l        crevalcore  ...  Valid     2020-02-27  2023-02-07
    7       FAM-0867             AB AGRI POLSKA Sp. z o.o.           smigiel  ...  Valid     2020-08-03  2023-03-19
    8    FAM-1510-02                              AB Vista       marlborough  ...  Valid     2020-04-16  2023-04-15
    9    FAM-1510-01                            AB Vista *         rotterdam  ...  Valid     2020-04-16  2023-04-15
    
    [10 rows x 7 columns]]
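The surrounding square brackets in the output show that read_html() returned a list of DataFrames, not a single DataFrame. A minimal sketch of unwrapping it follows. The tables variable is a hypothetical stand-in for that return value, filled with two rows and only the columns visible in the console output above, so no browser is needed to run it.

```python
import pandas as pd

# Hypothetical stand-in for the list returned by pd.read_html(data).
# Values are copied from the console output; elided columns are left out.
tables = [
    pd.DataFrame(
        {
            "FAMI-QS Number": ["FAM-1293", "FAM-0841"],
            "Site Name": ["AmTech Ingredients", "3F FEED & FOOD S L"],
            "City": ["albert lea", "vizcolozano"],
            "Status": ["Valid", "Valid"],
            "Certified from": ["2020-10-08", "2020-04-17"],
            "Expiry date": ["2023-10-07", "2023-04-16"],
        }
    )
]

# Take the first (here: only) table to get an actual DataFrame.
df = tables[0]

# to_string() renders every column instead of eliding the middle ones with "...".
print(df.to_string(index=False))

# Optionally convert the date columns from text to datetime values.
for col in ["Certified from", "Expiry date"]:
    df[col] = pd.to_datetime(df[col])
```

Indexing with [0] is enough here because the locator targets a single table. If the HTML contained several tables, each would appear as its own list element.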
    
