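I'm trying to scrape multiple pages with BeautifulSoup, and I have already managed to retrieve the data from a single page. Now I'd like to know how to add a loop so that I can retrieve data from several pages.
The page I'm scraping is: https://www.diac.ca/directory/wpbdp_category/dealers-distributors/
Here is my code: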
from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://www.diac.ca/directory/wpbdp_category/dealers-distributors/').text
soup = BeautifulSoup(source, 'lxml')

csv_file = open('scrape.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['company', 'website'])

for i in soup.find_all('div', class_='wpbdp-listing'):
    company = i.find('div', class_='listing-title').a.text
    print(company)
    website = i.find('div', class_='wpbdp-field-business_website_address').span.a.text
    print(website)
    csv_writer.writerow([company, website])

csv_file.close()
I'd really appreciate any feedback or insight. Thank you very much!
Best answer
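One option is to look for the link inside the element with class=next. If the link exists, use it to load the next page. If it doesn't, break out of the loop: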
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.diac.ca/directory/wpbdp_category/dealers-distributors/').text
soup = BeautifulSoup(source, 'lxml')

page = 1
while True:
    print('Page no. {}'.format(page))
    print('-' * 80)
    for i in soup.find_all('div', class_='wpbdp-listing'):
        company = i.find('div', class_='listing-title').a.text
        print(company)
        website = i.find('div', class_='wpbdp-field-business_website_address').span.a.text
        print(website)

    # Follow the "next" link if the current page has one; otherwise stop.
    next_link = soup.select_one('.next a[href]')
    if next_link:
        soup = BeautifulSoup(requests.get(next_link['href']).text, 'lxml')
        page += 1
    else:
        break
Prints:
Page no. 1
--------------------------------------------------------------------------------
AMD Medicom Inc.
http://www.medicom.ca
Clinical Research Dental Supplies & Services Inc.
http://www.clinicalresearchdental.com
Coltene Whaledent
http://www.coltene.com
CompuDent Systems Inc.
http://www.compudent.ca
DenPlus Inc.
http://www.denplus.com
Dental Canada Instrumentation
http://www.mydentalcanada.com
Dental Services Group of Toronto Inc.
http://www.dsgtoronto.com
Dental Wings Inc.
http://www.dentalwings.com
Dentsply Sirona Canada
http://www.dentsplysirona.ca
DiaDent Group International Inc.
http://www.diadent.com
Page no. 2
--------------------------------------------------------------------------------
DMG America LLC
http://www.dmg-america.com
Hager Worldwide, Inc.
http://www.hagerworldwide.com
Hansamed Ltd
http://www.hansamed.net
Henry Schein Canada
http://www.henryschein.com
Heraeus Kulzer LLC
http://www.heraeus-kulzer-us.com
Johnson & Johnson Inc.
http://www.jjnjcanada.com
K-Dental Inc.
http://www.k-dental.ca
Kerr Dental
http://www.kerrdental.com
Northern Surgical & Medical Supplies Ltd.
www.northernsurgical.com
Northern Surgical and Medical Supplies Ltd.
http://www.northernsurgical.com
Page no. 3
--------------------------------------------------------------------------------
Patterson Dental/Dentaire Canada Inc.
http://www.pattersondental.ca
Procter & Gamble Oral Health
http://www.pg.com
Qwerty Dental Inc.
http://www.qwertydental.com
Sable Industries Inc.
http://www.sableindustriesinc.com
Septodont of Canada, Inc.
http://www.septodont.ca
Sure Dental Supplies of Canada Inc.
http://www.suredental.com
Swiss NF Metals Inc.
http://www.swissnf.com
The Aurum Group
http://www.aurumgroup.com
The Surgical Room Inc.
http://www.thesurgicalroom.ca
Unique Dental Supply Inc.
http://www.uniquedentalsupply.com
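
The loop above only prints the results, while the question also writes them to scrape.csv. A minimal sketch of how the two could be combined is shown here. The names parse_page, scrape_to_csv and START_URL are hypothetical helpers introduced for illustration, the 30-second timeout is an arbitrary assumption, and the built-in 'html.parser' is swapped in for 'lxml' only to avoid an extra dependency. Listings that lack a title or website link are written as empty strings rather than raising AttributeError, which is an assumption about how missing data should be handled.

```python
import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

START_URL = 'https://www.diac.ca/directory/wpbdp_category/dealers-distributors/'


def parse_page(html, base_url, parser='html.parser'):
    """Return (rows, next_url) for one listing page (hypothetical helper)."""
    soup = BeautifulSoup(html, parser)
    rows = []
    for listing in soup.find_all('div', class_='wpbdp-listing'):
        title = listing.select_one('div.listing-title a')
        site = listing.select_one(
            'div.wpbdp-field-business_website_address span a')
        # A missing field becomes '' instead of raising AttributeError.
        rows.append((
            title.get_text(strip=True) if title else '',
            site.get_text(strip=True) if site else '',
        ))
    # Same selector as in the answer; None signals the last page.
    next_link = soup.select_one('.next a[href]')
    # urljoin leaves absolute links unchanged and resolves relative ones.
    next_url = urljoin(base_url, next_link['href']) if next_link else None
    return rows, next_url


def scrape_to_csv(start_url, out_path):
    """Follow the 'next' links from start_url and write all rows to out_path."""
    with open(out_path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(['company', 'website'])
        url = start_url
        while url:
            response = requests.get(url, timeout=30)  # timeout is an assumption
            response.raise_for_status()
            rows, url = parse_page(response.text, url)
            writer.writerows(rows)
```

Calling scrape_to_csv(START_URL, 'scrape.csv') would then walk every page and write one row per listing. Keeping the parsing in its own function means it can be checked against a saved HTML snippet without touching the network.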