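Using the code below, I'm trying to get the value at the end of the href. Is there a way, with BeautifulSoup or a regex, to extract the href and find the value after page=?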
from bs4 import BeautifulSoup
import requests
import json
import re
request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
findNext = soup.find("a", class_="next_page")
print(findNext)
This gives the following output:
<a class="next_page" href="/quotes/tag/fun?page=2" rel="next">next »</a>
Note: I want to extract the 2 from the above, or whatever other number appears there.

Best answer
You can use a regex to find the page number:
from bs4 import BeautifulSoup
import re
import requests

request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
# The lookbehind matches the digits that follow "page=" in the tag's HTML
page_num = re.findall(r'(?<=page=)\d+', str(soup.find("a", class_="next_page")))[0]
print(page_num)
Output:
2
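An alternative that avoids running a regex over the tag's HTML is to read the href attribute directly (in BeautifulSoup, findNext["href"] or findNext.get("href")) and parse its query string with Python's standard urllib.parse module. This is a minimal sketch, assuming the page value is always a plain integer; the helper name page_from_href is hypothetical and not part of either library.

```python
from urllib.parse import urlparse, parse_qs


def page_from_href(href):
    """Return the 'page' query parameter as an int, or None if it is absent.

    Hypothetical helper; assumes the value is a plain integer.
    """
    # urlparse also handles relative hrefs such as "/quotes/tag/fun?page=2"
    query = urlparse(href).query
    # parse_qs maps each parameter name to a list of its values
    values = parse_qs(query).get("page")
    return int(values[0]) if values else None


# In the real script the href would come from findNext["href"];
# here we use the value shown in the question's output.
href = "/quotes/tag/fun?page=2"
print(page_from_href(href))  # → 2
```

Parsing the query string this way keeps working if the site adds other parameters to the link (for example ?tag=fun&page=15), and the function returns None instead of raising an IndexError when there is no page parameter.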