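I'm using the code below and trying to find the value at the end of the href. Is there a way to extract the href and get the value after page= using BeautifulSoup or a regex?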

from bs4 import BeautifulSoup
import requests
import json
import re

request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')

findNext = soup.find("a", class_="next_page")
print(findNext)

This gives the following output:
<a class="next_page" href="/quotes/tag/fun?page=2" rel="next">next »</a>

Note: I want to extract the 2 from the output above, or whatever other number may appear there.

Best answer

You can use a regex to find the page number:

from bs4 import BeautifulSoup
import re
import requests

request = requests.get('https://www.goodreads.com/quotes/tag/fun?page=1')
soup = BeautifulSoup(request.text, 'html.parser')
# The look-behind matches the digits that immediately follow "page=" in the tag's HTML
page_nums = re.findall(r'(?<=page=)\d+', str(soup.find("a", class_="next_page")))[0]
print(page_nums)

Output:
2
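
As an alternative to the regex, the href can be parsed as a URL and the page parameter read from its query string. The following is only a sketch: page_from_href is a hypothetical helper name that appears in neither the question nor the answer, and it assumes the href has already been pulled from the tag, for example with findNext["href"].

```python
from urllib.parse import urlparse, parse_qs


def page_from_href(href):
    """Return the 'page' query parameter as an int, or None if it is missing.

    page_from_href is a hypothetical helper, not part of bs4 or requests.
    """
    # parse_qs maps each query parameter to a list of its values
    query = parse_qs(urlparse(href).query)
    values = query.get("page")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])


# The href below is copied from the output shown in the question
print(page_from_href("/quotes/tag/fun?page=2"))
```

Unlike the regex, this returns an integer and keeps working if other query parameters appear before or after page. It returns None when the href has no page parameter, but the caller still has to check that soup.find actually returned a tag before indexing it, since there may be no next_page link on the last page.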

10-05 20:56