In Python 3, I want to extract information from a page using requests and BeautifulSoup.

import requests
from bs4 import BeautifulSoup

link = "https://portal.stf.jus.br/processos/listarPartes.asp?termo=AECIO%20NEVES%20DA%20CUNHA"

try:
    res = requests.get(link)
except requests.exceptions.RequestException as e:
    # RequestException is the base class of HTTPError, ConnectionError and Timeout
    print(e)
    raise SystemExit(1)  # res would be undefined below if the request failed

html = res.content.decode('utf-8')

soup = BeautifulSoup(html, "lxml")

pag = soup.find('div', {'id': 'total'})

print(pag)

In this case, the information is in an HTML snippet like this:
<div id="total" style="display: inline-block"><input type="hidden" name="totalProc" id="totalProc" value="35">35</div>

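What I want to access is the value, which is 35 in this example. I want to capture the number "35".
That is why I used "pag = soup.find('div', {'id': 'total'})", planning to isolate the 35 from the result step by step.
But what comes back is only: <div id="total" style="display: inline-block"><img src="ajax-loader.gif"/></div>
Does anyone know how to capture just the value?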

Best answer

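The value is loaded dynamically by a separate XHR call, which you can find in the browser's network tab. The page you requested only contains the loader image as a placeholder, so request the XHR URL directly: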

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://portal.stf.jus.br/processos/totalProcessosPartes.asp?termo=AECIO%20NEVES%20DA%20CUNHA&total=0')
soup = bs(r.content, 'lxml')
print(soup.select_one('#totalProc')['value'])
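
The URL above is hard-coded for one search term. As a minimal sketch, assuming the endpoint accepts other names in the same "termo" parameter, the query string could be built with the standard library. The names BASE_URL and build_total_url are hypothetical, and "total=0" is simply copied from the URL above without knowing what it controls.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the XHR call shown in the answer above.
BASE_URL = "https://portal.stf.jus.br/processos/totalProcessosPartes.asp"


def build_total_url(term: str) -> str:
    """Return the XHR URL for a search term (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20, matching the URL in the answer;
    # the default (quote_plus) would encode them as "+" instead.
    query = urlencode({"termo": term, "total": 0}, quote_via=quote)
    return f"{BASE_URL}?{query}"


url = build_total_url("AECIO NEVES DA CUNHA")
print(url)
```

The resulting string can be passed to requests.get() in place of the literal URL.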

With a regular expression:
import re
import requests

r = requests.get('https://portal.stf.jus.br/processos/totalProcessosPartes.asp?termo=AECIO%20NEVES%20DA%20CUNHA&total=0')
# No HTML parser is needed here; the optional quote covers both value=35 and value="35"
print(re.search(r'value="?(\d+)', r.text).group(1))
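
The regular-expression approach can be tried without any network access. The sketch below runs it against the two HTML fragments quoted in the question, under the assumption that the XHR response looks like the first fragment. The helper name extract_total is hypothetical. It returns None instead of raising when no value is present, which is what happens with the loader placeholder.

```python
import re
from typing import Optional

# Matches value=35, value="35" and value='35'.
_VALUE_RE = re.compile(r"""value\s*=\s*["']?(\d+)""")


def extract_total(html: str) -> Optional[int]:
    """Return the first numeric value attribute found, or None."""
    match = _VALUE_RE.search(html)
    return int(match.group(1)) if match else None


# Fragments copied from the question.
sample = '<input type="hidden" name="totalProc" id="totalProc" value="35">35'
placeholder = '<div id="total" style="display: inline-block"><img src="ajax-loader.gif"/></div>'

print(extract_total(sample))       # 35
print(extract_total(placeholder))  # None
```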

About "python - How do I get the value of a hidden input with BeautifulSoup?": a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58291530/
