I am trying to fetch the table of search results from this site:
https://www.handelsregister.de/rp_web/result.do?Page=1
but it returns an empty table. I am using this code:
from urllib.request import urlopen
from bs4 import BeautifulSoup as BS

url = "https://www.handelsregister.de/rp_web/result.do?Page=1"
html = urlopen(url)
soup = BS(html, 'lxml')
# Collect every <table> element on the page
table = soup.find_all('table')
#table = soup.find_all('table', class_ = 'RegPortErg')
#table = soup.find('table', {'class': 'RegPortErg'})
print(table)
Best answer
It is not a very clean table, but you can get it with requests.post():
import requests
import pandas as pd
url = "https://www.handelsregister.de/rp_web/mask.do?Typ=e"
payloads = {
'suchTyp': 'e',
'registerArt': 'HRA',
'registerNummer': '',
'bundeslandBW': 'on',
'registergericht': '',
'schlagwoerter': '',
'schlagwortOptionen': '2',
'niederlassung': '',
'rechtsform': '',
'postleitzahl': '',
'ort': '',
'strasse': '',
'ergebnisseProSeite': '10',
'btnSuche': 'Find'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
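# Submit the search form; the results come back in the response body
response = requests.post(url, data=payloads, headers=headers)
# read_html parses every <table> in the page into a DataFrame
tables = pd.read_html(response.text)
# The search results are in the second table
table = tables[1]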
Output:
print(table)
0 ... 4
0 Firma / Name ... NaN
1 Baden-Württemberg Amtsgericht Freiburg HRA ... ... NaN
2 NaN ... AD CD HD DK UT VÖ SI
3 Baden-Württemberg Amtsgericht Ulm HRA 726084 ... NaN
4 NaN ... AD CD HD DK UT VÖ SI
5 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
6 NaN ... AD CD HD DK UT VÖ SI
7 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
8 NaN ... AD CD HD DK UT VÖ SI
9 NaN ... NaN
10 NaN ... NaN
11 NaN ... NaN
12 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
13 NaN ... AD CD HD DK UT VÖ SI
14 Baden-Württemberg Amtsgericht Freiburg HRA ... ... NaN
15 NaN ... AD CD HD DK UT VÖ SI
16 NaN ... NaN
17 NaN ... NaN
18 NaN ... NaN
19 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
20 NaN ... AD CD HD DK UT VÖ SI
21 NaN ... NaN
22 NaN ... NaN
23 Baden-Württemberg Amtsgericht Stuttgart HRA... ... NaN
24 NaN ... AD CD HD DK UT VÖ SI
25 NaN ... NaN
26 NaN ... NaN
27 Baden-Württemberg Amtsgericht Freiburg HRA ... ... NaN
28 NaN ... AD CD HD DK UT VÖ SI
29 NaN ... NaN
30 NaN ... NaN
31 Baden-Württemberg Amtsgericht Mannheim HRA ... ... NaN
32 NaN ... AD CD HD DK UT VÖ SI
[33 rows x 5 columns]
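Since the raw table mixes record rows with link rows and empty rows, some post-processing may help. The following is only a minimal sketch, assuming the layout seen in the output above: record rows have text in column 0, while the "AD CD HD ..." link rows and the filler rows have NaN there. The frame `raw` is made-up stand-in data (not real register entries), and `clean_results` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def clean_results(table: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows with text in column 0, then drop the header row."""
    # Rows that are entirely NaN carry no information.
    cleaned = table.dropna(how="all")
    # Assumption: real records have a value in column 0; link rows do not.
    cleaned = cleaned[cleaned[0].notna()]
    # Assumption: the first remaining row is the "Firma / Name" header.
    cleaned = cleaned.iloc[1:]
    return cleaned.reset_index(drop=True)


# Made-up stand-in for tables[1]; the values are placeholders.
raw = pd.DataFrame({
    0: ["Firma / Name", "Court A HRA 1", np.nan, "Court B HRA 2", np.nan, np.nan],
    4: [np.nan, np.nan, "AD CD HD DK UT VÖ SI", np.nan, "AD CD HD DK UT VÖ SI", np.nan],
})

records = clean_results(raw)
print(records)
```

With the real data you would call clean_results(table) on the frame from the answer. If the site's layout differs from the assumptions above, the filters would need adjusting.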
Regarding python - web scraping problem that returns an empty table, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/54669242/