I can't parse the HTML on this site correctly: https://nwis.waterdata.usgs.gov/usa/nwis/gwlevels/?site_no=332857117043301

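I want to extract the line "Latitude 34°02'48.57", Longitude 117°02'09.16"". It appears at line 862 of the page source (in the web developer tools), but it does not show up when I parse the page with BeautifulSoup. Using the lxml parser doesn't produce the expected result either.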

import requests
from bs4 import BeautifulSoup

page = requests.get('https://nwis.waterdata.usgs.gov/usa/nwis/gwlevels/?site_no=340248117020902')
soup = BeautifulSoup(page.content, 'html.parser')

print(soup.prettify())


The printed page content does not include the latitude/longitude line. How can I adjust the code to scrape this information?

Best answer

import requests
from bs4 import BeautifulSoup

html = requests.get('https://nwis.waterdata.usgs.gov/usa/nwis/gwlevels/?site_no=340248117020902')
soup = BeautifulSoup(html.text, 'lxml')

# Collect every <div align="left">; one of them holds the coordinates.
data = soup.find_all('div', attrs={'align': 'left'})

# The first child of that div is the text "Latitude ..., Longitude ...".
# Split it on the comma: part 0 is the latitude, part 1 the longitude.
latitude = ''.join(x.contents[0].split(',')[0] for x in data if 'Latitude' in x.contents[0])
longitude = ''.join(x.contents[0].split(',')[1].strip().replace('\n', '') for x in data if 'Longitude' in x.contents[0])

print(latitude)
print(longitude)


Output:

Latitude  34°02'48.57"
Longitude 117°02'09.16" NAD83
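
If the `div` layout ever changes, matching the text itself may be less fragile than relying on `align="left"`. The following is only a minimal sketch of that alternative, not part of the answer above: `SAMPLE_HTML` is a made-up snippet that imitates the line from the question (the real page's markup may differ), and `extract_lat_lon` is a hypothetical helper name. It uses only the standard library, so it can be tried without fetching the page.

```python
import html
import re

# Hypothetical snippet imitating the line quoted in the question;
# the actual markup served by the site may differ.
SAMPLE_HTML = (
    '<div align="left">Latitude  34&deg;02\'48.57", '
    'Longitude 117&deg;02\'09.16" NAD83<br /></div>'
)

# "Latitude <value>, Longitude <value>", where each value is one run of
# non-whitespace characters; \s also covers a line break after the comma.
PATTERN = re.compile(r'Latitude\s+(\S+),\s*Longitude\s+(\S+)')


def extract_lat_lon(page_text):
    """Return (latitude, longitude) as strings, or None if not found."""
    # Decode entities such as &deg; or &quot; before matching.
    text = html.unescape(page_text)
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == '__main__':
    print(extract_lat_lon(SAMPLE_HTML))
```

With `requests`, the same helper could be called as `extract_lat_lon(html.text)` on the response object from the answer above, assuming the live page contains the line in this form.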
