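I can't correctly parse the HTML on this site: https://nwis.waterdata.usgs.gov/usa/nwis/gwlevels/?site_no=332857117043301
I want to extract the line "Latitude 34°02'48.57", Longitude 117°02'09.16"". It appears on line 862 of the page source (web developer tools), but it does not show up when I parse the page with BeautifulSoup. Using the lxml parser does not produce the expected result either.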
import requests
import re
from bs4 import BeautifulSoup
page = requests.get('https://nwis.waterdata.usgs.gov/usa/nwis/gwlevels/?site_no=340248117020902')
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())
The printed page content does not include the latitude/longitude line. How can I adjust the code to scrape this information?
Best answer
import requests
from bs4 import BeautifulSoup
# Fetch the page and parse it with the lxml parser
html = requests.get('https://nwis.waterdata.usgs.gov/usa/nwis/gwlevels/?site_no=340248117020902')
soup = BeautifulSoup(html.text, 'lxml')
# Collect every <div align="left"> element; one of them holds the coordinates
data = soup.find_all('div', attrs={'align': 'left'})
# The first child of that div is text like 'Latitude ..., Longitude ... NAD83'; split it on the comma
latitude = ''.join(x.contents[0].split(',')[0] for x in data if 'Latitude' in x.contents[0])
longitude = ''.join(x.contents[0].split(',')[1].strip().replace('\n', '') for x in data if 'Longitude' in x.contents[0])
print(latitude)
print(longitude)
Output:
Latitude 34°02'48.57"
Longitude 117°02'09.16" NAD83
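The answer above depends on the coordinates being the first child of a div with align="left". As a rough alternative, the same line could be matched with a regular expression on the raw page text, which does not rely on the tag layout. This is only a sketch: SAMPLE_HTML is a made-up fragment that approximates the markup (the real page may differ), and extract_lat_lon is a hypothetical helper name, not part of requests or BeautifulSoup.

```python
import html
import re

# Hypothetical fragment approximating the page markup; the real page may differ.
SAMPLE_HTML = (
    '<div align="left">Latitude  34&#176;02\'48.57", '
    'Longitude 117&#176;02\'09.16" NAD83<br /></div>'
)

# Degrees, minutes, seconds, e.g. 34°02'48.57"
DMS = r"(\d{1,3})°(\d{2})'(\d{2}(?:\.\d+)?)\""
PATTERN = re.compile(r"Latitude\s+" + DMS + r"\s*,\s*Longitude\s+" + DMS)


def extract_lat_lon(raw_html):
    """Return (latitude, longitude) as DMS strings, or None if not found."""
    # Decode entities such as &#176; so the degree sign can be matched literally.
    text = html.unescape(raw_html)
    match = PATTERN.search(text)
    if match is None:
        return None
    d1, m1, s1, d2, m2, s2 = match.groups()
    return (f"{d1}°{m1}'{s1}\"", f"{d2}°{m2}'{s2}\"")


result = extract_lat_lon(SAMPLE_HTML)
if result is not None:
    print("Latitude", result[0])
    print("Longitude", result[1])
```

With a live page, the string passed in would be the response text (for example html.text in the answer's code). A None result means the line is not in the downloaded HTML at all, which is worth checking before blaming the parser.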