With either of these two Python commands, I can easily get my public IP.

>>> from requests import get
>>> get('https://ident.me').text
'1.2.3.4'
>>>

>>> import urllib.request
>>> urllib.request.urlopen('https://ident.me').read().decode('utf8')
'1.2.3.4'
>>>
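
Both commands return the response body as a plain string, so it can be worth checking that the string really is an address before using it. A minimal sketch using the standard-library `ipaddress` module; the helper name `parse_public_ip` is hypothetical and not part of the original question:

```python
import ipaddress


def parse_public_ip(body):
    """Return the address contained in a plain-text response body.

    Raises ValueError if the body is not a single valid IPv4/IPv6 address,
    for example when the server answered with an HTML page instead.
    """
    # Strip the trailing newline or spaces some services append.
    return ipaddress.ip_address(body.strip())


# Example with the placeholder address used in the question.
print(parse_public_ip("1.2.3.4\n"))
```

The body passed in would be the result of `get('https://ident.me').text` or the decoded `urlopen(...).read()` shown above.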


However, when I change the URL from https://ident.me to http://ip.zscaler.com/, the response contains a lot of unnecessary HTML.

I am only interested in the text-based information shown in the screenshots below.

Test proxy 1
[screenshot]

Test proxy 2
[screenshot]

Test proxy 3
[screenshot]

Is it possible to get only the important text-based information from http://ip.zscaler.com/ and strip out the other unnecessary HTML tags?

Desired output

>>> get('http://ip.zscaler.com/').text
The request received from you did not have an XFF header, so you are quite likely not going through the Zscaler proxy service.
Your request is arriving at this server from the IP address x.x.x.x
Your Gateway IP Address is most likely x.x.x.x
>>>

>>> urllib.request.urlopen('http://ip.zscaler.com/').read().decode('utf8')
The request received from you did not have an XFF header, so you are quite likely not going through the Zscaler proxy service.
Your request is arriving at this server from the IP address x.x.x.x
Your Gateway IP Address is most likely x.x.x.x
>>>

Best answer

Use BeautifulSoup together with requests:

from bs4 import BeautifulSoup
from requests import get

URL = "http://ip.zscaler.com/"

# Send a GET request and keep the response body (HTML) as text
html = get(URL).text

# Create parser
soup = BeautifulSoup(html, features="html.parser")

# Print out headline
headline = soup.find("div", attrs={"class": "headline"})
print(headline.text)

# Print out details
for detail in soup.find_all("div", attrs={"class": "details"}):
    print(detail.text)


This gives the following output:

The request received from you did not have an XFF header, so you are quite likely not going through the Zscaler proxy service.
Your request is arriving at this server from the IP address 119.17.136.170
Your Gateway IP Address is most likely 119.17.136.170
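
If installing BeautifulSoup is not an option, the same idea can be sketched with only the standard library. The following is a rough sketch, not a drop-in replacement: it assumes the page keeps its text in `<div>` elements with the classes `headline` and `details` (the class names used in the answer above), and the `SAMPLE_HTML` string is made up for illustration, so the real page's markup may differ. The names `DivTextExtractor` and `extract_blocks` are hypothetical.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text of every <div> whose class is in `wanted`."""

    def __init__(self, wanted=("headline", "details")):
        super().__init__()
        self.wanted = set(wanted)
        self.depth = 0    # nesting depth inside a matching <div>; 0 = outside
        self.parts = []   # text fragments of the <div> currently being read
        self.blocks = []  # finished text blocks, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A nested <div> inside a matching one: track it so that its
            # closing tag does not end the block too early.
            self.depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted.intersection(classes):
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Join the fragments and collapse runs of whitespace.
                self.blocks.append(" ".join("".join(self.parts).split()))
                self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_blocks(html):
    """Return the text of the headline/details blocks found in `html`."""
    parser = DivTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Made-up sample markup (an assumption, not the real page source).
SAMPLE_HTML = """
<html><body>
<div class="nav"><a href="/">Home</a></div>
<div class="headline">The request received from you did not have an XFF header.</div>
<div class="details">Your request is arriving at this server from the IP address
  <span>x.x.x.x</span></div>
<div class="details">Your Gateway IP Address is most likely <span>x.x.x.x</span></div>
</body></html>
"""

for line in extract_blocks(SAMPLE_HTML):
    print(line)
```

To run it against the live page, pass the string returned by `urllib.request.urlopen('http://ip.zscaler.com/').read().decode('utf8')` to `extract_blocks`.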
