Problem description
I am using Python and trying to capture the HTTP(S) traffic between my computer and a site. This should include all incoming and outgoing requests and responses, such as images and external calls.
I have tried to find the network traffic within my hit_site function, but I'm not finding the information.
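import requests
from bs4 import BeautifulSoup

def hit_site(url):
    print(url)
    # No stream=True here: the body is read once and can be reused below.
    r = requests.get(url)
    print(r)                  # the Response object itself
    print(r.status_code)
    print(r.encoding)
    print(r.headers)          # response headers (there is no r.response attribute)
    print(r.request.headers)  # headers that requests actually sent
    # r.json() is omitted: it raises on an HTML body such as this one.
    for line in r.iter_lines():
        print(line)
    soup = BeautifulSoup(r.text, "html.parser")
    return soup

hit_site("http://www.google.com")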
An example of the type of information I would like to capture is shown below. I used Fiddler2 to get it; all of this and more came from visiting groupon.com:
# | Result | Protocol | Host | URL | Body | Caching | Content-Type | Process | Comments | Custom
6 | 200 | HTTP | www.groupon.com | / | 23,236 | private, max-age=0, no-cache, no-store, must-revalidate | text/html; charset=utf-8 | chrome:6080 | |
7 | 200 | HTTP | www.groupon.com | /homepage-assets/styles-6fca4e9f48.css | 6,766 | public, max-age=31369910 | text/css; charset=UTF-8 | chrome:6080 | |
8 | 200 | HTTP | Tunnel to | img.grouponcdn.com:443 | 0 | | | chrome:6080 | |
9 | 200 | HTTP | img.grouponcdn.com | /deal/gsPCLbbqioFVfvjT3qbBZo/The-Omni-Mount-Washington-Resort_01-960x582/v1/c550x332.jpg | 94,555 | public, max-age=315279127; Expires: Fri, 18 Oct 2024 22:20:20 GMT | image/jpeg | chrome:6080 | |
10 | 200 | HTTP | img.grouponcdn.com | /deal/d5YmjhxUBi2mgfCMoriV/pE-700x420/v1/c220x134.jpg | 17,832 | public, max-age=298601213; Expires: Mon, 08 Apr 2024 21:35:06 GMT | image/jpeg | chrome:6080 | |
11 | 200 | HTTP | www.groupon.com | /homepage-assets/main-fcfaf867e3.js | 9,604 | public, max-age=31369913 | application/javascript | chrome:6080 | |
12 | 200 | HTTP | www.groupon.com | /homepage-assets/locale.js?locale=en_US&country=US | 1,507 | public, max-age=994 | application/javascript | chrome:6080 | |
13 | 200 | HTTP | www.groupon.com | /tracky | 3 | | application/octet-stream | chrome:6080 | |
14 | 200 | HTTP | www.groupon.com | /cart/widget?consumerId=b577c9c2-4f07-11e4-8305-0025906127fe | 17 | private, max-age=0, no-cache, no-store, must-revalidate | application/json; charset=utf-8 | chrome:6080 | |
15 | 200 | HTTP | www.googletagmanager.com | /gtm.js?id=GTM-B76Z | 39,061 | private, max-age=911; Expires: Wed, 22 Oct 2014 20:48:14 GMT | text/javascript; charset=UTF-8 | chrome:6080 | |
I would greatly appreciate any ideas on how to capture network traffic using Python.
Recommended answer
[tool name missing] is an extensive tool (written in Python) for parsing TCP traffic, which [text missing]. Another tool for running and decoding captures from Python is [tool name missing].
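To give a rough idea of what decoding a capture from Python involves, here is a minimal sketch that reads the classic pcap file format using only the standard library. It is not the API of either tool mentioned above. The names read_pcap and build_demo_capture are hypothetical, and the demo packet is placeholder bytes rather than a real Ethernet frame; a real capture would come from a sniffer such as tcpdump or Wireshark.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def read_pcap(stream):
    """Yield (timestamp, raw_bytes) for each packet record in a pcap stream."""
    header = stream.read(24)  # the global header is always 24 bytes
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"  # file was written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    while True:
        record = stream.read(16)  # per-packet header
        if len(record) < 16:
            break
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # last packet was cut off
        yield ts_sec + ts_usec / 1e6, data


def build_demo_capture():
    """Build a tiny in-memory capture holding one placeholder packet."""
    global_header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    payload = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    record = struct.pack("<IIII", 1413924494, 0, len(payload), len(payload))
    return global_header + record + payload


packets = list(read_pcap(io.BytesIO(build_demo_capture())))
for timestamp, data in packets:
    print(timestamp, len(data), data[:16])
```

In a real capture each record holds a link-layer frame, so the Ethernet, IP, and TCP headers still have to be peeled off before any HTTP data is visible. That is the part a dedicated parsing library handles.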
Note that decoding SSL traffic, including the data it carries, requires the private keys to be known. This is a problem for a third-party server you don't control, such as Google, and working around it takes significant effort. One approach is to set up a proxy with a known private key to play man-in-the-middle, and to install its self-signed CA into your local trust store so that the browser accepts it.
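On the client side, the man-in-the-middle approach could look roughly like the sketch below, which routes Python's own HTTP(S) requests through a local intercepting proxy. It assumes such a proxy (Fiddler, for example) is already running; the address in PROXY_URL and the helper name build_proxied_opener are assumptions made for illustration, so adjust them to your setup.

```python
import ssl
import urllib.request

# Assumption: an intercepting proxy is already listening on this address.
PROXY_URL = "http://127.0.0.1:8080"


def build_proxied_opener(proxy_url=PROXY_URL, ca_file=None):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url.

    ca_file is the path to the proxy's CA certificate in PEM format. If it is
    given, only that CA is trusted; otherwise the system defaults are loaded.
    """
    context = ssl.create_default_context(cafile=ca_file)
    proxy_handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    https_handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy_handler, https_handler)


opener = build_proxied_opener()
# With the proxy running, a call such as
#     opener.open("https://www.groupon.com/")
# would travel through it, where the decrypted exchange can be inspected.
```

With requests, the same idea is expressed through the proxies and verify arguments of requests.get. Either way, only the requests your script makes are visible; images, scripts, and other resources that a browser would load on its own are not fetched unless the script requests them too.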