How to get the original encoded response size when intercepting requests with Puppeteer?

Problem description

I'm using this code to log the encoded response size when loading a page in Chrome:

const puppeteer = require("puppeteer");

(async function() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // page._client is Puppeteer's internal CDP client (a private API).
  page._client.on("Network.loadingFinished", data => {
    console.log("finished", { encodedDataLength: data.encodedDataLength });
  });

  // await page.setRequestInterception(true);
  // page.on("request", async request => {
  //   request.continue();
  // });

  await page.goto("http://example.com");
  await browser.close();
})();

This is the output:

finished { encodedDataLength: 967 }

However, if I uncomment the four lines in the code snippet, the output changes to:

finished { encodedDataLength: 0 }

This does make some sense, since the intercepted request could have been modified in some way by the client, and it would not have been gzipped again afterwards.

However, is there a way to access the original gzipped response size?

The Chrome trace also doesn't include the gzipped size:

"encodedDataLength": 0,
"decodedBodyLength": 1270,

Solution

In this case, we can use the value of the Content-Length header.

The good folks at Google decided they won't fix some odd bugs closely related to encodedDataLength.

The code and results below show the difference.

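// This snippet goes inside the async function from the question, after `page` is created.
// Comment out the next line and the request handler to reproduce the first result below.
await page.setRequestInterception(true);
page.on("request", async request => {
  request.continue();
});

// Monitor using Puppeteer's internal client (page._client is a private API)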
page._client.on("Network.responseReceived", ({ response }) => {
  console.log("responseReceived", [
    response.headers["Content-Length"],
    response.encodedDataLength
  ]);
});

page._client.on("Network.loadingFinished", data => {
  console.log("loadingFinished", [data.encodedDataLength]);
});

// Monitor using CDP
const devToolsResponses = new Map();
const devTools = await page.target().createCDPSession();
await devTools.send("Network.enable");

devTools.on("Network.responseReceived", event => {
  devToolsResponses.set(event.requestId, event.response);
});

devTools.on("Network.loadingFinished", event => {
  const response = devToolsResponses.get(event.requestId);
  const encodedBodyLength =
    event.encodedDataLength - response.headersText.length;
  console.log(`${encodedBodyLength} bytes for ${response.url}`);
});

Result without setRequestInterception:

responseReceived [ '606', 361 ]
loadingFinished [ 967 ]
606 bytes for http://example.com/

Result with setRequestInterception:

responseReceived [ '606', 0 ]
loadingFinished [ 0 ]
-361 bytes for http://example.com/
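
The negative value in the second result comes from subtracting the header size (361 bytes) from an encodedDataLength of 0. A minimal sketch of a guard for that subtraction follows. The function name `encodedBodyLength` and its `null` return convention are hypothetical, not part of Puppeteer or the DevTools protocol; it only reuses the arithmetic from the answer's code.

```javascript
// Hypothetical helper (not a Puppeteer API): same subtraction as in the
// answer's code, but it returns null instead of a meaningless negative
// number when the reported total is unusable.
function encodedBodyLength(totalEncodedLength, headersText) {
  // headersText can be missing, and the total is reported as 0
  // when request interception is enabled.
  if (typeof headersText !== "string" || !(totalEncodedLength > 0)) {
    return null;
  }
  const bodyBytes = totalEncodedLength - headersText.length;
  return bodyBytes >= 0 ? bodyBytes : null;
}

// Figures taken from the results above: 361 header bytes, 967 bytes in total.
const sampleHeaders = "x".repeat(361); // stand-in for response.headersText
console.log(encodedBodyLength(967, sampleHeaders)); // 606
console.log(encodedBodyLength(0, sampleHeaders)); // null
```

With interception enabled the helper returns null, which signals that another source, such as the Content-Length header, is needed.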

Tested with multiple gzip tools; the result was the same everywhere.

The Content-Length header is far more reliable here.
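
Reading the header can be sketched as a small lookup. HTTP header names are case-insensitive (HTTP/2 responses send them in lowercase), and a response that uses chunked transfer encoding has no Content-Length at all, so the sketch below treats a missing value as `null`. The helper name `contentLengthOf` is an assumption made for illustration; it expects a plain headers object such as `response.headers` from the Network.responseReceived event shown earlier.

```javascript
// Hypothetical helper (not a Puppeteer API): finds Content-Length in a
// plain headers object, regardless of how the header name is cased.
function contentLengthOf(headers) {
  for (const [name, value] of Object.entries(headers || {})) {
    if (name.toLowerCase() === "content-length") {
      const bytes = Number.parseInt(value, 10);
      // A malformed value is treated the same as a missing header.
      return Number.isNaN(bytes) ? null : bytes;
    }
  }
  return null; // header absent, e.g. a chunked response
}

console.log(contentLengthOf({ "Content-Length": "606" })); // 606
console.log(contentLengthOf({ "content-length": "606" })); // 606
console.log(contentLengthOf({ "Content-Type": "text/html" })); // null
```

Inside the responseReceived handler this could be called as `contentLengthOf(response.headers)` in place of the direct `response.headers["Content-Length"]` lookup.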


08-21 10:56