Node.js proxy, dealing with gzip decompression


Problem description

I'm currently working on a proxy server where, in this case, we have to modify the data (using a regexp) that we push through it.

In most cases it works fine, except for websites that use gzip as content-encoding (I think). I've come across a module called compress and tried to push the chunks I receive through a decompress / gunzip stream, but it isn't really turning out as I expected (see below for code).

I figured I'd post some code to support my problem. This is the proxy that gets loaded with MVC (Express):

module.exports = {
index: function(request, response){
    var iframe_url = "www.nu.nl"; // site with gzip encoding

    var http = require('http');
    var httpClient = http.createClient(80, iframe_url);
    var headers = request.headers;
    headers.host = iframe_url;

    var remoteRequest = httpClient.request(request.method, request.url, headers);

    request.on('data', function(chunk) {
        remoteRequest.write(chunk);
    });

    request.on('end', function() {
        remoteRequest.end();
    });

    remoteRequest.on('response', function (remoteResponse){
        var body_regexp = new RegExp("<head>"); // regex to find first head tag
        var href_regexp = new RegExp('\<a href="(.*)"', 'g'); // regex to find hrefs

        response.writeHead(remoteResponse.statusCode, remoteResponse.headers);

        remoteResponse.on('data', function (chunk) {
            var body = doDecompress(new compress.GunzipStream(), chunk);
            body = body.replace(body_regexp, "<head><base href=\"http://"+ iframe_url +"/\">");
            body = body.replace(href_regexp, '<a href="#" onclick="javascript:return false;"');

            response.write(body, 'binary');
        });

        remoteResponse.on('end', function() {
            response.end();
        });
    });
    }
};

At the var body part I want to read the body and, for example in this case, neutralize all hrefs by replacing them with a #. The problem, of course, is that when a site is gzip-encoded/compressed the body is all gibberish and we can't apply the regexps.
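To make the symptom concrete, here is a minimal sketch of why the regexps stop matching. It assumes a Node.js version that ships the built-in zlib module (not the node-compress module used in this question), and the sample HTML is made up for illustration:

```javascript
const zlib = require('zlib');

// Hypothetical sample page standing in for a real upstream response body.
const html = '<html><head><title>t</title></head><body>' +
    '<a href="/x">x</a>'.repeat(20) + '</body></html>';
const headRegexp = /<head>/;

// Roughly what the proxy receives when the upstream answers with
// Content-Encoding: gzip.
const compressed = zlib.gzipSync(Buffer.from(html, 'utf8'));

// Gzip data always starts with the magic bytes 0x1f 0x8b.
const looksGzipped = compressed[0] === 0x1f && compressed[1] === 0x8b;

// The compressed bytes are not readable markup, so the regexp is not
// expected to find the tag here.
const matchesCompressed = headRegexp.test(compressed.toString('latin1'));

// After decompressing the complete body the markup is back and matches.
const restored = zlib.gunzipSync(compressed).toString('utf8');
const matchesRestored = headRegexp.test(restored);

console.log({ looksGzipped, matchesCompressed, matchesRestored });
```

Note that this decompresses the whole body in one go; a single 'data' chunk is generally only a fragment of the gzip stream.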

I've already tried messing around with the node-compress module:

 doDecompress(new compress.GunzipStream(), chunk);

which refers to:

function doDecompress(decompressor, input) {
  var d1 = input.substr(0, 25);
  var d2 = input.substr(25);

  sys.puts('Making decompression requests...');
  var output = '';
  decompressor.setInputEncoding('binary');
  decompressor.setEncoding('utf8');
  decompressor.addListener('data', function(data) {
    output += data;
  }).addListener('error', function(err) {
    throw err;
  }).addListener('end', function() {
    sys.puts('Decompressed length: ' + output.length);
    sys.puts('Raw data: ' + output);
  });
  decompressor.write(d1);
  decompressor.write(d2);
  decompressor.close();
  sys.puts('Requests done.');
}

But it fails, since the chunk input is an object, so I tried supplying it as chunk.toString(), which also fails with invalid input data.
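One plausible reason for the "invalid input data" error is that calling toString() without an encoding decodes the bytes as UTF-8, which is lossy for binary data such as gzip. The sketch below illustrates that, again assuming the built-in zlib module and a made-up payload rather than the real response:

```javascript
const zlib = require('zlib');

// Hypothetical gzip payload standing in for a chunk from the remote server.
const gzipped = zlib.gzipSync(Buffer.from('<head></head>', 'utf8'));

// Default toString() decodes as UTF-8. Bytes that are not valid UTF-8
// (the gzip magic byte 0x8b is one) are replaced with U+FFFD, so the
// original bytes cannot be recovered from the string.
const roundTripped = Buffer.from(gzipped.toString(), 'utf8');
const survivedUtf8 = roundTripped.equals(gzipped);

// 'latin1' (also known as 'binary') maps every byte to exactly one
// character, so the round trip keeps the data intact.
const viaLatin1 = Buffer.from(gzipped.toString('latin1'), 'latin1');
const survivedLatin1 = viaLatin1.equals(gzipped);

console.log({ survivedUtf8, survivedLatin1 });
```

Keeping the chunks as Buffers, or converting with an explicit one-byte-per-character encoding, avoids this particular corruption; whether that alone fixes the decompression in the proxy above is not something this sketch shows.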

I was wondering if I am heading in the right direction at all?


Recommended answer