Question
I'm currently working on a proxy server that has to modify the data passing through it (using regular expressions).
In most cases it works fine, except for websites that use gzip as their content encoding (I think). I came across a module called compress and tried pushing the chunks I receive through a decompress/gunzip stream, but it isn't turning out as I expected (see the code below).
I figured I'd post some code to support my problem. This is the proxy, which gets loaded with an MVC framework (Express):
module.exports = {
  index: function(request, response){
    var iframe_url = "www.nu.nl"; // site with gzip encoding
    var http = require('http');
    var httpClient = http.createClient(80, iframe_url);
    var headers = request.headers;
    headers.host = iframe_url;
    var remoteRequest = httpClient.request(request.method, request.url, headers);
    request.on('data', function(chunk) {
      remoteRequest.write(chunk);
    });
    request.on('end', function() {
      remoteRequest.end();
    });
    remoteRequest.on('response', function (remoteResponse){
      var body_regexp = new RegExp("<head>"); // regex to find first head tag
      var href_regexp = new RegExp('\<a href="(.*)"', 'g'); // regex to find hrefs
      response.writeHead(remoteResponse.statusCode, remoteResponse.headers);
      remoteResponse.on('data', function (chunk) {
        var body = doDecompress(new compress.GunzipStream(), chunk);
        body = body.replace(body_regexp, "<head><base href=\"http://"+ iframe_url +"/\">");
        body = body.replace(href_regexp, '<a href="#" onclick="javascript:return false;"');
        response.write(body, 'binary');
      });
      remoteResponse.on('end', function() {
        response.end();
      });
    });
  }
};
At the var body part I want to read the body and, in this case for example, neutralize all hrefs by replacing them with a #. The problem, of course, is that when a site is gzip-encoded (compressed), the body is all gibberish and we can't apply the regexps.
Now, I've already tried to mess around with the node-compress module:
doDecompress(new compress.GunzipStream(), chunk);
which refers to:
function doDecompress(decompressor, input) {
  var d1 = input.substr(0, 25);
  var d2 = input.substr(25);
  sys.puts('Making decompression requests...');
  var output = '';
  decompressor.setInputEncoding('binary');
  decompressor.setEncoding('utf8');
  decompressor.addListener('data', function(data) {
    output += data;
  }).addListener('error', function(err) {
    throw err;
  }).addListener('end', function() {
    sys.puts('Decompressed length: ' + output.length);
    sys.puts('Raw data: ' + output);
  });
  decompressor.write(d1);
  decompressor.write(d2);
  decompressor.close();
  sys.puts('Requests done.');
}
But it fails because the chunk input is an object, so I tried supplying it as chunk.toString(), which also fails, this time with an "invalid input data" error.
I was wondering whether I'm heading in the right direction at all?
Recommended answer
The decompressor expects binary-encoded input. The chunk that your response receives is an instance of Buffer, whose toString() method by default gives you back a UTF-8 encoded string.
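To illustrate why the default encoding breaks things, here is a minimal sketch. It assumes a reasonably recent Node.js and uses the built-in zlib module only to produce some gzip bytes (the sample HTML and variable names are made up for the demonstration). It compares a round trip through a UTF-8 string with a round trip through a 'binary' string:

```javascript
const zlib = require('zlib');

// Gzip some sample HTML so we have realistic compressed bytes to work with.
const original = '<head><title>demo</title></head>';
const gzipped = zlib.gzipSync(original); // a Buffer

// Default toString() decodes as UTF-8. Bytes that are not valid UTF-8
// (the gzip magic byte 0x8b already qualifies) are replaced with U+FFFD,
// so converting back does not reproduce the original bytes.
const viaUtf8 = Buffer.from(gzipped.toString(), 'utf8');

// 'binary' (an alias for 'latin1') maps every byte to exactly one
// character, so the round trip is lossless.
const viaBinary = Buffer.from(gzipped.toString('binary'), 'binary');

const utf8RoundTripOk = gzipped.equals(viaUtf8);
const binaryRoundTripOk = gzipped.equals(viaBinary);
console.log('utf8 round trip intact:', utf8RoundTripOk);
console.log('binary round trip intact:', binaryRoundTripOk);
```

Only the 'binary' string still holds the bytes a gunzip stream can decode, which matches the "invalid input data" error seen with a plain toString().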
So you have to use chunk.toString('binary') to make it work; this can also be seen in the demo.
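As an alternative on current Node.js versions, the built-in zlib module can replace node-compress. The sketch below is not the approach from the question: rewriteBody is a hypothetical helper, the page is assumed to be UTF-8, and the href regex is tightened to stop at the closing quote. It also buffers the whole body before decompressing, since a single gzip stream spans many chunks and a tag can be split across a chunk boundary:

```javascript
const zlib = require('zlib');

// Hypothetical helper: takes the complete upstream body as a Buffer,
// gunzips it when the upstream Content-Encoding says so, and applies
// the two rewrites from the question to the resulting HTML.
function rewriteBody(rawBody, contentEncoding, host) {
  const html = contentEncoding === 'gzip'
    ? zlib.gunzipSync(rawBody).toString('utf8') // assumes a UTF-8 page
    : rawBody.toString('utf8');
  return html
    .replace(/<head>/, '<head><base href="http://' + host + '/">')
    .replace(/<a href="[^"]*"/g, '<a href="#" onclick="return false;"');
}

// Simulate an upstream response that arrives in two chunks.
const page = '<html><head></head><body><a href="/news">News</a></body></html>';
const gz = zlib.gzipSync(page);
const chunks = [gz.subarray(0, 10), gz.subarray(10)];

// Concatenate the chunks first, then decompress and rewrite once.
const rewritten = rewriteBody(Buffer.concat(chunks), 'gzip', 'www.nu.nl');
console.log(rewritten);
```

In the proxy this would mean pushing each chunk onto an array in the 'data' handler and calling the helper in the 'end' handler. Because the rewritten body is no longer compressed and its length has changed, the content-encoding and content-length headers copied from the remote response should be dropped or recomputed before calling response.writeHead.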