Problem description
I parse my request with Cheerio like this:
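var request = require("request");
var cheerio = require("cheerio");

var url = "http://shop.nag.ru/catalog/16939.IP-videonablyudenie-OMNY/16944.IP-kamery-OMNY-c-vario-obektivom/16704.OMNY-1000-PRO";
request.get(url, function (err, response, body) {
  // Raw response body, as received
  console.log(body);
  var $ = cheerio.load(body);
  // Inner HTML of the .description element, as serialized by Cheerio
  console.log($(".description").html());
});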
In the output I see the content, but in a strange, unreadable encoding:
//Plain body console.log(body) (p.s. russian chars):
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1><p style
// cheerio's console.log $(".description").html()
<h1><span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY
The target page is encoded in UTF-8. So why does Cheerio break my encoding?
I tried using iconv to decode the response body:
var body1 = iconv.decode(body, "utf-8");
But console.log($(".description").html()); still returns the strange text.
Recommended answer
Cheerio hasn't broken anything. It's outputting HTML entities, which will be rendered by any browser exactly the same as the HTML input. Run this snippet to see what I mean:
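<h1><span style="font-size: 16px;">&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F; IP HD &#x43A;&#x430;&#x43C;&#x435;&#x440;&#x430; OMNY - &#x43F;&#x43E;&#x43F;&#x440;&#x43E;&#x431;&#x443;&#x439;&#x442;&#x435; &#x43D;&#x430;&#x439;&#x442;&#x438; &#x43B;&#x443;&#x447;&#x448;&#x435;</span></h1>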
<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>
&#x423;, for example, is the character У encoded as an HTML entity, in the same way that the entity &gt; represents >.
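To make the mapping concrete, here is a minimal sketch of how a character becomes a hexadecimal entity. The helper name encodeNonAscii is hypothetical and is not part of Cheerio; it only approximates what an entity-encoding serializer does for non-ASCII characters.

```javascript
// Hypothetical helper (not part of Cheerio): encode every non-ASCII
// character as a hexadecimal numeric character reference.
function encodeNonAscii(text) {
  // The `u` flag makes the regex match whole code points, so characters
  // outside the Basic Multilingual Plane are not split into surrogates.
  return text.replace(/[^\x00-\x7F]/gu, (ch) =>
    "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";"
  );
}

// "У" is U+0423, so it is written as &#x423;
console.log(encodeNonAscii("Уличная 3Мп"));
// => &#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F;
```

A browser turns each of these references back into the original character, which is why both forms render identically.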
However, if you want to get the unencoded text, you can set the decodeEntities option to false:
const $ = cheerio.load(
`<h1><span style="font-size: 16px;">Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше</span></h1>`,
{ decodeEntities: false }
);
console.log($('span').html())
// => Уличная 3Мп IP HD камера OMNY - попробуйте найти лучше
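If changing the load options is not convenient, the numeric entities can also be decoded after the fact. The following is only a sketch under that assumption: decodeNumericEntities is a hypothetical helper, not a Cheerio API, and it deliberately leaves named entities such as &gt; untouched so that the markup keeps its meaning.

```javascript
// Hypothetical helper (not part of Cheerio): turn numeric character
// references such as &#x423; (hex) or &#1059; (decimal) back into characters.
function decodeNumericEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, ref) => {
    const isHex = ref[0] === "x" || ref[0] === "X";
    const codePoint = parseInt(isHex ? ref.slice(1) : ref, isHex ? 16 : 10);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities("&#x423;&#x43B;&#x438;&#x447;&#x43D;&#x430;&#x44F; 3&#x41C;&#x43F;"));
// => Уличная 3Мп
```

It would be applied to the string returned by $(".description").html() rather than to the response body, since the body already contains the plain characters.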