HttpResponseMessage.Content.Headers ignores the charset setting in the meta tag of the HTML source


Question

I have just posted another question, which was answered right away. That answer, in turn, raises the following new question:

If my understanding is correct, the StreamContent object in an HttpResponseMessage is created when an HTTP request is made via HttpClient.GetAsync. Its Headers property, or part of it, is set according to meta tags included in the HTML source file.

For instance, a meta tag can tell the response object which charset the file's contents are encoded in.

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

Running a request against a resource that contains such a line will generate an HttpResponseMessage.Content.Headers with this setting.

In the other question referenced at the top of this one, I mentioned a response object being created without the correct encoding, even though the HTML source that produces this incompatible response does contain the setting responsible for properly encoded responses:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">

What is the reason that responses from that site are not being passed the charset setting included in the meta tag, and are thus rendered in an incorrect charset?

Here's a pictorial description of the question: both sites contain the meta tag with a charset setting, but one, for some reason, misses it...

Fiddler's header details for both requests:

Working one (Cookie header removed):

Request:

GET http://www.ynet.co.il/home/0,7340,L-8,00.html HTTP/1.1
Host: www.ynet.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If-Modified-Since: Thu, 31 Mar 2016 10:04:39 GMT

Response:

HTTP/1.1 200 OK
vg_id: 1
X-me: 06
Content-Type: text/html; charset=UTF-8
Last-Modified: Thu, 31 Mar 2016 10:38:57 GMT
Accept-Ranges: bytes
VX-Cache: HIT
WAI: 01
V-TTL: 0
backend-cache-control:
Content-Length: 410685
Vary: Accept-Encoding
Date: Thu, 31 Mar 2016 10:38:48 GMT
Connection: keep-alive

Problematic one:

Request:

GET http://winedepot.co.il/ HTTP/1.1
Host: winedepot.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=201832727.725995063.1458660502.1459413977.1459418530.8; __utmz=201832727.1458660502.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmc=201832727; ASPSESSIONIDCQTRQCAQ=FEOHEBFCBGABBKOBAHOGKBGB
Connection: keep-alive

Response:

HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 118225
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 31 Mar 2016 10:36:21 GMT

Answer

As you can see from your Fiddler screenshots, HttpResponseMessage.Content.Headers.ContentType contains exactly what was specified in the Content-Type header of the response. The working site sends "Content-Type: text/html; charset=UTF-8", while the problematic one sends only "Content-Type: text/html", with no charset.

HttpResponseMessage does not parse the response HTML or search it for <meta /> tags.
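
Since the framework leaves the meta tag alone, one possible workaround is to read the body as raw bytes and pick the encoding yourself. The following is only a minimal sketch of that idea, not an official API: the class name CharsetSketch, its two methods, the 4096-byte sniffing window, and the UTF-8 default are all assumptions made for illustration. It prefers the charset from the HTTP header and falls back to a simple regular-expression search for a meta charset only when the header carries none. It is written for .NET Core 3.0+ / .NET 5+, where code pages such as windows-1255 are assumed to need the CodePagesEncodingProvider registration shown.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper (not part of HttpClient): chooses an encoding for a response body.
public static class CharsetSketch
{
    // Finds charset=... inside a <meta> tag, for example
    // <meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    static CharsetSketch()
    {
        // Makes legacy code pages (e.g. windows-1255) available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding ResolveEncoding(string headerCharset, byte[] body)
    {
        // 1. Prefer the charset from the Content-Type header; some servers quote it.
        string name = headerCharset == null ? null : headerCharset.Trim('"', ' ');

        // 2. No header charset: look for a <meta> charset near the start of the body.
        if (string.IsNullOrEmpty(name))
        {
            // ASCII decoding is enough here because the markup itself is ASCII.
            int length = Math.Min(body.Length, 4096); // assumed sniffing window
            string head = Encoding.ASCII.GetString(body, 0, length);
            Match match = MetaCharset.Match(head);
            if (match.Success)
            {
                name = match.Groups[1].Value;
            }
        }

        // 3. Nothing found: fall back to UTF-8 (an assumption, not a framework rule).
        if (string.IsNullOrEmpty(name))
        {
            return Encoding.UTF8;
        }

        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            return Encoding.UTF8; // unknown or unsupported charset name
        }
    }

    public static async Task<string> GetStringAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            byte[] body = await response.Content.ReadAsByteArrayAsync();
            var contentType = response.Content.Headers.ContentType;
            string headerCharset = contentType == null ? null : contentType.CharSet;
            return ResolveEncoding(headerCharset, body).GetString(body);
        }
    }
}
```

Under these assumptions, the first site above would be decoded as UTF-8 because its header says so, while the second, whose header carries no charset, would be decoded using the windows-1255 value found in its meta tag. A regular expression is a deliberately crude stand-in for real HTML parsing and may miss or misread unusual markup.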


07-22 22:49