Problem description
I'm working on an older classic ASP site, and there's a form that allows the user to enter some text (into a multiline textbox). If they add an HTML character like ® (registered trademark), it is inserted correctly. But when they go to edit the data, using the same form, the update adds a stray 'Â' (an A with a circumflex accent) in front of the registered trademark. The content type is utf-8.
Any ideas?
Thanks for any time you give this. It's been driving me nuts.-m
Recommended answer
The fundamental problem is the impact of Response.Codepage on form posts.
When you send a form to a client specifying that the content is encoded as UTF-8, the browser will assume that the content of form posts should be sent encoded as UTF-8.
Now the action page that receives the post will (somewhat counter-intuitively) use the value of Response.Codepage to tell it how the characters in the post are encoded. This isn't obvious, because we tend to think it's the job of the sender to define the encoding of what it's sending. It also isn't a natural leap to think that a property concerning the encoding of what we want to send in our response would have anything to do with how the initial request is read. In this case it does.
What's happening is that your form is posting a UTF-8 encoded version of the character, but the page that receives it does not have its Response.Codepage set to 65001 (the UTF-8 codepage). It's probably set to the system's default codepage, such as 1252. Hence the UTF-8 encoding for the character gets interpreted as two individual characters: ® is the two bytes C2 AE in UTF-8, and codepage 1252 reads those as 'Â' followed by '®'.
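The misreading is easy to reproduce outside ASP. The following is a minimal Python sketch, not ASP code, and it assumes the receiving page falls back to codepage 1252 (the answer's guess; the question doesn't confirm it). The variable names are only for illustration.

```python
# The registered trademark sign (U+00AE) as a browser would post it
# from a page served as UTF-8.
posted_bytes = "®".encode("utf-8")
print(posted_bytes.hex(" "))  # c2 ae

# Assumption: the receiver decodes the posted bytes as Windows-1252.
# Each byte becomes its own character, so an extra 'Â' appears.
misread = posted_bytes.decode("cp1252")
print(misread)  # Â®

# Decoding with the codepage that matches the sender restores the original.
correct = posted_bytes.decode("utf-8")
print(correct)  # ®
```

This mirrors what setting the codepage to 65001 before reading the form values is meant to achieve: the bytes are the same in both cases, only the codepage used to interpret them differs.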
My recommendations for good character handling in ASP are:-
- Save all pages as UTF-8
- Include <%@ codepage=65001 %> at the top of all pages
- Include <% Response.CharSet = "UTF-8" %> at the top of all pages
- Store posted data in a Unicode field type, such as SQL Server's NVARCHAR type.
The important thing here is that before you read form values in an ASP page, you need to make sure that Response.Codepage is set to a codepage that matches the sender's encoding, and this doesn't happen automatically.