Question
I have a form served in a non-UTF-8 encoding (it's actually Windows-1251). People, of course, post any characters they like there. The browser helpfully converts characters that cannot be represented in Windows-1251 into HTML entities, so I can still recognise them. For example, if a user types →, I receive &#8594;. That's partially great: if I just echo it back, the browser will correctly display the → no matter what.
The problem is that I actually run htmlspecialchars() on the text before displaying it (it's a PHP function that converts special characters to HTML entities, e.g. & becomes &amp;). My users sometimes type things like &mdash; or &copy;, and I want to display them as the literal text &mdash; or &copy;, not as — and ©.
There's no way for me to distinguish a → from a typed &#8594;, because I receive both as &#8594;. And since I run htmlspecialchars() on the text, and the browser also sends me &#8594; for a →, I echo back &amp;#8594;, which the browser displays as &#8594;. So the user's input gets corrupted.
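To make the ambiguity concrete, here is a minimal sketch. It assumes the browser submits the arrow as the numeric reference &#8594; and that output is escaped with the 'cp1251' charset; the variable names are only illustrative.

```php
<?php
// Assumption: this is what PHP receives from a Windows-1251 form when the
// user types the arrow character; the browser sent a numeric reference.
$fromArrowCharacter = '&#8594;';

// This is what PHP receives when the user literally types those seven characters.
$fromTypedReference = '&#8594;';

// The two submissions are byte-for-byte identical on the server.
$indistinguishable = ($fromArrowCharacter === $fromTypedReference);

// Escaping before output also escapes the ampersand of the reference,
// so the browser shows the reference as text instead of an arrow.
$escaped = htmlspecialchars($fromArrowCharacter, ENT_QUOTES, 'cp1251');

// Turning off double encoding keeps the arrow, but then a literally typed
// reference is rendered as an arrow too.
$notDoubleEncoded = htmlspecialchars($fromArrowCharacter, ENT_QUOTES, 'cp1251', false);

echo $escaped, "\n";
echo $notDoubleEncoded, "\n";
```

Either setting corrupts one of the two kinds of input, because the information about what the user actually typed is already gone by the time the request arrives.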
Is there a way to say: "Okay, I serve this form in Windows-1251, but will you please just send me the input in UTF-8 and let me deal with it myself"?
Oh, I know that the good idea is to switch the whole software to UTF-8, but that is just too much work, and I would be happy to get a quick fix for this. In case it matters, the form's enctype is "multipart/form-data" (it includes a file upload, so I cannot use any other enctype). I use Apache and PHP.
Thanks!
Recommended answer
On the browser "helpfully" converting unrepresentable characters to HTML entities: well, nearly, except that it's not at all helpful. Now you can't tell the difference between a real "&#411;" that someone typed, expecting it to come out as a string of text with a '&' in it, and a 'ƛ' character.
On running htmlspecialchars() on the text before displaying it: yes. You must do that, or else you've got a security problem.
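As a small illustration of why the escaping step has to stay, the sketch below passes a hypothetical hostile submission through htmlspecialchars(). The 'cp1251' charset argument is an assumption based on the page encoding described in the question.

```php
<?php
// Hypothetical hostile submission, used only for illustration.
$submitted = '<script>alert("x")</script>';

// Escape at output time; 'cp1251' matches the page encoding from the question.
$safe = htmlspecialchars($submitted, ENT_QUOTES, 'cp1251');

// The markup is now inert text rather than an executable script element.
echo $safe, "\n";
```

Without this step, the submitted markup would be sent to other visitors as live HTML.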
On asking the browser to send the input in UTF-8: yeah, supposedly you put accept-charset="UTF-8" in the form tag. But the reality is that this doesn't work in IE. To get a form submission in UTF-8, you must send the form (page) in UTF-8.
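A rough sketch of that approach follows, assuming only the page containing the form is switched to UTF-8. The helper names renderUploadForm and escapeSubmittedText, the field names, and the /post.php URL are hypothetical, not part of the original setup.

```php
<?php
// Hypothetical helper: returns the form page as a UTF-8 HTML document.
function renderUploadForm(string $actionUrl): string
{
    $action = htmlspecialchars($actionUrl, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n"
        . "<html><head><meta charset=\"UTF-8\"><title>Post</title></head><body>\n"
        . "<form method=\"post\" action=\"{$action}\" enctype=\"multipart/form-data\" accept-charset=\"UTF-8\">\n"
        . "<textarea name=\"text\"></textarea>\n"
        . "<input type=\"file\" name=\"attachment\">\n"
        . "<input type=\"submit\">\n"
        . "</form></body></html>\n";
}

// Hypothetical helper: escapes text submitted from that page for UTF-8 output.
function escapeSubmittedText(string $text): string
{
    // preg_match with the /u modifier fails on byte sequences that are not valid UTF-8.
    if (preg_match('//u', $text) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// In a web request, the header must be sent before any output:
// header('Content-Type: text/html; charset=UTF-8');
// echo renderUploadForm('/post.php');
```

With the page in UTF-8, the arrow arrives as the real character and a typed &mdash; arrives as plain text, so escaping no longer mixes the two up. Anything downstream that still expects Windows-1251 would need a separate conversion step, which this sketch does not cover.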
On switching the whole software to UTF-8: yup. Well, at least the encoding of the page containing the form should be UTF-8.