Problem description
I'm just trying to understand character encoding a bit better, so I'm doing a few tests.
I have a PHP file that is saved as UTF-8 and looks like this:
<?php
declare(encoding='UTF-8');
header( 'Content-type: text/html; charset=utf-8' );
?><!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8" />
<title>Test</title>
</head>
<body>
<?php echo "\xBD"; # Does not work ?>
<?php echo htmlentities( "\xBD" ) ; # Works ?>
</body>
</html>
The page itself shows this:
The gist of the problem is that my web application has a bunch of character encoding problems: people copy and paste from Outlook or Word, and the characters get transformed into the diamond question marks. (Do those have a real name?)
I'm trying to learn how to make sure all my input is transformed into UTF-8 when the page loads (basically $_GET, $_POST, and $_REQUEST), and all output is done using proper UTF-8 handling methods.
My question is: Why is my page showing the question mark for the first echo, and does anyone have any other information about making a UTF-8 safe web app in PHP?
Recommended answer
A lone 0xBD byte is not valid UTF-8, which is why the browser shows the replacement character. If you want to encode "½" in UTF-8 then you need to use the two bytes 0xC2 0xBD instead.
>>> print(b'\xc2\xbd'.decode('utf-8'))
½
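The same check can be done from PHP itself. This is only a minimal sketch, assuming the mbstring extension is loaded; the variable names are made up for illustration:

```php
<?php
// A lone 0xBD byte: a UTF-8 continuation byte with no lead byte before it.
$lone = "\xBD";
// The two-byte sequence C2 BD, which is how UTF-8 encodes U+00BD (½).
$half = "\xC2\xBD";

// mb_check_encoding() reports whether the bytes are valid in the given encoding.
var_dump(mb_check_encoding($lone, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($half, 'UTF-8')); // bool(true)
```

Running a check like this on incoming strings is one way to spot input that is not UTF-8 before it is echoed into a UTF-8 page.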
If you want to use text from another charset (Latin-1 in this case) then you need to transcode it to UTF-8 first using the various iconv or mb functions.
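As a rough sketch of that transcoding step, something like the following could work. It assumes PHP 7+ (for the type declarations) and the mbstring extension. The helper name to_utf8 is hypothetical, and so is the choice of ISO-8859-1 as the fallback source charset. Text pasted from Word or Outlook may really be Windows-1252, in which case you would pass that instead.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is in $fallback.
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    // mb_convert_encoding(string, to_encoding, from_encoding)
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// The Latin-1 byte BD becomes the UTF-8 sequence C2 BD.
echo bin2hex(to_utf8("\xBD")), "\n";     // c2bd
// Input that is already UTF-8 passes through unchanged.
echo bin2hex(to_utf8("\xC2\xBD")), "\n"; // c2bd
```

iconv('ISO-8859-1', 'UTF-8', $text) should do the same conversion if you prefer the iconv extension. Note that guessing the source charset is a heuristic: some Latin-1 byte sequences also happen to be valid UTF-8, so it is safer to know what encoding the client actually sent.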
Also:
$ charinfo �
U+FFFD REPLACEMENT CHARACTER