Problem description
I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends with a ș (which is a non-ASCII character, ...and a strange name, I know).
Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). Adding the BOM signature makes the problem go away. But that troubles me a bit, since I don't know much about the BOM beyond what I saw on Wikipedia and in some similar questions here on SO.
I know that it adds a few bytes at the beginning of the file, and from what I understood that's not too bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name spelled correctly in the comments.
I'm trying to understand the implications: should I use the BOM without worrying, or are there cases where it might cause damage? If so, when?
Indeed, the BOM is actual data sent to the browser. Because it sits before the opening <?php tag, PHP emits it as output. The browser will happily ignore it, but once any output has been sent you can no longer send headers.
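To make the "actual data" point concrete, here is a minimal sketch of checking a file's contents for a leading UTF-8 BOM (the three bytes EF BB BF) and stripping it. The names UTF8_BOM, startsWithUtf8Bom and stripUtf8Bom are made up for this illustration and are not part of PHP or any library.

```php
<?php
// Hypothetical helper names, used only for this illustration.
const UTF8_BOM = "\xEF\xBB\xBF";

/** Returns true if the string starts with the three-byte UTF-8 BOM. */
function startsWithUtf8Bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

/** Returns the string without a leading UTF-8 BOM, if one is present. */
function stripUtf8Bom(string $contents): string
{
    return startsWithUtf8Bom($contents) ? substr($contents, 3) : $contents;
}

// A file saved "with signature" looks like this to PHP: the BOM comes before
// the opening tag, so it is emitted as output ahead of any header() call.
$withBom = UTF8_BOM . "<?php // some class\n";

var_dump(startsWithUtf8Bom($withBom));                        // bool(true)
var_dump(stripUtf8Bom($withBom) === "<?php // some class\n"); // bool(true)
```

In practice you would feed these helpers the result of file_get_contents() for each file you plan to share.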
I believe the real problem is the editor settings, yours and your friends'. Without a BOM, a friend's editor may not automatically recognize the file as UTF-8. They can try to set up their editor so that it expects files to be in UTF-8 (if you use a real IDE such as NetBeans, this can even be made a project setting that you transfer along with the code).
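For what it's worth, the garbage your friends see is consistent with an editor reading the two UTF-8 bytes of ș as two separate Windows-1252 characters. That the editor falls back to Windows-1252 is my assumption, and the sketch below also assumes the mbstring extension is enabled.

```php
<?php
// Assumes the mbstring extension is available.
$name = "\xC8\x99"; // the two UTF-8 bytes of "ș" (U+0219)

// Simulate an editor that assumes Windows-1252: each byte is decoded as its
// own character (0xC8 is "È", 0x99 is "™"), then shown as UTF-8.
$misread = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');

echo $misread, "\n"; // È™
```

Nothing in the file is damaged here; the bytes are the same, only the interpretation differs.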
An alternative is to try some tricks: some editors try to guess the encoding using heuristics based on the file's contents. You could try to start each file with
<?php //Úτƒ-8 encoded
and maybe the heuristic will get it. There's probably better stuff to put there, and you can either google for what kind of encoding detection heuristics are common, or just try some out :-)
All in all, I recommend just fixing the editor settings.
Oh wait, I misread the last part: for spreading the code anywhere, I guess you're safest either making all files contain only 7-bit characters, i.e. plain ASCII, or just accepting that some people with ancient editors will see your name written funny. There is no fail-safe way. The BOM is definitely bad because of the "headers already sent" problem. On the other hand, as long as you only put non-ASCII characters in comments and the like, the only impact of an editor misunderstanding the encoding is weird characters. I'd go for spelling your name correctly and adding a comment targeted at the heuristics so that most editors will get it, but there will always be people who see bogus characters instead.
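If you do go the plain-ASCII route, a check along these lines could tell you whether a file still contains any byte outside the 7-bit range. This is only a sketch, and isPlainAscii is a hypothetical name I made up for it.

```php
<?php
/** Returns true if the string contains only 7-bit ASCII bytes. */
function isPlainAscii(string $contents): bool
{
    // Without the /u modifier the pattern works on raw bytes, so any byte
    // in the range 0x80-0xFF counts as a match.
    return preg_match('/[^\x00-\x7F]/', $contents) === 0;
}

var_dump(isPlainAscii("@author Example Name")); // bool(true)
var_dump(isPlainAscii("@author \xC8\x99"));      // bool(false), contains "ș"
```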