Problem description
I am new to encoding, so please be patient. I am working on a system where a user uploads a CSV file; I need to display its content and then save it in the database (UTF-8 encoding).
I have been asked to fix an issue with some French characters that weren't displayed correctly. I have almost solved the problem; I can now display characters such as
ÀàÂâÆÄäÇçÉéÈèÊêËëÎîÏïÔôœÖöÙùÛûÜüÿ
However, the two mentioned in the title, Ÿ and Œ, are still not displayed correctly on the webpage.
Here is my PHP code so far:
// say the CSV contains "ÖüÜߟÀàÂ"
$content = file_get_contents(addslashes($file_name));
var_dump($content); // output: string(54) "���ߟ��� "

if (!mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)) {
    // encoding not recognized: fall back to Mac OS Roman
    $data = iconv('macintosh', 'UTF-8', $content);
}
// deal with known encoding types
else if (mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true) == 'ISO-8859-1') {
    //$data = mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true)); // does not work
    $data = iconv('ISO-8859-1', 'UTF-8', $content); // does not work
} else if (mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true) == 'UTF-8') {
    $data = $content;
}
// if I print $data, "Ÿ Œ" are not printed out; they get lost somewhere
// do more stuff here
The file I am dealing with is detected as ISO-8859-1 (when I print mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true), it displays ISO-8859-1).
Does anyone have an idea how to deal with these special cases?
The characters Ÿ and Œ are not representable in ISO-8859-1. It seems that the incoming data is actually windows-1252 (Windows Latin 1) encoded, since windows-1252 has graphic characters, including Ÿ and Œ, in some code positions that are reserved for control characters in ISO-8859-1.
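To make the difference concrete, here is a small sketch that converts the same two bytes under both encodings. It assumes the file stores Œ as byte 0x8C and Ÿ as byte 0x9F (their windows-1252 positions) and that the local iconv build accepts the name 'windows-1252'; the variable names are only illustrative.

```php
<?php
// Bytes 0x8C and 0x9F: Œ and Ÿ in windows-1252,
// but C1 control characters in ISO-8859-1.
$bytes = "\x8C\x9F";

$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $bytes);
$asCp1252 = iconv('windows-1252', 'UTF-8', $bytes);

// Hex dump of the resulting UTF-8 bytes
echo bin2hex($asLatin1), "\n"; // c28cc29f (U+008C, U+009F: invisible controls)
echo bin2hex($asCp1252), "\n"; // c592c5b8 (U+0152 Œ, U+0178 Ÿ)
```

The ISO-8859-1 conversion succeeds without any error, which is why the characters seem to "get lost": they become valid but invisible control characters.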
So you should probably add windows-1252 to the list of recognized encodings and treat recognized ISO-8859-1 as windows-1252, i.e. use iconv('windows-1252', 'UTF-8', $content) even when ISO-8859-1 has been recognized. Windows-1252 data mislabeled as ISO-8859-1 is very common.
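One possible way to apply this is sketched below. The helper name csv_to_utf8 is hypothetical, and the sketch makes two simplifying assumptions that are not in the question's code: it uses mb_check_encoding() to test for valid UTF-8 instead of mb_detect_encoding(), and it drops the 'macintosh' fallback, treating everything that is not valid UTF-8 as windows-1252.

```php
<?php
// Hypothetical helper (not from the original post):
// returns the CSV content as UTF-8.
function csv_to_utf8(string $content): string
{
    // Content that is already valid UTF-8 is passed through unchanged.
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }

    // Assumption: anything else is windows-1252, even if it would be
    // detected as ISO-8859-1, so that bytes such as 0x8C and 0x9F
    // become Œ and Ÿ instead of control characters.
    $converted = @iconv('windows-1252', 'UTF-8', $content);

    // iconv() returns false on bytes it cannot convert (a few byte
    // values are unassigned in windows-1252).
    if ($converted === false) {
        throw new RuntimeException('Content is not valid windows-1252');
    }

    return $converted;
}
```

In the question's code this would replace the whole if/else chain, for example $data = csv_to_utf8($content);. If Mac-encoded files really do occur, they would need separate handling, since their bytes cannot be told apart from windows-1252 automatically.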