Question
I have a string $data, encoded in utf-8. I assume that I don't know whether this string is utf-8 or iso-8859-1. I want to use the Perl Encode::Guess module to see if it's one or the other. I'm having trouble figuring out how this module works.
I have tried the following four methods (from http://perldoc.perl.org/Encode/Guess.html):
use Encode::Guess qw/utf8 latin1/;
my $decoder = guess_encoding($data);
print "$decoder\n";
Result: iso-8859-1 or utf8
use Encode::Guess qw/utf8 latin1/;
my $enc = guess_encoding($data, qw/utf8 latin1/);
ref($enc) or die "Can't guess: $enc";
my $utf8 = $enc->decode($data);
print "$utf8\n";
Result: Can't guess: iso-8859-1 or utf8 at encodage-windows.pl line 25, line 18110.
use Encode::Guess qw/utf8 latin1/;
my $decoder = Encode::Guess->guess($data);
die $decoder unless ref($decoder);
my $utf8 = $decoder->decode($data);
print "$utf8\n";
Result: iso-8859-1 or utf8 at encodage-windows.pl line 30, line 18110.
use Encode::Guess qw/utf8 latin1/;
my $utf8 = Encode::decode("Guess", $data);
print "$utf8\n";
Result: iso-8859-1 or utf8 at /usr/local/lib/perl5/Encode.pm line 175.
My first question is: which of these methods am I supposed to use (if any)? And my second question: what changes should I make to get it working?
Recommended answer
I normally check the possible encodings one at a time, like this:
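use Encode::Guess;  # no import list, so latin1 is not added to the default suspects

# Try UTF-8 first: guess_encoding returns an object on success, an error string otherwise.
my $decoder = guess_encoding($data, 'utf8');
# Fall back to ISO-8859-1 only when UTF-8 did not match.
$decoder = guess_encoding($data, 'iso-8859-1') unless ref $decoder;
die $decoder unless ref $decoder;
printf "Decoding as %s\n\n", $decoder->name;
$data = $decoder->decode($data);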
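This picks UTF-8 if the data decodes cleanly as UTF-8; otherwise it tries ISO-8859-1 and either accepts that or dies. Each step becomes a simple yes/no test for one encoding, so it can never come back with two candidates, which is the "iso-8859-1 or utf8" error you're getting. That error arises because every byte sequence is valid ISO-8859-1, so data that is valid UTF-8 matches both suspects when they are tested together.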
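To make the fallback concrete, here is a minimal self-contained sketch of the same idea. The helper name decode_utf8_or_latin1 and the two sample byte strings are hypothetical, added only for illustration. It also assumes Encode::Guess is loaded without an import list, so that latin1 is not among the default suspects during the first check.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;    # assumption: no import list, so latin1 is not a default suspect

# Hypothetical helper: try UTF-8 first, then fall back to ISO-8859-1.
# Returns the name of the chosen encoding and the decoded text.
sub decode_utf8_or_latin1 {
    my ($octets) = @_;

    # guess_encoding returns an encoding object on success
    # and a plain error string on failure, hence the ref() checks.
    my $decoder = guess_encoding($octets, 'utf8');
    $decoder = guess_encoding($octets, 'iso-8859-1') unless ref $decoder;
    die "Can't guess: $decoder" unless ref $decoder;

    return ($decoder->name, $decoder->decode($octets));
}

# Illustrative sample data: the same word in two encodings.
my $utf8_bytes   = "caf\xC3\xA9";    # "café" as UTF-8 octets
my $latin1_bytes = "caf\xE9";        # "café" as ISO-8859-1 octets

for my $octets ($utf8_bytes, $latin1_bytes) {
    my ($name, $text) = decode_utf8_or_latin1($octets);
    printf "%s, %d characters\n", $name, length $text;
}
```

Both inputs should decode to the same four-character string, the first via utf8 and the second via iso-8859-1. Because the ISO-8859-1 step accepts any bytes, the final die is effectively a safety net rather than a check that is expected to fire.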