Proper way of detecting encoding in Perl

Question

I have the following two strings:

%EC%E0%EC%E0+%EC%FB%EB%E0+%F0%E0%EC%F3
%D0%BC%D0%B0%D0%BC%D0%B0%20%D0%BC%D1%8B%D0%BB%D0%B0%20%D1%80%D0%B0%D0%BC%D1%83

These are the same Russian phrase, URL-encoded from cp1251 and UTF-8 respectively. I want to display them in Russian in my UTF-8 terminal using Perl. Unfortunately, the Perl module Encode::Detect fails to detect cp1251 for the first example (after URL-decoding) and proposes "x-euc-tw" instead.

The question is: what is the proper way to detect the correct encoding in this case (specifying locale parameters, using other modules, ...)?

Recommended answer

Are UTF-8 and cp1251 the only two options? The odds of cp1251 text also being valid UTF-8 are extremely small. (Such text would be gibberish.) So you can try a strict UTF-8 decode first and fall back to cp1251 if it fails:

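use Encode qw( decode );

# Try strict UTF-8 first; if the bytes are not valid UTF-8, fall back to cp1251.
# LEAVE_SRC keeps decode() from modifying $encoded in place.
my $decoded = eval { decode('UTF-8', $encoded, Encode::FB_CROAK | Encode::LEAVE_SRC) }
    // decode('cp1251', $encoded);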

This will be far more accurate than an encoding guesser.
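
For completeness, here is a minimal end-to-end sketch that applies this approach to the two strings from the question. The helper names url_decode and decode_utf8_or_cp1251 are hypothetical and were chosen for this illustration only; the sketch also assumes that UTF-8 and cp1251 really are the only encodings the input can be in.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: undo URL encoding ('+' is a space, %XX is one byte).
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical helper: strict UTF-8 first, cp1251 as the fallback.
# Assumes these are the only two candidate encodings.
sub decode_utf8_or_cp1251 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC) }
        // decode('cp1251', $bytes);
}

my @inputs = (
    '%EC%E0%EC%E0+%EC%FB%EB%E0+%F0%E0%EC%F3',
    '%D0%BC%D0%B0%D0%BC%D0%B0%20%D0%BC%D1%8B%D0%BB%D0%B0%20%D1%80%D0%B0%D0%BC%D1%83',
);

for my $input (@inputs) {
    my $text = decode_utf8_or_cp1251(url_decode($input));
    # Encode back to UTF-8 bytes so a UTF-8 terminal shows Cyrillic.
    print encode('UTF-8', $text), "\n";
}
```

Both inputs should print the same phrase, "мама мыла раму". The first string fails the strict UTF-8 check because the byte 0xEC starts a multi-byte sequence but is followed by 0xE0, which is not a valid continuation byte, so the cp1251 fallback is used.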

