Perl: converting (high) decimal NCRs to UTF-8

Question

I have this string (decimal NCRs): &#26085;&#26412;&#12398;&#37756;&#28792;&#12392;&#12399;

It represents the Japanese text 日本の鍼灸とは.

But I need (UTF-8): %E6%97%A5%E6%9C%AC%E3%81%AE%E9%8D%BC%E7%81%B8%E3%81%A8%E3%81%AF

For the first character: &#26085; becomes %E6%97%A5

This site does it, but how do I get this in Perl? (If possible in a single regex like s/\&\#([0-9]+);/uc('%'.unpack("H2", pack("c", $1)))/eg;.)

http://www.endmemo.com/unicode/unicodeconverter.php

I also need to convert it back again, from UTF-8 to decimal NCRs.

I've been breaking my head over this one for half a day now, so any help is greatly appreciated!

Recommended answer

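#!/usr/bin/perl
use strict;
use warnings;

use Test::More tests => 2;
use Encode qw{ encode decode };

# Decimal NCRs for 日本の鍼灸とは
my $in  = '&#26085;&#26412;&#12398;&#37756;&#28792;&#12392;&#12399;';
my $out = '%E6%97%A5%E6%9C%AC%E3%81%AE%E9%8D%BC%E7%81%B8%E3%81%A8%E3%81%AF';

# NCRs -> character string
(my $utf = $in) =~ s/&#([0-9]+);/chr $1/ge;

# Characters -> UTF-8 bytes -> percent-encoded (two zero-padded hex digits per byte)
my $r = join q(), map { sprintf '%%%02X', ord } split //, encode('UTF-8', $utf);
is($r, $out);

# Percent-encoded -> bytes -> characters -> decimal NCRs
(my $s = $r) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
$s = decode('UTF-8', $s);
$s = join q(), map { sprintf '&#%d;', ord } split //, $s;
is($s, $in);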
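
The question also asked whether each direction can be done in a single substitution. The following is a minimal sketch of one way that might look, using the same Encode-based approach as the script above. The helper names `ncr_to_percent` and `percent_to_ncr` are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: decimal NCRs -> percent-encoded UTF-8 in one s///.
sub ncr_to_percent {
    my ($text) = @_;
    # For each NCR: code point -> character -> UTF-8 bytes -> %XX per byte.
    $text =~ s{&#([0-9]+);}
              {join '', map { sprintf '%%%02X', $_ }
                        unpack 'C*', encode('UTF-8', chr $1)}ge;
    return $text;
}

# Hypothetical helper: percent-encoded UTF-8 -> decimal NCRs in one s///.
# A whole run of escapes is matched at once so that the bytes of a
# multi-byte character are decoded together.
sub percent_to_ncr {
    my ($text) = @_;
    $text =~ s{((?:%[0-9A-Fa-f]{2})+)}
              {(my $hex = $1) =~ tr/%//d;            # "E697A5..."
               my $bytes = pack 'H*', $hex;          # hex digits -> raw bytes
               join '', map { sprintf '&#%d;', ord }
                        split //, decode('UTF-8', $bytes)}ge;
    return $text;
}

my $ncr = '&#26085;&#26412;';                        # 日本
print ncr_to_percent($ncr), "\n";                    # %E6%97%A5%E6%9C%AC
print percent_to_ncr('%E6%97%A5%E6%9C%AC'), "\n";    # &#26085;&#26412;
```

Text outside the NCRs or outside the percent escapes is left unchanged by both helpers, so they can be applied to a longer string that only partly consists of encoded characters.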
