How can I get all the values that can be used in \N{} to produce a specific code point?

Question

While performing debugging for this question, I asked myself: How do I find all values that I can use in \N{} for a given Unicode code point?

For example, I want to know all aliases for U+03B1 (GREEK SMALL LETTER ALPHA). How would I find out that \N{greek:alpha} could be used for this?

Recommended answer

There's no single list against which these are checked.

Based on the documentation of \N{}, the following will do the trick:

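#!/usr/bin/perl
use v5.14;      # needed for s///r; also enables strict
use warnings;

use List::Util   qw( max );
use Unicode::UCD qw( charscripts charinfo charprop );

# Alternation of every known script name, upper-cased, with underscores turned into spaces.
my $re_scripts = join '|', map { quotemeta uc s/_/ /gr } keys %{ charscripts() };

# Matches character names of the form "<SCRIPT> [CAPITAL|SMALL] LETTER <NAME>".
my $re_letter = qr/^($re_scripts) (?:(CAPITAL|SMALL) )?LETTER (\S.*)/;

{
   @ARGV == 1
      or die("usage: $0 code_point\n");

   # Accept "U+03B1", "0x3B1" or plain hex digits.
   my $ucp = hex( $ARGV[0] =~ s/^(?:U\+|0x)//ir );

   # Each entry is [ charnames import argument, name to put inside \N{} ].
   my @names;
   push @names, [ "", sprintf('U+%X', $ucp) ];

   if ( my $charinfo = charinfo($ucp) ) {
      # Official name, usable under :full.
      my $name = $charinfo->{name};
      push @names, [ ":full", $name ] if length($name) && $name ne '<control>';

      # Name_Alias is a comma-separated list of "ALIAS: type" entries; keep only the alias.
      for my $alias (map s/:.*//sr, split /,/, charprop($ucp, 'Name_Alias')) {
         push @names, [ ":full", $alias ];
      }

      # Letters also have :short ("SCRIPT:name") and script-relative ("name") forms.
      if ( my ($script_name, $type, $short_char_name) = $name =~ $re_letter ) {
         # A letter that is neither CAPITAL nor SMALL is reachable with either casing.
         my $uc = ( $type // 'CAPITAL' ) eq 'CAPITAL';
         my $lc = ( $type // 'SMALL'   ) eq 'SMALL';
         push @names, [ ":short", join(":", $script_name, uc($short_char_name)) ] if $uc;
         push @names, [ ":short", join(":", $script_name, lc($short_char_name)) ] if $lc;
         push @names, [ $script_name, uc($short_char_name) ] if $uc;
         push @names, [ $script_name, lc($short_char_name) ] if $lc;
      }
   }

   # Pad the import argument so the output lines up.
   my $longest = max map length($_->[0]), @names;
   printf("use charnames qw( %-*s ); \"\\N{%s}\"\n", $longest, @$_) for @names;
}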

For example:

$ ./script.pl U+03B1
use charnames qw(        ); "\N{U+3B1}"
use charnames qw( :full  ); "\N{GREEK SMALL LETTER ALPHA}"
use charnames qw( :short ); "\N{GREEK:alpha}"
use charnames qw( GREEK  ); "\N{alpha}"

$ ./script.pl U+0391
use charnames qw(        ); "\N{U+391}"
use charnames qw( :full  ); "\N{GREEK CAPITAL LETTER ALPHA}"
use charnames qw( :short ); "\N{GREEK:ALPHA}"
use charnames qw( GREEK  ); "\N{ALPHA}"

$ ./script.pl 1C00
use charnames qw(        ); "\N{U+1C00}"
use charnames qw( :full  ); "\N{LEPCHA LETTER KA}"
use charnames qw( :short ); "\N{LEPCHA:KA}"
use charnames qw( :short ); "\N{LEPCHA:ka}"
use charnames qw( LEPCHA ); "\N{KA}"
use charnames qw( LEPCHA ); "\N{ka}"

$ ./script.pl 20
use charnames qw(       ); "\N{U+20}"
use charnames qw( :full ); "\N{SPACE}"
use charnames qw( :full ); "\N{SP}"
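
The official name printed under :full can be cross-checked at run time. The following is a minimal sketch, not part of the original answer, that uses charnames::viacode to map a code point to its official name and charnames::vianame to map the name back to a code point. It covers only the official name, not the :short or script-relative forms.

```perl
use strict;
use warnings;
use charnames qw( :full );

# Code point -> official Unicode name.
my $name = charnames::viacode(0x03B1);

# Official name -> code point (an integer, not a character).
my $code = charnames::vianame($name);

printf "%s <-> U+%04X\n", $name, $code;
```

For U+03B1 this prints the name GREEK SMALL LETTER ALPHA next to the code point it round-trips to.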

Notes:

- Script names in charnames.pm import parameters are case-insensitive.
- Instances of use charnames qw( ); in the output (i.e. directives loading charnames.pm with no parameters) are not actually necessary.
- Since Perl 5.16, charnames.pm is implicitly loaded as if by use charnames qw( :full :short ); if it hasn't otherwise been loaded before a \N{} is encountered.
- Custom aliases in effect are not listed. (Technically, there aren't any unless you modify the script.)
- The names must be provided exactly as output, with the following exceptions:
  - The number that follows U+ is case-insensitive.
  - The number that follows U+ may have leading zeroes.
  - The script name in :short names is case-insensitive.
  - Upper-case character names in :short names and in script-relative names are case-insensitive, but must contain at least one upper-case character (an all-lower-case name selects the SMALL letter instead).
  - Using use charnames qw( :loose ); allows further variations of the displayed strings.
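
The exceptions listed above can be checked with a short sketch like the one below. It is an illustration rather than part of the original answer: the @checks array and its labels are made up for this example, and the :loose case assumes Perl 5.16 or later. Each use charnames is placed in its own block because the pragma is lexically scoped.

```perl
use strict;
use warnings;

my $alpha = chr(0x3B1);    # GREEK SMALL LETTER ALPHA
my @checks;                # [ label, character produced by \N{} ]

# Numeric form: hex digits are case-insensitive and may have leading zeroes.
push @checks, [ 'U+3B1',   "\N{U+3B1}"   ];
push @checks, [ 'U+03b1',  "\N{U+03b1}"  ];
push @checks, [ 'U+003B1', "\N{U+003B1}" ];

{
   use charnames qw( :full );
   push @checks, [ ':full', "\N{GREEK SMALL LETTER ALPHA}" ];
}
{
   use charnames qw( :short );
   # The script part of a :short name is case-insensitive.
   push @checks, [ ':short', "\N{GREEK:alpha}" ];
   push @checks, [ ':short', "\N{greek:alpha}" ];
}
{
   # Script names in the import list are case-insensitive too.
   use charnames qw( greek );
   push @checks, [ 'greek', "\N{alpha}" ];
}
{
   # :loose relaxes case and spacing of the full name.
   use charnames qw( :loose );
   push @checks, [ ':loose', "\N{greek small letter alpha}" ];
}

for my $check (@checks) {
   my ($label, $char) = @$check;
   printf "%-8s %s\n", $label, $char eq $alpha ? 'ok' : 'MISMATCH';
}
```

Every line should report ok, since all eight spellings name the same code point.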