Problem description
I am trying to write a function that translates a string containing Unicode characters into some default ASCII transcription. Ideally I'd like e.g. Ångström to become Angstroem or, if that is not possible, Angstrom. Likewise, α=χ should become a=x (c?) or similar.
Does Emacs have such built-in capabilities? I know I can get the names and similar properties of characters (get-char-code-property), but I know of no built-in transcription table.
The purpose is to translate titles of entries into meaningful, readable filenames, avoiding problems with software that doesn't understand Unicode.
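To make the limits of the built-in character data concrete, here is a minimal sketch that strips diacritics by decomposing the string (NFD) and dropping the combining marks, using ucs-normalize and get-char-code-property. The function name my/strip-diacritics is hypothetical and not part of Emacs. This is only a partial approach: it cannot produce Angstroem, and it leaves characters without a decomposition, such as α or χ, untouched.

```elisp
;; Sketch only: `my/strip-diacritics' is a hypothetical helper, not an Emacs built-in.
(require 'ucs-normalize)

(defun my/strip-diacritics (string)
  "Return STRING with combining marks removed after NFD decomposition."
  (let (chars)
    ;; NFD splits e.g. Å into A followed by U+030A (combining ring above).
    (dolist (ch (string-to-list (ucs-normalize-NFD-string string)))
      ;; Keep everything except nonspacing marks (general category Mn).
      (unless (eq (get-char-code-property ch 'general-category) 'Mn)
        (push ch chars)))
    (apply #'string (nreverse chars))))

;; (my/strip-diacritics "Ångström") returns "Angstrom"
;; (my/strip-diacritics "α=χ") returns the input unchanged
```

Because Greek letters and characters like ß have no canonical decomposition into ASCII, anything beyond accent stripping still needs a transcription table.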
My current strategy
Building a translation table is one way to do this, but that approach is rather limited and requires a lot of maintenance.
Recommended answer
There is no built-in capability that I know of. I wrote a package, unidecode, specifically for this task. It uses the same approach as Python's library of the same name. To install it, add the MELPA repository to your package archive list:
;; `package-archives' is defined by package.el, so load it first.
(require 'package)
(add-to-list 'package-archives
             '("melpa" . "http://melpa.milkbox.net/packages/") t)
Then install the package. unidecode has two functions: unidecode-unidecode, which turns Unicode into ASCII, and unidecode-sanitize, which discards non-alphanumeric characters and transforms spaces into hyphens.
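For the file-name use case from the question, a small wrapper might look like the sketch below. The name my/title-to-filename and the optional EXTENSION argument are hypothetical. The fallback branch is an assumption added only so the sketch runs when unidecode is not installed: it performs no transliteration and simply collapses every run of characters outside a-z and 0-9 into a hyphen.

```elisp
;; Sketch only: `my/title-to-filename' is a hypothetical helper.
;; Load unidecode if it is installed; do not signal an error otherwise.
(require 'unidecode nil t)

(defun my/title-to-filename (title &optional extension)
  "Return an ASCII-only file name derived from TITLE.
If EXTENSION is non-nil, append it after a dot."
  (let* ((case-fold-search nil)
         (slug (if (fboundp 'unidecode-sanitize)
                   ;; Preferred path: transliterate with unidecode.
                   (unidecode-sanitize title)
                 ;; Assumed fallback: no transliteration, only ASCII filtering.
                 (replace-regexp-in-string
                  "\\`-+\\|-+\\'" ""
                  (replace-regexp-in-string "[^a-z0-9]+" "-"
                                            (downcase title))))))
    (if extension (concat slug "." extension) slug)))

;; Fallback branch: (my/title-to-filename "Hello World" "txt") returns "hello-world.txt"
```

With unidecode installed, non-ASCII titles are transliterated rather than dropped, which is the reason to prefer that branch.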
ELISP> (unidecode-unidecode "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"!Hola!, Gruss Gott, Hyvaa paivaa, Tere ohtust, Bongu Czesc!, Dobry den, Zdravstvuite!, Geia sas, lmsllmlllmckhmslmgll"
ELISP> (unidecode-sanitize "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"hola-gruss-gott-hyvaa-paivaa-tere-ohtust-bongu-czesc-dobry-den-zdravstvuite-geia-sas-lmsllmlllmckhmslmgll"