Emacs Lisp: translating characters into a standard ASCII transcription

Question

I am trying to write a function that translates a string containing Unicode characters into some default ASCII transcription. Ideally, I'd like Ångström to become Angstroem or, if that is not possible, Angstrom. Likewise, α=χ should become a=x (or a=c?) or similar.

Does Emacs have such a capability built in? I know I can look up the names and similar properties of characters (get-char-code-property), but I know of no built-in transcription table.

The purpose is to translate the titles of entries into meaningful, readable filenames, avoiding problems with software that doesn't understand Unicode.

My current strategy is to build a translation table, but that approach is rather limited and requires a lot of maintenance.

Recommended answer

There is no built-in capability that I know of. I wrote a package, unidecode, specifically for this task. It uses the same approach as the Python library of the same name. To install it, add the MELPA repository to your list of package archives:

(add-to-list 'package-archives
  '("melpa" . "http://melpa.milkbox.net/packages/") t)

Then install the unidecode package. It provides two functions: unidecode-unidecode, which turns Unicode into ASCII, and unidecode-sanitize, which additionally discards non-alphanumeric characters and turns spaces into hyphens.

ELISP> (unidecode-unidecode "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"!Hola!, Gruss Gott, Hyvaa paivaa, Tere ohtust, Bongu Czesc!, Dobry den, Zdravstvuite!, Geia sas, lmsllmlllmckhmslmgll"
ELISP> (unidecode-sanitize "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"hola-gruss-gott-hyvaa-paivaa-tere-ohtust-bongu-czesc-dobry-den-zdravstvuite-geia-sas-lmsllmlllmckhmslmgll"
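
If installing a package is not an option, a rough fallback can be sketched with the ucs-normalize library that ships with Emacs. This is not how unidecode works internally; it is only an illustration. A small hand-written override table is applied first, the remaining text is decomposed with NFKD, and whatever non-ASCII characters are left are dropped. The names my-ascii-overrides, my-ascii-transcribe and my-ascii-slug are hypothetical, and the table entries are assumptions chosen to match the examples in the question.

```elisp
;; -*- coding: utf-8; lexical-binding: t -*-
(require 'ucs-normalize)  ; bundled with Emacs; provides the NFC/NFKD helpers

;; Hypothetical override table (an assumption, extend as needed).
;; Entries here take precedence over plain diacritic stripping.
(defvar my-ascii-overrides
  '((?ö . "oe") (?ä . "ae") (?ü . "ue") (?ß . "ss")
    (?α . "a") (?χ . "x"))
  "Alist mapping single characters to hand-picked ASCII replacements.")

(defun my-ascii-transcribe (string)
  "Return a rough ASCII transcription of STRING.
Characters listed in `my-ascii-overrides' are replaced first.  The
rest is decomposed (NFKD) and every remaining non-ASCII character,
including the detached combining marks, is dropped."
  (let* ((composed (ucs-normalize-NFC-string string))
         ;; Apply the override table character by character.
         (replaced (mapconcat
                    (lambda (ch)
                      (or (cdr (assq ch my-ascii-overrides))
                          (char-to-string ch)))
                    composed ""))
         ;; Split accented letters into base letter + combining mark.
         (decomposed (ucs-normalize-NFKD-string replaced)))
    (replace-regexp-in-string "[^[:ascii:]]" "" decomposed)))

(defun my-ascii-slug (string)
  "Return a lowercase, hyphen-separated file-name stem for STRING."
  (let* ((ascii (downcase (my-ascii-transcribe string)))
         ;; Collapse every run of non-alphanumerics into one hyphen.
         (hyphenated (replace-regexp-in-string "[^a-z0-9]+" "-" ascii)))
    ;; Trim hyphens left over at either end.
    (replace-regexp-in-string "\\`-+\\|-+\\'" "" hyphenated)))

(my-ascii-transcribe "Ångström")      ; => "Angstroem"
(my-ascii-transcribe "α=χ")           ; => "a=x"
(my-ascii-slug "¡Hola!, Grüß Gott")   ; => "hola-gruess-gott"
```

Characters that have no decomposition and no entry in the table (most Cyrillic or Georgian letters, for instance) are silently dropped, so this sketch covers far less than unidecode does.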
