Question

Can anyone tell me where I can find a translation table for the letters of all the world's languages, including Russian, Greek, Thai, etc.? I need a function to create a fancy URL from text in any language. And because we know nothing about, for example, Japanese, I am trying it this way. Thanks for your replies.

Recommended answer

Transliteration in general is non-trivial, see the Unicode Transliteration Guidelines. The answer to your question, bluntly, is that the table you're looking for doesn't exist.

That said, there are a few workarounds available, like Sean M. Burke's Unidecode Perl module (and its ports to Ruby and Python). But as he points out, you're not going to get a transliteration for, say, Thai or Japanese that's usefully readable from such a conversion.

Take a look at the following test session using the Python port:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from unidecode import unidecode

hello = u"""Hello world! English
Salut le monde! French
Saluton Mondo! Esperanto
Sveika, pasaule! Latvian
Tere, maailm! Estonian
Merhaba dünya! Turkish
Olá mundo! Portuguese
안녕, 세상! Korean
你好,世界! Chinese
こんにちは 世界! Japanese
ሠላም ዓለም! Amharic
哈佬世界! Cantonese
Привет, мир! Russian
Καλημέρα κόσμε! Greek
สวัสดีราคาถูก! Thai"""

samples = []

# Each line ends with the language name; everything before it is the greeting.
for line in hello.splitlines():
    text, language = line.rsplit(None, 1)
    samples.append((language, text))

# Print the language, the original text, and its ASCII approximation.
for language, text in samples:
    print(language.upper())
    print(text)
    print(unidecode(text))
    print()

Output:

ENGLISH
Hello world!
Hello world!

FRENCH
Salut le monde!
Salut le monde!

ESPERANTO
Saluton Mondo!
Saluton Mondo!

LATVIAN
Sveika, pasaule!
Sveika, pasaule!

ESTONIAN
Tere, maailm!
Tere, maailm!

TURKISH
Merhaba dünya!
Merhaba dunya!

PORTUGUESE
Olá mundo!
Ola mundo!

KOREAN
안녕, 세상!
annyeong, sesang!

CHINESE
你好,世界!
Ni Hao ,Shi Jie !

JAPANESE
こんにちは 世界!
konnitiha Shi Jie !

AMHARIC
ሠላም ዓለም!
szalaame `aalame!

CANTONESE
哈佬世界!
Ha Lao Shi Jie !

RUSSIAN
Привет, мир!
Priviet, mir!

GREEK
Καλημέρα κόσμε!
Kalemera kosme!

THAI
สวัสดีราคาถูก!
swasdiiraakhaathuuk!

For languages that are Latin-ish in the first place, it's quite useful: it strips accent marks. Outside of those, things get dicey fast.
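For the Latin-ish case alone, accent stripping can also be sketched with just the standard library. The helper name `strip_accents` is made up for this illustration, and the approach (Unicode decomposition, then dropping combining marks) is an assumption about what is good enough for a slug; it is not how Unidecode itself works.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks; leave all other characters alone."""
    # NFKD splits a precomposed letter such as "ü" into "u" + a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Merhaba dünya!"))  # Merhaba dunya!
print(strip_accents("Olá mundo!"))      # Ola mundo!
print(strip_accents("世界"))             # unchanged: nothing to decompose
```

Letters that have no decomposition (and every non-Latin script) pass through unchanged, which is exactly where this simple approach stops being useful.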

If you compare the Chinese and Japanese examples, you'll see that the sequence 世界 is transliterated Shi Jie in both. That's wrong for Japanese: the "transliteration" (or better, "reading") of the Japanese should be sekai. The Russian and Greek are not too bad. But Amharic and Thai are abysmal; I would guess that they're not even legible to someone who's fluent in those languages.

The general problem here is that transliteration is not something that can be defined unless language-specific information is also taken into account, and even determining language is non-trivial: how is your program supposed to know if 世界 is in Japanese or Chinese?
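To make the ambiguity concrete, here is a toy sketch in which the reading of a string is looked up per language. The table `READINGS` and the function `romanize` are hypothetical and hold just one entry each; a real system would need full language-specific dictionaries plus some way to know the language in the first place.

```python
# Hypothetical, hand-written readings: the same characters, two languages.
READINGS = {
    "zh": {"世界": "shijie"},
    "ja": {"世界": "sekai"},
}


def romanize(text, language):
    """Return the reading of text for the given language code, or None."""
    return READINGS.get(language, {}).get(text)


word = "世界"
# The code points are identical; only the language tag changes the result.
print([hex(ord(ch)) for ch in word])
print(romanize(word, "zh"))
print(romanize(word, "ja"))
```

Nothing in the string itself says which entry applies, so the language has to come from outside the text (user settings, page metadata, and so on).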

A better policy than trying to force hackish transliteration into your application is to figure out how to support Unicode properly in the first place. If you have to have an all-ASCII representation of non-Latin-script text, use URL encoding.
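A minimal sketch of the URL-encoding route, using Python 3's standard `urllib.parse`. The helper name `make_slug` and the choice to turn spaces into hyphens are assumptions made for this example, not part of the original answer.

```python
from urllib.parse import quote, unquote


def make_slug(title):
    """Percent-encode a title (as UTF-8) so it is safe in a URL path."""
    # Replace runs of whitespace with single hyphens, then percent-encode.
    hyphenated = "-".join(title.split())
    return quote(hyphenated, safe="-")


slug = make_slug("こんにちは 世界")
print(slug)           # all-ASCII, percent-encoded UTF-8 bytes
print(unquote(slug))  # decodes back to the hyphenated original
```

The encoded form is lossless and needs no language knowledge, and modern browsers generally display the decoded characters in the address bar.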

