Question
I am writing a server API for an iOS application. As part of the initialization process, the app should send the phone's interface language to the server via an API call.
The problem is that Apple uses IETF BCP 47 language identifiers in its NSLocale preferredLanguages method (https://developer.apple.com/library/mac/documentation/cocoa/reference/foundation/classes/NSLocale_Class/Reference/Reference.html#jumpTo_18).
The returned values have different lengths (e.g. [aa, ab, ace, ach, ada, ady, ae, af, afa, afh, agq, ...]), and I found very few parsers that can convert these codes to a proper language identifier.
I would like to use the more common ISO-639-2 three-letter language identifier, which is ubiquitous, has parsers in many programming languages, and gives every language a standard 3-letter representation.
How can I convert an IETF BCP 47 language identifier to an ISO-639-2 three-letter language identifier (preferably in Python)?
Recommended answer
BCP 47 identifiers start with a 2-letter ISO 639-1 or a 3-letter ISO 639-2, 639-3 or 639-5 language code; see the RFC 5646 Syntax section:
    Language-Tag  = langtag             ; normal language tags
                  / privateuse          ; private use tag
                  / grandfathered       ; grandfathered tags

    langtag       = language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse]

    language      = 2*3ALPHA            ; shortest ISO 639 code
                    ["-" extlang]       ; sometimes followed by
                                        ; extended language subtags
                  / 4ALPHA              ; or reserved for future use
                  / 5*8ALPHA            ; or registered language subtag
I don't expect Apple to use the privateuse or grandfathered forms, so you can assume that you are looking at ISO 639-1, ISO 639-2, ISO 639-3 or ISO 639-5 language codes here. Simply map the 2-letter ISO 639-1 codes to 3-letter ISO 639-* codes.
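As a rough illustration of that mapping step, the sketch below pulls the primary language subtag off a tag and looks up 2-letter codes in a table. The function name to_three_letter and the ISO_639_1_TO_3 dictionary are hypothetical, and the table is only a small hand-written subset of ISO 639-1; a real service would load the full list from a maintained source.

```python
import re

# Illustrative subset of ISO 639-1 -> ISO 639-2/T mappings (assumption:
# a real table would cover every ISO 639-1 code, not just these).
ISO_639_1_TO_3 = {
    "aa": "aar",
    "ab": "abk",
    "de": "deu",
    "en": "eng",
    "fr": "fra",
    "zh": "zho",
}

# Primary language subtag per the grammar above: 2 or 3 letters at the
# start of the tag, followed by "-" or the end of the string.
_PRIMARY = re.compile(r"^([A-Za-z]{2,3})(?:-|$)")


def to_three_letter(tag):
    """Return a 3-letter code for a BCP 47 tag, or None if it cannot be mapped."""
    match = _PRIMARY.match(tag)
    if match is None:
        # privateuse ("x-..."), grandfathered ("i-...") or 4-8 letter subtags
        return None
    primary = match.group(1).lower()
    if len(primary) == 3:
        return primary  # already a 3-letter ISO 639 code
    return ISO_639_1_TO_3.get(primary)  # None when the code is not in the table
```

Returning None for anything unrecognised leaves the fallback policy (reject the request, default to English, and so on) to the caller.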
You can use the pycountry package for this:
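import pycountry

# API as of the pycountry release used in this answer; newer releases
# appear to spell these alpha_2= and .alpha_3 instead.
two_letter_code = 'aa'
lang = pycountry.languages.get(alpha2=two_letter_code)
three_letter_code = lang.terminology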
Demo:
>>> import pycountry
>>> lang = pycountry.languages.get(alpha2='aa')
>>> lang.terminology
u'aar'
where the terminology form is the preferred 3-letter code; there is also a bibliographic form, which differs for only 22 entries. See ISO 639-2 B and T codes. The package doesn't include entries from ISO 639-5, however; that list overlaps and conflicts with 639-2 in places, and I don't think Apple uses such codes at all.
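If the server or a downstream system expects the bibliographic (B) codes rather than the terminology (T) codes, the difference can be handled with a small lookup. The sketch below is an assumption-laden illustration: the names T_TO_B and to_bibliographic are hypothetical, and the table lists only a few of the differing pairs, so codes missing from it are returned unchanged.

```python
# Partial table of ISO 639-2 codes whose bibliographic (B) form differs
# from the terminology (T) form. Assumption: a complete table would be
# filled in from the ISO 639-2 code list; only a few pairs are shown.
T_TO_B = {
    "ces": "cze",  # Czech
    "deu": "ger",  # German
    "ell": "gre",  # Greek (modern)
    "fra": "fre",  # French
    "nld": "dut",  # Dutch
    "zho": "chi",  # Chinese
}


def to_bibliographic(terminology_code):
    """Return the B code for a T code, or the input unchanged if not listed."""
    return T_TO_B.get(terminology_code, terminology_code)
```

For most languages the two forms are identical, which is why falling back to the input is a reasonable default once the table is complete.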