Converting IETF BCP 47 language identifiers to ISO-639-2


Question

I am writing a server API for an iOS application. As a part of the initialization process, the app should send the phone interface language to server via an API call.

The problem is that Apple returns what it calls IETF BCP 47 language identifiers from its NSLocale preferredLanguages method (https://developer.apple.com/library/mac/documentation/cocoa/reference/foundation/classes/NSLocale_Class/Reference/Reference.html#jumpTo_18).

The returned values have different lengths (e.g. [aa, ab, ace, ach, ada, ady, ae, af, afa, afh, agq, ...]), and I found very few parsers that can convert these codes to a proper language identifier.

I would like to use the more common ISO-639-2 three-letter language identifier, which is ubiquitous, has parsers in many programming languages, and gives every language a standard 3-letter representation.

How do I convert an IETF BCP 47 language identifier to an ISO-639-2 three-letter language identifier (preferably in Python)?

Recommended answer

BCP 47 identifiers start with a 2-letter ISO 639-1 language code or a 3-letter ISO 639-2, 639-3 or 639-5 language code; see the RFC 5646 Syntax section:

Language-Tag  = langtag             ; normal language tags
              / privateuse          ; private use tag
              / grandfathered       ; grandfathered tags

langtag       = language
                ["-" script]
                ["-" region]
                *("-" variant)
                *("-" extension)
                ["-" privateuse]

language      = 2*3ALPHA            ; shortest ISO 639 code
                ["-" extlang]       ; sometimes followed by
                                    ; extended language subtags
              / 4ALPHA              ; or reserved for future use
              / 5*8ALPHA            ; or registered language subtag
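
As a rough illustration of the grammar above, the primary language subtag is simply whatever precedes the first hyphen. The sketch below assumes the tag is a normal langtag (not privateuse or grandfathered) and only accepts the 2- or 3-letter form; the function name primary_language_subtag is hypothetical, not part of any library.

```python
import re

# Mirrors the "2*3ALPHA" rule from the grammar: two or three ASCII letters.
_PRIMARY = re.compile(r"[A-Za-z]{2,3}")


def primary_language_subtag(tag):
    """Return the lowercase primary language subtag of a BCP 47 tag.

    Returns None when the first subtag is not 2 or 3 letters, which covers
    private-use tags such as "x-..." and some grandfathered tags.
    """
    first = tag.split("-", 1)[0]
    if _PRIMARY.fullmatch(first):
        return first.lower()
    return None
```

For example, primary_language_subtag("zh-Hans-CN") yields "zh", while a private-use tag such as "x-custom" yields None.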

I don't expect Apple to use the privateuse or grandfathered forms, so you can assume that you are looking at ISO 639-1, ISO 639-2, ISO 639-3 or ISO 639-5 language codes here. Simply map the 2-letter ISO 639-1 codes to their 3-letter ISO 639-* equivalents; codes that are already 3 letters long need no mapping.
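
A minimal sketch of that mapping step, using only the standard library. The table here is a small hand-written subset chosen for illustration, and the names ISO_639_1_TO_639_2T and to_three_letter are hypothetical; a real service would need a complete ISO 639-1 table. Three-letter codes are passed through unchanged, on the assumption that they are already ISO 639-2 or 639-3 codes.

```python
# Illustrative subset only: ISO 639-1 code -> ISO 639-2 terminology (T) code.
ISO_639_1_TO_639_2T = {
    "aa": "aar",
    "ab": "abk",
    "ae": "ave",
    "af": "afr",
    "de": "deu",
    "en": "eng",
    "fr": "fra",
}


def to_three_letter(tag):
    """Map a BCP 47 tag to a 3-letter language code, or None if unknown."""
    # The primary language subtag is everything before the first hyphen.
    code = tag.split("-", 1)[0].lower()
    if len(code) == 3 and code.isascii() and code.isalpha():
        # Assumed to be an ISO 639-2/639-3 code already; keep it as-is.
        return code
    # 2-letter codes (and anything else) go through the lookup table.
    return ISO_639_1_TO_639_2T.get(code)
```

Returning None for unknown codes lets the caller decide whether to fall back to a default language or reject the request.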

You can use the pycountry package for this:

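import pycountry

# Example input: the 2-letter ISO 639-1 code taken from the BCP 47 tag.
two_letter_code = 'aa'
# alpha2 / terminology are the names in the pycountry release used here;
# as far as I know, later releases use alpha_2 and alpha_3 instead.
lang = pycountry.languages.get(alpha2=two_letter_code)
three_letter_code = lang.terminology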

Demo:

>>> import pycountry
>>> lang = pycountry.languages.get(alpha2='aa')
>>> lang.terminology
u'aar'

Here the terminology form is the preferred 3-letter code; there is also a bibliographic form, which differs for only 22 entries. See ISO 639-2 B and T codes. The package doesn't include entries from ISO 639-5, however; that list overlaps and conflicts with 639-2 in places, and I don't think Apple uses such codes at all.
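
To make the B/T distinction concrete, here is a small sketch that converts a terminology code to its bibliographic counterpart when the two differ. The table lists only four well-known pairs for illustration, not the full set, and the names are hypothetical.

```python
# Illustrative subset of ISO 639-2 codes whose T and B forms differ.
TERMINOLOGY_TO_BIBLIOGRAPHIC = {
    "deu": "ger",  # German
    "fra": "fre",  # French
    "nld": "dut",  # Dutch
    "zho": "chi",  # Chinese
}


def bibliographic_code(terminology_code):
    """Return the B code for a T code; most codes are identical in both forms."""
    return TERMINOLOGY_TO_BIBLIOGRAPHIC.get(terminology_code, terminology_code)
```

Whichever form the server stores, it is worth using the same one everywhere so that lookups against other systems stay consistent.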


07-24 11:59