What is the most common encoding of each language?

Problem description

I am developing a plain-text reader application. Sometimes the app can't automatically determine a file's encoding, so the user has to select one from a list. If that list contained every supported encoding, it would be too long. I want to provide a simplified list that contains only the most common encodings for each language.

These are the associations I know of:

    Traditional Chinese: Big5
    Simplified Chinese: GB18030
    Japanese: Shift-JIS, EUC-JP
    Russian: KOI8-R

If you know the most common encoding for any other language, please tell me.

Solution

FWIW, here are the Windows XP locales grouped by default character encoding:

    Big5: zh_HK, zh_MO, zh_TW
    GBK (≈GB2312): zh_CN, zh_SG
    Windows-31J (≈Shift_JIS): ja_JP
    windows-874 (≈TIS-620, ISO-8859-11): th_TH
    windows-949 (≈EUC-KR): ko_KR
    windows-1250: bs_BA, cs_CZ, hr_BA, hr_HR, hu_HU, pl_PL, ro_RO, sk_SK, sl_SI, sq_AL, sr_BA, sr_SP
    windows-1251: az_AZ, be_BY, bg_BG, kk_KZ, ky_KG, mk_MK, mn_MN, ru_RU, sr_BA, sr_SP, tt_RU, uk_UA, uz_UZ
    windows-1252 (≈ISO-8859-1): af_ZA, arn_CL, ca_ES, cy_GB, da_DK, de_AT, de_CH, de_DE, de_LI, de_LU, en_AU, en_BZ, en_CA, en_CB, en_GB, en_IE, en_JM, en_NZ, en_PH, en_TT, en_US, en_ZA, en_ZW, es_AR, es_BO, es_CL, es_CO, es_CR, es_DO, es_EC, es_ES, es_GT, es_HN, es_MX, es_NI, es_PA, es_PE, es_PR, es_PY, es_SV, es_UY, es_VE, eu_ES, fi_FI, fil_PH, fo_FO, fr_BE, fr_CA, fr_CH, fr_FR, fr_LU, fr_MC, fy_NL, ga_IE, gl_ES, id_ID, is_IS, it_CH, it_IT, iu_CA, iv_IV, lb_LU, moh_CA, ms_BN, ms_MY, nb_NO, nl_BE, nl_NL, nn_NO, ns_ZA, pt_BR, pt_PT, qu_BO, qu_EC, qu_PE, rm_CH, se_FI, se_NO, se_SE, sv_FI, sv_SE, sw_KE, tn_ZA, xh_ZA, zu_ZA
    windows-1253: el_GR
    windows-1254 (≈ISO-8859-9): az_AZ, tr_TR, uz_UZ
    windows-1255: he_IL
    windows-1256: ar_AE, ar_BH, ar_DZ, ar_EG, ar_IQ, ar_JO, ar_KW, ar_LB, ar_LY, ar_MA, ar_OM, ar_QA, ar_SA, ar_SY, ar_TN, ar_YE, fa_IR, ps_AF, ur_PK
    windows-1257: et_EE, lt_LT, lv_LV
    windows-1258: vi_VN

and the most common encodings overall on the Web:

    UTF-8 (89.2%)
    ISO-8859-1 (5.0%)
    Windows-1251 (1.6%)
    Shift JIS (0.9%)
    Windows-1252 (0.8%)
    GB2312 (0.7%)
    EUC-KR (0.4%)
    EUC-JP (0.3%)
    GBK (0.3%)
    ISO-8859-2 (0.2%)
    Windows-1250 (0.2%)
    ISO-8859-15 (0.1%)
    Windows-1256 (0.1%)
    ISO-8859-9 (0.1%)
    Big5 (0.1%)
    Windows-1254 (0.1%)
    Windows-874 (0.1%)
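One way to turn the locale groupings above into a short picker list is a small lookup table plus a decode check. This is only a rough sketch in Python, not something from the original answer: the names `LOCALE_DEFAULTS`, `ALWAYS_OFFERED`, `shortlist` and `decodable` are hypothetical, the table covers just a handful of the locales listed, and always offering UTF-8 first is an assumption based on the Web statistics.

```python
# Hypothetical lookup table: a small subset of the locale groupings above,
# written with Python codec names (cp932 is Python's name for Windows-31J).
LOCALE_DEFAULTS = {
    "zh_TW": "big5",
    "zh_HK": "big5",
    "zh_CN": "gbk",
    "ja_JP": "cp932",
    "th_TH": "cp874",
    "ko_KR": "cp949",
    "pl_PL": "cp1250",
    "ru_RU": "cp1251",
    "en_US": "cp1252",
    "el_GR": "cp1253",
    "tr_TR": "cp1254",
    "he_IL": "cp1255",
    "ar_EG": "cp1256",
    "lt_LT": "cp1257",
    "vi_VN": "cp1258",
}

# Assumption of this sketch: UTF-8 is offered first for every locale.
ALWAYS_OFFERED = ["utf-8"]


def shortlist(locale, extra=()):
    """Return candidate codec names for the picker, in order, without duplicates."""
    candidates = list(ALWAYS_OFFERED)
    default = LOCALE_DEFAULTS.get(locale)
    if default is not None:
        candidates.append(default)
    candidates.extend(extra)

    seen = set()
    result = []
    for name in candidates:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


def decodable(data, encodings):
    """Keep only the encodings that decode `data` without raising an error."""
    usable = []
    for name in encodings:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Invalid byte sequence for this codec, or unknown codec name.
            continue
        usable.append(name)
    return usable
```

For example, `shortlist("ru_RU", extra=["koi8_r"])` puts UTF-8, windows-1251 and KOI8-R in the list, and `decodable` can then drop the entries that fail outright on the file's bytes. Single-byte code pages accept almost any byte sequence, so the check mostly removes UTF-8 and the multi-byte East Asian encodings; the user still has to pick among whatever remains.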


08-19 14:18