Problem description
On Windows, the character ö (Latin small letter o with diaeresis, U+00F6) has the byte value 148 (0x94) in the CP437 character set.
On Linux, ö is encoded in UTF-8 as two bytes, shown here as signed char values:
-61 (first byte, 0xC3, 195 unsigned)
-74 (second byte, 0xB6, 182 unsigned)
(read together as a little-endian 16-bit unsigned value = 46787)
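The numbers above are easier to compare once the signed bytes are viewed as unsigned. The sketch below is only an illustration of that arithmetic; the helper names `to_unsigned_byte` and `little_endian_u16` are made up for this example. It shows that -61 and -74 are the bytes 0xC3 and 0xB6, and that 46787 is what results from reading those two bytes as a little-endian 16-bit integer.

```cpp
#include <cstdint>

// Hypothetical helper: reinterpret a signed char value as an unsigned byte.
unsigned to_unsigned_byte(signed char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: combine two bytes as a little-endian 16-bit value,
// i.e. the first byte in memory becomes the low 8 bits.
std::uint16_t little_endian_u16(unsigned char first, unsigned char second) {
    return static_cast<std::uint16_t>(first | (second << 8));
}
```

The 16-bit value depends on the byte order of the machine, so comparing the individual bytes (0xC3, 0xB6) is usually the safer check.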
My question is: how can I convert the byte 148 from CP437 to UTF-8 in C++ on Linux?
The detailed info for my problem lies here:
open() function in Linux with extended characters (128-255) returns -1 error
Temporary solution: C++11 supports using codecvt_utf8
Recommended answer
On Windows, you can use the Win32 MultiByteToWideChar() function to convert data from CP437 to UTF-16, and then use the WideCharToMultiByte() function to convert data from UTF-16 to UTF-8.
On Linux, you can use a Unicode conversion library, like libiconv or ICU (both of which are available for Windows, too).
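As a rough illustration of the iconv route, here is a minimal sketch using the POSIX iconv_open()/iconv()/iconv_close() calls. The function name `cp437_to_utf8_iconv` is invented for this example, and the sketch assumes the system's iconv implementation recognises the encoding name "CP437" (glibc and GNU libiconv normally do, but that is worth checking on your platform).

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a CP437 byte string to UTF-8 via iconv.
std::string cp437_to_utf8_iconv(const std::string& in) {
    // Assumption: the encoding name "CP437" is known to this iconv build.
    iconv_t cd = iconv_open("UTF-8", "CP437");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed (CP437 not supported?)");

    // Every CP437 character is in the BMP, so it needs at most 3 UTF-8 bytes.
    std::string out(in.size() * 3, '\0');
    char* src = const_cast<char*>(in.data());
    size_t src_left = in.size();
    char* dst = &out[0];
    size_t dst_left = out.size();

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv conversion failed");

    out.resize(out.size() - dst_left);  // trim the unused tail of the buffer
    return out;
}
```

With glibc no extra library is needed; with a separate libiconv you would typically link with -liconv.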
In C++11 and later, you can use std::wstring_convert to:
- convert from CP437 to UTF-16 or UTF-32/UCS-4 (that is, if you can get or make a codecvt for CP437),
- then convert from UTF-16 or UTF-32/UCS-4 to UTF-8.
You can't use codecvt_utf8 to convert from CP437 to UTF-8 directly. It only supports conversions between:
- UTF-8 and UCS-2 (not UTF-16!)
- UTF-8 and UTF-32/UCS-4
You have to use codecvt_utf8_utf16 for conversions between UTF-8 and UTF-16.
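To make the second step concrete (Unicode code units to UTF-8), here is a small sketch of std::wstring_convert with the two facets mentioned above. The function names `utf32_to_utf8` and `utf16_to_utf8` are made up for this example. It does not cover the first step, since that needs a CP437 codecvt that the standard library does not provide. Note also that these facets are deprecated since C++17, so newer compilers may emit warnings.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-32/UCS-4 to UTF-8 using codecvt_utf8<char32_t>.
std::string utf32_to_utf8(const std::u32string& in) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(in);
}

// Hypothetical helper: UTF-16 (including surrogate pairs) to UTF-8.
// codecvt_utf8_utf16 is needed here; codecvt_utf8<char16_t> would only
// handle UCS-2.
std::string utf16_to_utf8(const std::u16string& in) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(in);
}
```

For example, passing a string containing U+00F6 (ö) to either function should yield the two bytes 0xC3 0xB6.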
Or, you can use mbrtoc16() to convert CP437 to UTF-16 using a CP437 locale, and then use c16rtomb() to convert UTF-16 to UTF-8 using a UTF-8 locale (if your standard library implements the fix for DR488; otherwise c16rtomb() only supports UCS-2 and not UTF-16!).
Otherwise, just create your own CP437-to-UTF-8 lookup table for the 256 possible CP437 bytes, and then do the conversion manually, one byte at a time.
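A lookup-table conversion could look roughly like the sketch below. The names `kCp437High` and `cp437_to_utf8` are invented for this example. The table was transcribed from the commonly published CP437-to-Unicode mapping and should be checked against an authoritative copy before use. The sketch also assumes bytes 0x00-0x7F are plain ASCII, ignoring the graphical glyphs CP437 can show for control codes.

```cpp
#include <cstdint>
#include <string>

// Unicode code points for CP437 bytes 0x80..0xFF (assumed mapping, verify
// against an authoritative table). Bytes below 0x80 are treated as ASCII.
static const std::uint16_t kCp437High[128] = {
    0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7,  // 0x80
    0x00EA, 0x00EB, 0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5,  // 0x88
    0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB, 0x00F9,  // 0x90
    0x00FF, 0x00D6, 0x00DC, 0x00A2, 0x00A3, 0x00A5, 0x20A7, 0x0192,  // 0x98
    0x00E1, 0x00ED, 0x00F3, 0x00FA, 0x00F1, 0x00D1, 0x00AA, 0x00BA,  // 0xA0
    0x00BF, 0x2310, 0x00AC, 0x00BD, 0x00BC, 0x00A1, 0x00AB, 0x00BB,  // 0xA8
    0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,  // 0xB0
    0x2555, 0x2563, 0x2551, 0x2557, 0x255D, 0x255C, 0x255B, 0x2510,  // 0xB8
    0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x255E, 0x255F,  // 0xC0
    0x255A, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256C, 0x2567,  // 0xC8
    0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256B,  // 0xD0
    0x256A, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580,  // 0xD8
    0x03B1, 0x00DF, 0x0393, 0x03C0, 0x03A3, 0x03C3, 0x00B5, 0x03C4,  // 0xE0
    0x03A6, 0x0398, 0x03A9, 0x03B4, 0x221E, 0x03C6, 0x03B5, 0x2229,  // 0xE8
    0x2261, 0x00B1, 0x2265, 0x2264, 0x2320, 0x2321, 0x00F7, 0x2248,  // 0xF0
    0x00B0, 0x2219, 0x00B7, 0x221A, 0x207F, 0x00B2, 0x25A0, 0x00A0   // 0xF8
};

// Hypothetical helper: convert CP437 bytes to UTF-8, one byte at a time.
std::string cp437_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char b : in) {
        std::uint32_t cp = (b < 0x80) ? b : kCp437High[b - 0x80];
        if (cp < 0x80) {           // 1-byte UTF-8 sequence
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {   // 2-byte UTF-8 sequence
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                   // 3-byte sequence; all table entries fit in the BMP
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

This approach has no library or locale dependencies, which is its main appeal on systems where a CP437 converter may not be installed.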