I'm working on parsing files with Shift-JIS encoded strings within the binary data. My current code is this:

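// Requires using System.Text; on .NET Core/5+ register CodePagesEncodingProvider before asking for code page 932.
public static string DecodeShiftJISString(this byte[] data, int index, int length)
{
    // Decode only the requested slice straight to a .NET string; no UTF-8 round trip is needed.
    return Encoding.GetEncoding(932).GetString(data, index, length);
}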


It works fine and I get usable strings from this method, but when I display strings containing Latin characters in my WinForms application, the characters are wider than normal.



I'm not sure if this is an issue with my encoding logic, or the way I'm supposed to display the strings (I just pass them directly into my controls). Any help would be appreciated!


These aren't normal ASCII characters; they're ‘fullwidth variants’, in the block that starts at U+FF01 FULLWIDTH EXCLAMATION MARK and runs up to U+FF5E. They exist so that Latin characters occupy the same cell width as CJK characters when the two are mixed in a fixed-width layout.
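To make the range concrete: each fullwidth form in U+FF01..U+FF5E sits exactly 0xFEE0 above its ASCII counterpart (U+0021..U+007E). Here is a minimal sketch that folds only that block and leaves everything else alone. The class and method names are made up for illustration, and it assumes you only care about the ASCII variants, not other compatibility characters.

```csharp
using System;
using System.Text;

public static class FullwidthAscii
{
    // Hypothetical helper: maps U+FF01..U+FF5E down to U+0021..U+007E
    // and copies every other character through unchanged.
    public static string Fold(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c >= '\uFF01' && c <= '\uFF5E')
                sb.Append((char)(c - 0xFEE0)); // fixed offset between the two blocks
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // U+FF28 U+FF49 U+FF01 are the fullwidth forms of 'H', 'i', '!'
        Console.WriteLine(Fold("\uFF28\uFF49\uFF01")); // prints Hi!
    }
}
```

This keeps the rest of the string (kana, kanji, halfwidth katakana) exactly as decoded, which may or may not be what you want.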


Unicode would prefer weird characters like this, which are just semantically-identical stylistic variants of existing characters, not to exist. But it has to include them to round-trip to legacy encodings like Shift-JIS. For this reason they are called Compatibility characters.

You can convert compatibility characters to their basic variants by using Unicode normalisation with one of the ‘K’ forms, such as NFKC. In Win32 you can do this using NormalizeString().
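Since the question is in C#, the managed equivalent is probably the easier route: `string.Normalize` with `NormalizationForm.FormKC`. A minimal sketch follows; the class and helper names are hypothetical, and it assumes you are happy for every compatibility character in the string to be normalised, not just the fullwidth Latin ones.

```csharp
using System;
using System.Text;

public static class NfkcDemo
{
    // Hypothetical helper: applies NFKC, which replaces compatibility
    // characters (including fullwidth forms) with their basic equivalents.
    public static string ToBasicForms(string s)
    {
        return s.Normalize(NormalizationForm.FormKC);
    }

    public static void Main()
    {
        string fullwidth = "\uFF21\uFF22\uFF23\uFF01"; // fullwidth A, B, C, !
        Console.WriteLine(ToBasicForms(fullwidth));    // prints ABC!
    }
}
```

Be aware that NFKC is not limited to fullwidth Latin: it also turns halfwidth katakana into ordinary katakana and unpacks things like ligatures and circled digits, so check that this is acceptable for your data before applying it to every decoded string.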

