Problem description
I have a Hebrew ANSI text file that I need to convert to a Unicode Hebrew file. The conversion runs, but I am not getting the expected output. Please let me know how to do it.
What I have tried:
//code page
int nlanguageCodePage = this->GetCodepage(lpszOldFileName);
while (fgets(chAnsiBuff, NMLANG_MaxNBuf, pFile) != NULL)
{
sUnicodeBuff = chAnsiBuff;
//CONVERTING TO UNICODE
nSize = MultiByteToWideChar(nlanguageCodePage, 0, sUnicodeBuff, -1, NULL, NULL);
MultiByteToWideChar(nlanguageCodePage, 0, sUnicodeBuff, -1, chUniocodeBuff, nSize);
// bom at starting
if (nBOM == 0) { arcOut.Write(&bom, 2); }
arcOut.WriteString(chUniocodeBuff);
nBOM++;
}
Recommended answer
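// The first call, with no output buffer, returns the required size in wide characters.
// Because the input length is -1, that size includes the terminating null.
int nSize = MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, NULL, 0);
LPWSTR sUnicodeBuff = new WCHAR[nSize];
MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, sUnicodeBuff, nSize);
// Use sUnicodeBuff here
delete [] sUnicodeBuff;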
However, when the ANSI input buffer has a fixed size, the same size can also be used for the output buffer, because the Unicode string will never have more wide characters than the input string has ANSI characters (bytes):
WCHAR wUnicodeBuff[NMLANG_MaxNBuf];
while (fgets(chAnsiBuff, NMLANG_MaxNBuf, pFile) != NULL)
{
    MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, wUnicodeBuff, NMLANG_MaxNBuf);
    // Write the BOM once, before the first line
    if (nBOM == 0) { arcOut.Write(&bom, 2); }
    arcOut.WriteString(wUnicodeBuff);
    nBOM++;
}
That should work. If the result is not as expected, check the other functions involved, such as arcOut.WriteString(), whether the BOM is correct, and whether your input file is really encoded with the code page nlanguageCodePage.
Another possible cause is the arcOut.WriteString() call, if it converts the Unicode string back to ANSI. In that case, use a binary write instead:
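// With an input length of -1, the returned count includes the terminating null
int len = MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, wUnicodeBuff, NMLANG_MaxNBuf);
// Write the BOM once, before the first line
if (nBOM == 0) { arcOut.Write(&bom, 2); }
// Write only the converted characters, not the terminating null
if (len > 1)
    arcOut.Write(wUnicodeBuff, (len - 1) * sizeof(WCHAR));
nBOM++;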
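When checking whether the BOM and the binary write are correct, it can help to know exactly which bytes a UTF-16 little-endian file should contain, so the output can be compared in a hex viewer. The following is a minimal, portable sketch of that byte layout only. It does not call MultiByteToWideChar, and the helper names toUtf16LeBytes and startsWithUtf16LeBom are hypothetical, not part of the original code or of any Windows API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): serialize UTF-16 code units as
// little-endian bytes, optionally preceded by the UTF-16LE byte order mark.
std::vector<std::uint8_t> toUtf16LeBytes(const std::u16string& text, bool withBom)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve((text.size() + 1) * 2);
    if (withBom) {
        // U+FEFF stored low byte first gives FF FE
        bytes.push_back(0xFF);
        bytes.push_back(0xFE);
    }
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));        // low byte
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF)); // high byte
    }
    // No terminating null is appended: a text file should not contain 00 00.
    return bytes;
}

// Hypothetical helper (illustration only): true if the data starts with FF FE.
bool startsWithUtf16LeBom(const std::vector<std::uint8_t>& bytes)
{
    return bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;
}
```

For example, toUtf16LeBytes(u"\u05D0", true) for the Hebrew letter alef (U+05D0) yields the four bytes FF FE D0 05. If the output file shows single bytes per letter, a missing or repeated BOM, or 00 00 pairs between lines, the write step rather than the conversion is the likely culprit.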