Unicode string processing in C++

Problem description

OK, I can read the Unicode file, but I now get the entire file in a single string and cannot break it into lines and then into words. I am very confused. I had a previous post, but since the problem is different now, I am posting a new question.

Objective:
- I want to filter some words from the file (e.g. those enclosed in double quotes).
- I have read the Unicode (UTF-16) file, and its contents end up in a single string.
- I need to break it up line by line and then use wcstok to break each line into words.

Platform: Windows, Visual Studio 2010, Unicode: UTF-16. If you have different suggestions, I am open to changing the code. It would also be great if you could paste sample code to help me understand.

Code pasted below:

#include <codecvt>
#include <cwchar>    // wcstok
#include <fstream>   // wifstream, wofstream
#include <locale>
#include <string>    // wstring, getline

using namespace std;

wifstream fin("profiles.txt", ios_base::binary);   // open the input file
wofstream fout("out.txt", ios_base::binary);       // this receives the parsing output

fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
fout.imbue(std::locale(fin.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));

wstring line;
getline(fin, line);  // here I get the entire file in the wstring `line`

// I need suggestions on how to handle the code below

while (!fin.eof())
{
    // read an entire line into memory
    // wchar_t buf[MAX_CHARS_PER_LINE];

    //fin.getline(buf, MAX_CHARS_PER_LINE);

    // parse the line into blank-delimited tokens
    int n = 0; // a for-loop index

    // array to store memory addresses of the tokens in buf
    const wchar_t* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0

    // parse the line
    token[0] = wcstok(buf, DELIMITER); // first token

    if (token[0]) // zero if line is blank
    {

        for (n = 0; n < MAX_TOKENS_PER_LINE; n++)   // setting n=0 as we want to ignore the first token
        {
            token[n] = wcstok(0, DELIMITER); // subsequent tokens

            if (!token[n]) break; // no more tokens

            std::wstring str2 = token[n];
        }
    }
}
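
For illustration only, here is a minimal sketch of one way the objective stated above could be approached. It is not a verified fix for the code in the question. It assumes the file contents have already been decoded into a std::wstring (as in `line` above) and that lines end in "\n" or "\r\n". The helper names `split_lines` and `quoted_words` are hypothetical and do not come from the original post. It searches for the quote characters directly instead of tokenizing with wcstok.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split already-decoded text into lines.
// A trailing '\r' (left over from "\r\n" read in binary mode) is removed.
std::vector<std::wstring> split_lines(const std::wstring& text)
{
    std::vector<std::wstring> lines;
    std::wistringstream in(text);
    std::wstring line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[line.size() - 1] == L'\r')
            line.erase(line.size() - 1);
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: collect every substring enclosed in double quotes.
// An unmatched opening quote ends the search for that line.
std::vector<std::wstring> quoted_words(const std::wstring& line)
{
    std::vector<std::wstring> words;
    std::size_t open = line.find(L'"');
    while (open != std::wstring::npos) {
        std::size_t close = line.find(L'"', open + 1);
        if (close == std::wstring::npos)
            break;
        words.push_back(line.substr(open + 1, close - open - 1));
        open = line.find(L'"', close + 1);
    }
    return words;
}
```

With these helpers, the single string read from the file could be passed to `split_lines`, and each resulting line to `quoted_words`. The code uses only standard library features that should also be available in Visual Studio 2010, though that has not been checked here.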

Recommended answer