Unicode string handling in C++

Problem description


I can now read the Unicode file, but the entire file ends up in a single string, and I cannot break it into lines and then into words. I asked a related question earlier; since this problem is different, I am posting a new one.

Objective:
- I want to filter some words from the file (e.g. those enclosed in double quotes).
- I have read the Unicode (UTF-16) file, and its contents arrive as a single string.
- I need to break it up line by line and then split each line into words using wcstok.

Platform: Windows, Visual Studio 2010. Encoding: UTF-16. I am open to changing the code if you have different suggestions, and it would be great if you could paste sample code to help me understand.

The code is pasted below:

#include <codecvt>
#include <cwchar>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

using namespace std;

// Assumed values: the original post does not show these definitions.
const int MAX_TOKENS_PER_LINE = 20;
const wchar_t* const DELIMITER = L" ";

int main()
{
    wifstream fin("profiles.txt", ios_base::binary);  // open the input file
    wofstream fout("out.txt", ios_base::binary);      // receives the parsing output

    fin.imbue(locale(fin.getloc(), new codecvt_utf16<wchar_t, 0x10ffff, consume_header>));
    fout.imbue(locale(fout.getloc(), new codecvt_utf16<wchar_t, 0x10ffff, consume_header>));

    wstring line;

    // Need suggestions on how to handle the code below.
    // Problem: the first getline call returns the entire file in `line`.
    while (getline(fin, line))  // test the read itself instead of !fin.eof()
    {
        // wcstok modifies its input, so tokenize a writable, null-terminated copy
        vector<wchar_t> buf(line.begin(), line.end());
        buf.push_back(L'\0');

        // array to store the addresses of the tokens in buf
        const wchar_t* token[MAX_TOKENS_PER_LINE] = {};  // initialize to 0

        // parse the line into blank-delimited tokens
        // (two-argument wcstok, as provided by Visual C++)
        token[0] = wcstok(&buf[0], DELIMITER);  // first token

        if (token[0])  // zero if the line is blank
        {
            // n starts at 0 so that the first token is overwritten (ignored)
            for (int n = 0; n < MAX_TOKENS_PER_LINE; n++)
            {
                token[n] = wcstok(0, DELIMITER);  // subsequent tokens

                if (!token[n]) break;  // no more tokens

                std::wstring str2 = token[n];
            }
        }
    }
}

Recommended answer

// Add std::little_endian to the conversion mode (the question's code omitted it)
fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff,
        std::codecvt_mode(std::little_endian | std::consume_header)>));
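Why the byte order matters can be seen on a tiny buffer. The sketch below is not from the original answer: `decode_utf16` and `count_newlines` are hypothetical helpers that combine byte pairs by hand (no BOM or surrogate handling, C++11 assumed). Decoding little-endian bytes as big-endian turns the newline U+000A into 0x0A00, so a reader looking for L'\n' never finds a line break, which would plausibly explain why getline returned the whole file as one string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original post): combine byte pairs
// into 16-bit code units. BOM and surrogate pairs are deliberately ignored.
std::u16string decode_utf16(const std::vector<unsigned char>& bytes, bool little_endian)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        unsigned lo = little_endian ? bytes[i] : bytes[i + 1];
        unsigned hi = little_endian ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}

// Hypothetical helper: count newline code units (U+000A) in the decoded text.
std::size_t count_newlines(const std::u16string& text)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
        if (text[i] == u'\n') ++n;
    return n;
}
```

For the little-endian bytes 61 00 0A 00 62 00, decoding with `little_endian = true` yields one newline, while decoding with `little_endian = false` yields none.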






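After adding std::little_endian (std::codecvt_utf16 assumes big-endian byte order unless that flag is set), the rest of the code worked as expected.
Thanks to @Richard, @nv3 and @pablo for their responses. Much appreciated.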
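Once the text is decoded correctly, the line-by-line and word-by-word splitting from the question can also be done without wcstok. The following is a minimal sketch rather than the asker's final code: `split_lines_into_words` and `quoted_strings` are hypothetical helper names, and the input is assumed to be an already decoded std::wstring. Because the file is opened in binary mode, Windows line endings keep their '\r', which the sketch strips.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split text into lines, then split each line into
// whitespace-delimited words.
std::vector<std::vector<std::wstring> > split_lines_into_words(const std::wstring& text)
{
    std::vector<std::vector<std::wstring> > result;
    std::wistringstream lines(text);
    std::wstring line;
    while (std::getline(lines, line))
    {
        // Binary-mode reads keep the '\r' of a Windows "\r\n" line ending.
        if (!line.empty() && line[line.size() - 1] == L'\r')
            line.erase(line.size() - 1);

        std::vector<std::wstring> words;
        std::wistringstream tokens(line);
        std::wstring word;
        while (tokens >> word)  // operator>> skips whitespace between words
            words.push_back(word);
        result.push_back(words);
    }
    return result;
}

// Hypothetical helper: collect the text found between pairs of double quotes.
std::vector<std::wstring> quoted_strings(const std::wstring& text)
{
    std::vector<std::wstring> found;
    std::size_t open = text.find(L'"');
    while (open != std::wstring::npos)
    {
        std::size_t close = text.find(L'"', open + 1);
        if (close == std::wstring::npos) break;  // unmatched quote: stop
        found.push_back(text.substr(open + 1, close - open - 1));
        open = text.find(L'"', close + 1);
    }
    return found;
}
```

Unlike wcstok, the stream-based version does not modify its input and needs no fixed-size token array. Note that `quoted_strings` works on the raw text, so a quoted phrase containing spaces is returned as one entry.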


