Problem description
I am facing some issues with non-ASCII characters in C++. I have a file containing non-ASCII characters which I am reading in C++ via file handling. After reading the file (say 1.txt), I store the data in a string stream and write it to another file (say 2.txt).
Assume 1.txt contains:
ação
In 2.txt I should get the same output, but the non-ASCII characters are printed as their hex values in 2.txt.
Also, I am quite sure that C++ is handling ASCII characters as ASCII only.
Please help me print these characters correctly in 2.txt.
EDIT:
First, pseudo-code for the whole process:
1. A shell script reads one value from the DB and stores it in 11.txt
2. C++ code (a.cpp) reads 11.txt and writes to f.txt
Data present in the DB being read: Instalação
File 11.txt contains: Instalação
File f.txt contains: Instalação
Output of a.cpp on screen: Instalação
a.cpp
#include <iterator>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <fstream>
#include <iomanip>
using namespace std;
int main()
{
    ifstream myReadFile;
    ofstream f2;
    myReadFile.open("11.txt");
    f2.open("f2.txt");
    string output;
    if (myReadFile.is_open())
    {
        while (!myReadFile.eof())
        {
            myReadFile >> output;
            //cout<<output;
            cout << "\n";
            std::stringstream tempDummyLineItem;
            tempDummyLineItem << output;
            cout << tempDummyLineItem.str();
            f2 << tempDummyLineItem.str();
        }
    }
    myReadFile.close();
    return 0;
}
Locale says this:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Solution
Sounds to me like a UTF-8 issue. Since you didn't tag your question with c++11, here is an excellent article on Unicode and C++ streams.
From your updated code, let me explain what is happening. You create a file stream to read your file. Internally, the file stream only recognizes chars until you tell it otherwise. A char, on most machines, can only hold 8 bits of data, but the characters in your file use more than 8 bits. To read your file correctly, you need to know how it is encoded. The most common encoding is UTF-8, which uses between 1 and 4 chars for each character.
Once you know your encoding, you can either use wifstream (for UTF-16) or imbue() a locale for other encodings.
Update: If your file is ISO-8859-1 (from your comment above), try this:
wifstream myReadFile;
myReadFile.imbue(std::locale("en_US.iso88591"));
myReadFile.open("11.txt");