Question
I have a huge file to parse. Previously, it was separated by either spaces or commas, and I used sscanf(string, "%lf %lf ", &aa, &bb); to get the data into my program.
But now the data format has changed to "122635.670399999","209705.752799999", with both commas and quotation marks, and I have no idea how to deal with it. Actually, my previous code was found online, and I had a really hard time finding proper documentation for this kind of problem. It would be great if you could recommend some to me. Thanks.
Recommended answer
Rather than reading a string, removing the commas and quotes from it, and finally converting the data to numbers, I'd probably create a locale object whose ctype facet classifies commas and quotes as white space, imbue the stream with that locale, and read the numbers without further ado.
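#include <cstddef>
#include <iostream>
#include <iterator>
#include <locale>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// here's our ctype facet:
class my_ctype : public std::ctype<char> {
public:
    // static, because it is called before the std::ctype<char> base is constructed
    static mask const *get_table() {
        static std::vector<std::ctype<char>::mask>
            table(classic_table(), classic_table() + table_size);
        // tell it to classify quotes and commas as "space":
        table['"'] = (mask)space;
        table[','] = (mask)space;
        return &table[0];
    }
    my_ctype(std::size_t refs = 0) : std::ctype<char>(get_table(), false, refs) { }
};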
Using that, we can read the data something like this:
int main() {
    // Test input from question:
    std::string input("\"122635.670399999\",\"209705.752799999\"");
    // Open the "file" of the input (from the string, for test purposes).
    std::istringstream infile(input);
    // Tell the stream to use the locale we defined above:
    infile.imbue(std::locale(std::locale(), new my_ctype));
    // Read the numbers into a vector of doubles:
    std::vector<double> numbers{std::istream_iterator<double>(infile),
                                std::istream_iterator<double>()};
    // Print out the sum of the numbers to show we read them:
    std::cout << std::accumulate(numbers.begin(), numbers.end(), 0.0) << '\n';
}
Note that once we've imbued the stream with a locale using our ctype facet, we can just read numbers as if the commas and quotes didn't exist at all. Since the ctype facet classifies them as white-space, they're completely ignored beyond acting as separators between other stuff.
I'm pointing this out primarily to make clear that there's no magic in any of the processing after that. There's nothing special about using istream_iterator; if you prefer, you can use (for example) double value; infile >> value; instead. You can read the numbers any of the ways you'd normally read numbers separated by white space, because as far as the stream is concerned, that's exactly what you have.