c++ - ifstream::unget() fails. Is it a bug in MS's implementation or in my code?

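Yesterday I found a strange bug in fairly simple code that basically takes text from an ifstream and tokenizes it. The code that actually fails does a number of get()/peek() calls looking for the token "/*". If the token is found in the stream, unget() is called so that the next method sees the stream starting with the token.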

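Sometimes, seemingly depending only on the length of the file, the unget() call fails. Internally it calls pbackfail(), which returns EOF. However, after clearing the stream state I can happily read more characters, so it is not really EOF.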

After digging into it, here is complete code that easily reproduces the problem:

#include <iostream>
#include <fstream>
#include <string>

  //generate simplest string possible that triggers problem
void GenerateTestString( std::string& s, const size_t nSpacesToInsert )
{
  s.clear();
  for( size_t i = 0 ; i < nSpacesToInsert ; ++i )
    s += " ";
  s += "/*";
}

  //write string to file, then open same file again in ifs
bool WriteTestFileThenOpenIt( const char* sFile, const std::string& s, std::ifstream& ifs )
{
  {
    std::ofstream ofs( sFile );
    if( ( ofs << s ).fail() )
      return false;
  }
  ifs.open( sFile );
  return ifs.good();
}

  //find token, unget if found, report error, show extra data can be read even after error
bool Run( std::istream& ifs )
{
  bool bSuccess = true;

  for( ; ; )
  {
    int x = ifs.get();
    if( ifs.fail() )
      break;
    if( x == '/' )
    {
      x = ifs.peek();
      if( x == '*' )
      {
        ifs.unget();
        if( ifs.fail() )
        {
          std::cout << "oops.. unget() failed" << std::endl;
          bSuccess = false;
        }
        else
        {
          x = ifs.get();
        }
      }
    }
  }

  if( !bSuccess )
  {
    ifs.clear();
    std::string sNext;
    ifs >> sNext;
    if( !sNext.empty() )
      std::cout << "remaining data after unget: '" << sNext << "'" << std::endl;
  }

  return bSuccess;
}

int main()
{
  std::string s;
  const char* testFile = "tmp.txt";
  for( size_t i = 0 ; i < 12290 ; ++i )
  {
    GenerateTestString( s, i );

    std::ifstream ifs;
    if( !WriteTestFileThenOpenIt( testFile, s, ifs ) )
    {
      std::cout << "file I/O error, aborting..";
      break;
    }

    if( !Run( ifs ) )
      std::cout << "** failed for string length = " << s.length() << std::endl;
  }
  return 0;
}

The program fails when the string length is just past the typical buffer size of 4096 or one of its multiples (8192, 12288). Here is the output:

oops.. unget() failed
remaining data after unget: '*'
** failed for string length = 4097
oops.. unget() failed
remaining data after unget: '*'
** failed for string length = 8193
oops.. unget() failed
remaining data after unget: '*'
** failed for string length = 12289

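This happens when testing on Windows XP and 7, compiled in debug and release mode, with dynamic and static runtimes, on 32-bit and 64-bit systems/compilers, all with VS2008 (default compiler/linker options).
Testing with gcc 4.4.5 on a 64-bit Debian system showed no problem.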

Questions:

Can others test this? I would really appreciate some confirmation.

Is there anything incorrect in the code that could cause this problem (leaving aside whether the code makes sense)?

Or are there any compiler flags that could trigger this behavior?

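All of the parser code is critical to the application and has been tested heavily, but the tests never uncovered this problem. Should I come up with extreme test cases, and if so, how? How could I have predicted that this would cause a problem?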

If this really is a bug, where is the best place to report it?

Best answer

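Yes. Standard streams are required to have at least one unget() position, so you can safely call unget() only once after a call to get(). When you call peek() and the input buffer is empty, underflow() is called, and the implementation discards the buffer and loads the next chunk of data. Note that peek() does not advance the current input position, so it points to the beginning of the new buffer. When you then call unget(), the implementation tries to move the input position back, but it is already at the beginning of the buffer, so the call fails.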

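Of course, this depends on the implementation. If the stream buffer holds more than one character, the call may fail sometimes and succeed at other times. As far as I know, Microsoft's implementation stores only one character in basic_filebuf (unless you explicitly supply a larger buffer) and relies on the internal buffering of <cstdio> (which, by the way, is one of the reasons MSVC's iostreams are slow). A quality implementation could reload the buffer from the file when unget() fails, but it is not required to.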

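Try to fix your code so that it does not need more than one unget() position. If you really do need it, wrap the stream in one that guarantees unget() will not fail (take a look at Boost.Iostreams). Also, the code you posted makes little sense: it calls unget() and then immediately get() again. Why?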

Source: the Stack Overflow question "c++ - ifstream::unget() fails. Is it a bug in MS's implementation or in my code?": https://stackoverflow.com/questions/3820396/