I am using Tidy to clean up a large amount of HTML. The function I am using is:
std::string cleanHTML (std::string htmlcontent)
{
    char* outputstr;
    TidyBuffer output = {0};
    uint buflen = 0;
    TidyBuffer errbuf;
    int rc = -1;
    Bool ok;

    TidyDoc tdoc = tidyCreate();                          // Initialize "document"
    tidyBufInit( &errbuf );
    ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );       // Convert to XHTML
    if ( ok )
        rc = tidySetErrorBuffer( tdoc, &errbuf );         // Capture diagnostics
    if ( rc >= 0 )
        rc = tidyParseString( tdoc, htmlcontent.c_str() ); // Parse the input
    if ( rc >= 0 )
        rc = tidySaveBuffer( tdoc, &output );             // Tidy it up!

    uint yy = output.size;
    outputstr = (char*)malloc(yy+10);
    uint xx = yy+10;
    rc = tidySaveString( tdoc, outputstr, &xx );
    std::string cleanedhtml (outputstr);

    tidyBufFree(&output);
    tidyBufFree(&errbuf);
    tidyRelease(tdoc);
    return cleanedhtml;
}
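According to gdb, the program segfaults in tidyBufFree(&output) on one particular call (which, as far as I can tell, is not different from the others in any obvious way). The function also seems to leak memory.
Can anyone help?
Edit:
I ran Valgrind as suggested; the output is below (can someone explain what it means?).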
==7860== Process terminating with default action of signal 11 (SIGSEGV)
==7860== Access not within mapped region at address 0x0
==7860== at 0x428B00: tidyBufFree (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== by 0x405EC6: cleanHTML(std::string) (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== by 0x4048A3: get_tvseries(std::string) (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== by 0x403DE2: main (in /home/sergerold/qt5_episode_analyser/a.out)
==7860== If you believe this happened as a result of a stack
==7860== overflow in your program's main thread (unlikely but
==7860== possible), you can try to increase the size of the
==7860== main thread stack using the --main-stacksize= flag.
==7860== The main thread stack size used in this run was 8388608.
==7860==
==7860== HEAP SUMMARY:
==7860== in use at exit: 2,285,594 bytes in 3,638 blocks
==7860== total heap usage: 102,543 allocs, 98,905 frees, 137,801,931 bytes allocated
==7860==
==7860== LEAK SUMMARY:
==7860== definitely lost: 0 bytes in 0 blocks
==7860== indirectly lost: 0 bytes in 0 blocks
==7860== possibly lost: 1,303,686 bytes in 114 blocks
==7860== still reachable: 981,908 bytes in 3,524 blocks
==7860== suppressed: 0 bytes in 0 blocks
==7860== Rerun with --leak-check=full to see details of leaked memory
==7860==
==7860== For counts of detected and suppressed errors, rerun with: -v
==7860== Use --track-origins=yes to see where uninitialised values come from
==7860== ERROR SUMMARY: 113 errors from 17 contexts (suppressed: 0 from 0)
Segmentation fault
Solved:
The segmentation fault was caused by tidyBufFree(&output) being called when the output buffer is empty, which leads to a null pointer dereference.
Best answer
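Your code looks a lot like this example, but there are a few important differences.
Note that the author of the example does not call tidyBufInit( &errbuf ); which could be your memory leak. To be sure, use a memory debugging tool such as valgrind. As for the segfault: the way you free output looks correct (at least according to the example), so my guess is that stack corruption may be causing the problem. Again, valgrind may help you find it.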
Regarding c++ - HTML Tidy segfault on tidyBufFree, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20610238/