I can't detect blank pages in a PDF file. I've searched the internet, but I couldn't find a good solution.



I tried this:

// Heuristic tried: treat the page as blank if any one of these checks fires
bool isBlank = xobjects == null
            || extractedText == null
            || contentBytes.Length < 20;

But most of the time it returns the wrong answer.

My code is below. I am using the iTextSharp library.

For the XObjects:
PdfDictionary xobjects = resourceDic.GetAsDict(PdfName.XOBJECT);
// resourceDic is the page's resource dictionary (a PdfDictionary)
// I assumed that a null xobjects means the page is blank, but sometimes a blank page has a non-null xobjects dictionary.

For the content stream:
RandomAccessFileOrArray f = reader.SafeFile;
// reader = new PdfReader(filename);

byte[] contentBytes = reader.GetPageContent(pageNum, f);
// I measured the length of contentBytes, but for some blank pages it is more than 20 bytes

For the text content:
String extractedText = PdfTextExtractor.GetTextFromPage(reader, pageNum, new LocationTextExtractionStrategy());
// sometimes a blank page yields text longer than 20 characters

Best answer

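A very simple way to find empty pages is to run Ghostscript from the command line with the bbox device.

Ghostscript's bbox device computes the coordinates of the smallest rectangle (the "bounding box") that encloses every point on the page where pixels would be rendered: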

gs \
  -o /dev/null \
  -sDEVICE=bbox \
   input.pdf

On Windows:
gswin32c.exe ^
  -o nul ^
  -sDEVICE=bbox ^
   input.pdf

Result:
GPL Ghostscript 9.05 (2012-02-08)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 6.
Page 1
%%BoundingBox: 27 281 548 804
%%HiResBoundingBox: 27.000000 281.000000 547.332031 804.000000
Page 2
%%BoundingBox: 0 0 0 0
%%HiResBoundingBox: 0.000000 0.000000 0.000000 0.000000
Page 3
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000
Page 4
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000
Page 5
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000
Page 6
%%BoundingBox: 27 302 568 814
%%HiResBoundingBox: 27.949219 302.000000 567.332031 814.000000

As you can see, page 2 of the input document is empty: its bounding box is 0 0 0 0.
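To get a list of blank pages without reading the output by eye, a small filter can pick out every %%BoundingBox line that is all zeros. This is a minimal sketch, not part of Ghostscript: the function name blank_pages is hypothetical, and it assumes, as in the output above, that Ghostscript prints exactly one %%BoundingBox line per page, in page order. It numbers pages by counting those lines, so it does not depend on the "Page N" lines.

```shell
#!/bin/sh
# blank_pages (hypothetical helper): read Ghostscript bbox-device output on
# stdin and print the number of each page whose %%BoundingBox is 0 0 0 0.
# Assumption: one %%BoundingBox line per page, in page order.
blank_pages() {
  awk '
    /^%%BoundingBox:/ {
      page++                                   # count pages by bbox lines
      if ($2 == 0 && $3 == 0 && $4 == 0 && $5 == 0) print page
    }
  '
}
```

Ghostscript writes the bbox results to stderr, so redirect them into the pipe: `gs -o /dev/null -sDEVICE=bbox input.pdf 2>&1 | blank_pages`. For the document above this prints 2. The %%HiResBoundingBox lines are ignored because they do not start with %%BoundingBox.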

10-08 03:33