How to get the width and height of a text string using CAM::PDF?

Problem description

I use the following to read a PDF file and get the text strings of a page:

use CAM::PDF;

my $pdf = CAM::PDF->new($pdf_file);
my $pagetree = $pdf->getPageContentTree($page_no);

# Get all text strings of the page.
# MyRenderer is a separate package which implements the getTextBlocks and
# renderText methods.
my @text = $pagetree->traverse('MyRenderer')->getTextBlocks;

Now @text holds all the text strings, along with the starting x,y position of each one.

How can I get the width (and possibly the height) of each string?

The MyRenderer package is as follows:

package MyRenderer;
use base 'CAM::PDF::GS';
sub new {
    my ($pkg, @args) = @_;
    my $self = $pkg->SUPER::new(@args);
    $self->{refs}->{text} = [];
    return $self;
}

sub getTextBlocks {
    my ($self) = @_;
    return @{$self->{refs}->{text}};
}

sub renderText {
    my ($self, $string, $width) = @_;
    my ($x, $y) = $self->textToDevice(0,0);
    push @{$self->{refs}->{text}}, {
                                    str => $string,
                                    left => $x,
                                    bottom => $y,
                                    right =>$x + $width,
                                   };
    return;
}

Update 1: There's a function getStringWidth($fontmetrics, $string) in CAM::PDF. Although the function takes a $fontmetrics parameter, it returns the same value for a given string irrespective of what I pass for that parameter.

Also, I am not sure which unit of measure the returned value uses.

Update 2: I changed the renderText function to the following:

sub renderText {
    my ($self, $string, $width) = @_;
    my ($x, $y) = $self->textToDevice(0,0);
    push @{$self->{refs}->{text}}, {
                                str => $string,
                                left => $x,
                                bottom => $y,
                                right =>$x + ($width * $self->{Tfs}),
                                font => $self->{Tf},
                                font_size => $self->{Tfs},
                               };
    return;
}

Note that in addition to recording font and font_size, I multiplied $width by the font size to get the real width of the string.

Now the only thing missing is the height.
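One way to approximate the missing height, sketched under stated assumptions: PDF font descriptors give Ascent and Descent in glyph-space units (thousandths of the font size for most fonts), so scaling them by the font size yields an approximate vertical extent, in the same way that multiplying the width by the font size yields the horizontal one. The helper name text_box and the sample Ascent/Descent values are hypothetical and not part of CAM::PDF, and the sketch ignores any scaling or rotation in the text matrix.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDF): build a bounding box from the
# baseline origin, the width in text-space units, and the font size.
# $ascent and $descent are in 1/1000 of the font size, as in a PDF font
# descriptor; Descent is normally negative.
sub text_box {
    my ($x, $y, $width, $font_size, $ascent, $descent) = @_;
    return {
        left   => $x,
        right  => $x + $width * $font_size,
        bottom => $y + $descent / 1000 * $font_size,
        top    => $y + $ascent / 1000 * $font_size,
    };
}

# Assumed sample values: a 12-point font with Ascent 750 and Descent -250.
my $box = text_box(72, 700, 2.5, 12, 750, -250);
printf "width=%.1f height=%.1f\n",
    $box->{right} - $box->{left},
    $box->{top} - $box->{bottom};
```

Inside renderText, the same arithmetic could use $self->{Tfs} as the font size. Where Ascent and Descent come from depends on the font dictionary of the PDF in question; when they are unavailable, the font size itself is a rough stand-in for the line height.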

Recommended answer

getStringWidth() depends heavily on the font metrics you provide. If it can't find the character widths in that data structure, it falls back to the following code:

   if ($width == 0)
   {
      # HACK!!!
      #warn "Using klugy width!\n";
      $width = 0.2 * length $string;
   }

That fallback may be what you're seeing. When I wrote it, I thought it was better than returning 0. If your font metrics seem good and you think there's a bug in CAM::PDF, feel free to post more details and I'll take a look.
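To illustrate the behaviour described in this answer, here is a simplified, self-contained sketch. It is not CAM::PDF's actual implementation: the function string_width and the %widths table are hypothetical. Widths are looked up per character, and when none are found the result falls back to 0.2 per character, which would explain a string getting the same value whatever metrics are passed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a font-metrics lookup: per-character widths in
# text-space units (fractions of the font size). Not CAM::PDF's data structure.
my %widths = (H => 0.722, e => 0.556, l => 0.222, o => 0.556);

sub string_width {
    my ($metrics, $string) = @_;
    return 0 if !defined $string || $string eq q{};

    # Sum the width of every character that the metrics table knows about.
    my $width = 0;
    for my $char (split //, $string) {
        $width += $metrics->{$char} // 0;
    }

    # Same fallback as the quoted code: guess 0.2 per character when no
    # widths were found.
    $width = 0.2 * length $string if $width == 0;
    return $width;
}

printf "%.3f\n", string_width(\%widths, 'Hello');  # widths found in the table
printf "%.3f\n", string_width({},       'Hello');  # fallback: 0.2 per character
```

With the fallback, 'Hello' measures 1.0 whatever the font is. Multiplied by a 12-point font size, as in Update 2 of the question, that would come to 12 points.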

09-03 09:46