

How can I get the width and height of a text string using CAM::PDF?



I use the following to read a PDF file and get text strings of a page:

use CAM::PDF;

my $pdf = CAM::PDF->new($pdf_file);
my $pagetree = $pdf->getPageContentTree($page_no);

# Get all text strings of the page
# MyRenderer is a separate package which implements getTextBlocks and
# renderText methods

my @text = $pagetree->traverse('MyRenderer')->getTextBlocks;


Now, @text has all the text strings and start x,y of each text string.


How can I get the width (and possibly the height) of each string?


MyRenderer package is as follows:

package MyRenderer;
use base 'CAM::PDF::GS';
sub new {
    my ($pkg, @args) = @_;
    my $self = $pkg->SUPER::new(@args);
    $self->{refs}->{text} = [];
    return $self;
}

sub getTextBlocks {
    my ($self) = @_;
    return @{$self->{refs}->{text}};
}

sub renderText {
    my ($self, $string, $width) = @_;
    my ($x, $y) = $self->textToDevice(0,0);
    push @{$self->{refs}->{text}}, {
                                    str => $string,
                                    left => $x,
                                    bottom => $y,
                                    right =>$x + $width,
                                   };
    return;
}

Update 1: There's a function getStringWidth($fontmetrics, $string) in CAM::PDF. Although the function takes a $fontmetrics parameter, it returns the same value for a given string irrespective of what I pass for that parameter.


Also, I am not sure of the unit of measure the returned value uses.


Update 2: I changed the renderText function to the following:

sub renderText {
    my ($self, $string, $width) = @_;
    my ($x, $y) = $self->textToDevice(0,0);
    push @{$self->{refs}->{text}}, {
                                str => $string,
                                left => $x,
                                bottom => $y,
                                right =>$x + ($width * $self->{Tfs}),
                                font => $self->{Tf},
                                font_size => $self->{Tfs},
                               };
    return;
}

Note that in addition to getting font and font_size, I multiplied $width by the font size to get the real width of the string.


Now the only thing missing is the height.
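
One rough way to estimate the height is sketched below. It is an illustration under stated assumptions, not something CAM::PDF provides: text_box is a hypothetical helper, and ascent and descent are assumed to be the font's ascent and descent in 1/1000 of a text-space unit (the defaults of 750 and -250 are guesses, not values read from any font). The sketch treats $y as the baseline, scales everything by the font size the same way the width is scaled above, and ignores character spacing, horizontal scaling, rotation, and any further page transformation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CAM::PDF.
# width     : string width in text-space units at font size 1
# font_size : font size (the Tfs value in the renderer above)
# ascent / descent : assumed font values in 1/1000 units; descent is
#                    normally negative. The defaults are rough guesses.
sub text_box {
    my (%arg) = @_;
    my $size    = $arg{font_size};
    my $ascent  = $arg{ascent}  // 750;
    my $descent = $arg{descent} // -250;
    return {
        left   => $arg{x},
        right  => $arg{x} + $arg{width} * $size,
        bottom => $arg{y} + $descent / 1000 * $size,   # below the baseline
        top    => $arg{y} + $ascent  / 1000 * $size,   # above the baseline
    };
}

my $box = text_box(x => 72, y => 700, width => 2.5, font_size => 12);
printf "width=%.1f height=%.1f\n",
    $box->{right} - $box->{left}, $box->{top} - $box->{bottom};
```

With the guessed defaults the height simply equals the font size, which is often a workable approximation when the real ascent and descent are not available.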



getStringWidth() depends heavily on the font metrics you provide. If it can't find the character widths in that data structure, then it falls back to the following code:

   if ($width == 0) {
      # HACK!!!
      #warn "Using klugy width!\n";
      $width = 0.2 * length $string;
   }

which may be what you're seeing. When I wrote that, I thought it was better than returning 0. If your font metrics seem good and you think there's a bug in CAM::PDF, feel free to post more details and I'll take a look.
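
To see why the result can look independent of $fontmetrics, here is a simplified, self-contained sketch of the behaviour described above. It is not CAM::PDF's actual implementation: string_width is a hypothetical stand-in, and the metrics are assumed to be a plain hash with FirstChar, LastChar, and Widths (glyph widths in 1/1000 of a text-space unit). Whenever no widths are found, the fallback depends only on the string length, so every unusable metrics argument gives the same answer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a width lookup; NOT CAM::PDF's code.
sub string_width {
    my ($metrics, $string) = @_;
    return 0 if !defined $string || $string eq q{};

    my $width = 0;
    if ($metrics && ref $metrics eq 'HASH' && $metrics->{Widths}) {
        for my $code (unpack 'C*', $string) {
            # Skip characters outside the range covered by the width table
            next if $code < $metrics->{FirstChar}
                 || $code > $metrics->{LastChar};
            $width += $metrics->{Widths}[$code - $metrics->{FirstChar}] // 0;
        }
        $width /= 1000;    # 1/1000 units -> text-space units
    }

    # Fallback as described in the answer: length-based guess
    $width = 0.2 * length $string if $width == 0;
    return $width;
}

# Made-up widths for the characters A, B, C
my %metrics = (FirstChar => 65, LastChar => 67, Widths => [500, 750, 250]);

printf "%.2f\n", string_width(\%metrics, 'ABC');   # uses the width table
printf "%.2f\n", string_width(undef,     'ABC');   # fallback
printf "%.2f\n", string_width({},        'ABC');   # fallback, same value
```

If the real function returns 0.2 times the string length for every input, that suggests the metrics structure being passed does not contain usable width data.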
