PDFBox: encoding the euro currency symbol


Problem description

I created a PDF document with the Apache PDFBox library. My problem is encoding the euro currency symbol when drawing a string on the page, because the base font Helvetica does not provide this character. How can I convert the output "þÿ ¬" to the symbol "€"?

Recommended answer

Unfortunately, PDFBox's string encoding is still far from perfect (version 1.8.x). It uses the same routines to encode strings in generic PDF objects as it does to encode strings in content streams, which is fundamentally wrong. Thus, instead of using PDPageContentStream.drawString (which uses the wrong encoding), you have to translate to the correct encoding yourself.
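The "þÿ ¬" from the question can be reproduced with plain Java, which may help to see what goes wrong. The sketch below assumes that the generic string routine falls back to UTF-16 with a byte order mark for characters outside Latin-1, and that the viewer then reads each byte as a separate character. That assumption is mine, and the class and method names are hypothetical. No PDFBox code is involved.

```java
import java.nio.charset.StandardCharsets;

class EuroGarbleDemo {
    // Encodes the text as UTF-16 (Java writes a big-endian BOM, FE FF, first)
    // and then misreads every byte as one Latin-1 character.
    static String misreadUtf16AsLatin1(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misreadUtf16AsLatin1("\u20AC"); // the euro sign
        StringBuilder hex = new StringBuilder();
        for (char c : garbled.toCharArray()) {
            hex.append(Integer.toHexString(c)).append(' ');
        }
        System.out.println(hex.toString().trim()); // fe ff 20 ac
    }
}
```

The four code points are þ, ÿ, a space and ¬, which matches the output quoted in the question.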

For example, instead of using

    contentStream.beginText();
    contentStream.setTextMatrix(100, 0, 0, 100, 50, 100);
    contentStream.setFont(PDType1Font.HELVETICA, 2);
    contentStream.drawString("€");
    contentStream.endText();
    contentStream.close();

which results in garbled output instead of the € symbol, you can use something like

    contentStream.beginText();
    contentStream.setTextMatrix(100, 0, 0, 100, 50, 100);
    contentStream.setFont(PDType1Font.HELVETICA, 8);
    // The placeholder 'x' is overwritten with the WinAnsiEncoding code of the euro sign.
    byte[] commands = "(x) Tj ".getBytes();
    commands[1] = (byte) 128;
    contentStream.appendRawCommands(commands);
    contentStream.endText();
    contentStream.close();

which results in the € symbol being drawn as intended.

If you wonder how I arrived at 128 as the byte code for the €, have a look at the PDF specification ISO 32000-1, annex D.2, "Latin Character Set and Encodings", which lists the octal value 200 (decimal 128) for the € symbol in WinAnsiEncoding.
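The value can also be cross-checked without opening the specification. The following is a minimal sketch, assuming that WinAnsiEncoding agrees with the Windows-1252 code page for the euro sign and that the running JVM provides the "windows-1252" charset. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

class EuroCodeCheck {
    // Returns the unsigned single-byte code of a character in Windows-1252,
    // used here as a stand-in for WinAnsiEncoding (an assumption, see above).
    static int cp1252Code(char c) {
        byte[] encoded = String.valueOf(c).getBytes(Charset.forName("windows-1252"));
        return encoded[0] & 0xFF; // mask to undo Java's signed byte
    }

    public static void main(String[] args) {
        int code = cp1252Code('\u20AC'); // the euro sign
        System.out.println(code + " decimal, " + Integer.toOctalString(code) + " octal");
        // prints: 128 decimal, 200 octal
    }
}
```

The mask with 0xFF matters because Java bytes are signed: the raw byte is -128, which is the same bit pattern as the (byte) 128 cast in the workaround above.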

PS: Other answers have meanwhile presented an alternative approach, which in the case of the € symbol amounts to something like:

    contentStream.beginText();
    contentStream.setTextMatrix(100, 0, 0, 100, 50, 100);
    contentStream.setFont(PDType1Font.HELVETICA, 8);
    contentStream.drawString(String.valueOf(Character.toChars(EncodingManager.INSTANCE.getEncoding(COSName.WIN_ANSI_ENCODING).getCode("Euro"))));
    contentStream.endText();
    contentStream.close();

This does indeed also draw the '€' symbol. But even though this approach looks cleaner (it does not use byte arrays and does not construct an actual PDF stream operation manually), it is dirty in its own way:

To use a broken method, it deliberately breaks its string argument in just the right way to counteract the bug in that method.

Thus, if the PDFBox developers decided to fix the broken method, this seemingly clean workaround would start to fail, because it would then feed broken input data to the fixed method.

Admittedly, I doubt they will fix this bug before 2.0.0 (and in 2.0.0 the fixed method has a different name), but one never knows...
