Problem description
I am maintaining a program that uses Apache FOP to print PDF documents. There have been a couple of complaints about Chinese characters coming out as "####". I found an existing thread about this problem and did some research on my side.
http://apache-fop.1065347.n5.nabble.com/Chinese-Fonts-td10789.html
I do have the uming.ttf font installed on my system. Unlike the person in that thread, I am still getting "####".
Given that, has anyone found a workaround that allows printing complex characters in a PDF document using Apache FOP?
Recommended answer
Three steps must be taken for Chinese characters to show correctly in a PDF file created with FOP: declaring the desired font in the FO file, mapping that font to a font file in FOP's configuration, and telling FOP to use that configuration. The same holds for all characters not available in the default font, and more generally for using any non-default font.
Let us use this simple FO example to show the warnings FOP produces when something is wrong:
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="one">
<fo:region-body />
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="one">
<fo:flow flow-name="xsl-region-body">
<!-- a block of chinese text -->
<fo:block>博洛尼亚大学中国学生的毕业论文</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
When processing this input, FOP gives several warnings similar to this one:
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Helvetica".
...
Without any explicit font-family indication in the FO file, FOP defaults to Helvetica, one of the Base-14 fonts (standard fonts that every PDF viewer must provide, so there is no need to embed them).
Each font supports a set of characters, assigning a visible glyph to each of them; when a font does not support a character, FOP produces the warning above, and the PDF shows "#" instead of the missing glyph.
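The hexadecimal number in the warning identifies the missing character: 0x535a is U+535A, 博, the first character of our text. To match warnings to characters, a small Java sketch like the following (the class name CodePoints is hypothetical, and the source file is assumed to be compiled as UTF-8) lists the code points of a string in the same "0x...." form:

```java
import java.util.ArrayList;
import java.util.List;

public class CodePoints {

    // Returns each character of the text as a lowercase hexadecimal code point,
    // formatted like the number in FOP's "Glyph ... not available" warning.
    static List<String> hexCodePoints(String text) {
        List<String> result = new ArrayList<>();
        // codePoints() also handles characters outside the Basic Multilingual Plane
        text.codePoints().forEach(cp -> result.add(String.format("0x%04x", cp)));
        return result;
    }

    public static void main(String[] args) {
        String text = "博洛尼亚大学中国学生的毕业论文";
        // The first entry is 0x535a, the code point of 博
        System.out.println(String.join(" ", hexCodePoints(text)));
    }
}
```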
If the default font doesn't support the characters of our text (or we simply want to use a different font), we must use the font-family property to state the desired one.
The value of font-family is inherited, so if we want to use the same font for the whole document, we can set the property on the fo:page-sequence; if we need a special font just for some paragraphs or words, we can set font-family on the relevant fo:block or fo:inline.
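To see which font a given block ends up requesting, here is a simplified Java sketch using the JDK's DOM parser: for each fo:block it walks up the ancestors and takes the first font-family attribute it finds. The class and method names are hypothetical and this is not part of FOP; it returns the attribute value verbatim (a comma-separated list of families is not split) and ignores finer points of XSL-FO such as the font shorthand property.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class InheritedFontFamily {

    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Returns the font-family set on the element itself or on its nearest
    // ancestor, or null if none is set (FOP would then use its default font).
    static String effectiveFontFamily(Element element) {
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            String value = ((Element) n).getAttribute("font-family");
            if (!value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    // Parses an FO string with a namespace-aware DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Effective font-family of every fo:block, in document order.
    static String[] blockFonts(String fo) {
        NodeList blocks = parse(fo).getElementsByTagNameNS(FO_NS, "block");
        String[] fonts = new String[blocks.getLength()];
        for (int i = 0; i < blocks.getLength(); i++) {
            fonts[i] = effectiveFontFamily((Element) blocks.item(i));
        }
        return fonts;
    }

    public static void main(String[] args) {
        // Layout masters omitted: only the inheritance chain matters here
        String fo = "<fo:root xmlns:fo='" + FO_NS + "'>"
                + "<fo:page-sequence master-reference='one' font-family='SimSun'>"
                + "<fo:flow flow-name='xsl-region-body'>"
                + "<fo:block>inherited from fo:page-sequence</fo:block>"
                + "<fo:block font-family='Helvetica'>set on the block itself</fo:block>"
                + "</fo:flow></fo:page-sequence></fo:root>";
        for (String font : blockFonts(fo)) {
            System.out.println(font);
        }
    }
}
```

The first block inherits SimSun from fo:page-sequence, while the second overrides it with its own font-family.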
So our input becomes (using, as an example, a font I have):
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="one">
<fo:region-body />
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="one">
<fo:flow flow-name="xsl-region-body">
<!-- a block of chinese text -->
<fo:block font-family="SimSun">博洛尼亚大学中国学生的毕业论文</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
But now we get a new warning in addition to the old ones:
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Font "SimSun,normal,400" not found. Substituting with "any,normal,400".
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Times-Roman".
...
FOP doesn't know how to map "SimSun" to a font file, so it falls back to a generic Base-14 font (Times-Roman) that does not support our Chinese characters, and the PDF still shows "#".
Inside FOP's folder, the file conf/fop.xconf is an example configuration; we can edit it directly or copy it as a starting point.
The configuration file is an XML file, and we have to add the font mappings inside /fop/renderers/renderer[@mime = 'application/pdf']/fonts/ (there is a renderer section for each possible output MIME type, so check that you are inserting your mapping in the right one):
<?xml version="1.0"?>
<fop version="1.0">
...
<renderers>
<renderer mime="application/pdf">
...
<fonts>
<!-- specific font mapping -->
<font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun.ttf" embedding-mode="subset">
<font-triplet name="SimSun" style="normal" weight="normal"/>
</font>
<!-- "bulk" font mapping -->
<directory>/Users/furini/Library/Fonts</directory>
</fonts>
...
</renderer>
...
</renderers>
</fop>
- each font element points to a font file
- each font-triplet entry identifies a combination of font-family + font-style (normal, italic, ...) + font-weight (normal, bold, ...) mapped to the font file in the parent font element
- using directory elements, it is also possible to configure automatically all the font files inside the indicated folders (but this takes some time if the folders contain a lot of fonts)
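Because each renderer has its own fonts section, a mapping placed under another renderer does not apply to PDF output. The following Java sketch (hypothetical class name, JDK XPath API only) evaluates the location given above, extended down to the font-triplet entries, and lists the triplets it finds; it is one quick way to check that a mapping sits in the PDF renderer's section. Fonts picked up through directory elements are not listed, since FOP only discovers them when it scans the folder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PdfFontTriplets {

    // The location from the text, extended down to the font-triplet entries
    static final String TRIPLETS =
            "/fop/renderers/renderer[@mime = 'application/pdf']/fonts/font/font-triplet";

    // Returns "name,style,weight -> embed-url" for each triplet mapped for PDF output.
    static List<String> pdfTriplets(String xconf) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    TRIPLETS, new InputSource(new StringReader(xconf)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element triplet = (Element) nodes.item(i);
                Element font = (Element) triplet.getParentNode();
                result.add(triplet.getAttribute("name") + "," + triplet.getAttribute("style") + ","
                        + triplet.getAttribute("weight") + " -> " + font.getAttribute("embed-url"));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot read the configuration", e);
        }
    }

    public static void main(String[] args) {
        String xconf = "<fop version='1.0'><renderers>"
                + "<renderer mime='application/pdf'><fonts>"
                + "<font embed-url='/Users/furini/Library/Fonts/SimSun.ttf'>"
                + "<font-triplet name='SimSun' style='normal' weight='normal'/></font>"
                + "</fonts></renderer>"
                // This mapping belongs to another renderer, so it is not listed for PDF
                + "<renderer mime='application/postscript'><fonts>"
                + "<font embed-url='/tmp/Other.ttf'>"
                + "<font-triplet name='Other' style='normal' weight='normal'/></font>"
                + "</fonts></renderer>"
                + "</renderers></fop>";
        pdfTriplets(xconf).forEach(System.out::println);
    }
}
```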
If we have a complete file set with specific versions of the desired font (normal, italic, bold, light, bold italic, ...), we can map each file to the precise font triplet, thus producing a very sophisticated PDF.
At the opposite end of the spectrum, we can map all the triplets to the same font file, if it's all we have available: in the output all text will look the same, even if parts of it were marked as italic or bold in the FO file.
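As an illustration of the single-file approach, this Java sketch (hypothetical names) prints a font element that maps the normal, bold, italic and bold italic triplets of one family to the same file; its output could be pasted inside the fonts element of the PDF renderer. The path in main is the example path from the configuration above, and the inserted values are XML-escaped.

```java
public class SingleFileFontMapping {

    // Minimal escaping for values placed inside double-quoted XML attributes
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Builds a <font> element mapping four style/weight combinations
    // of one family to the same font file.
    static String fontElement(String family, String embedUrl) {
        String[][] styleWeight = {
            {"normal", "normal"}, {"normal", "bold"}, {"italic", "normal"}, {"italic", "bold"}
        };
        StringBuilder xml = new StringBuilder();
        xml.append("<font kerning=\"yes\" embed-url=\"").append(escape(embedUrl))
           .append("\" embedding-mode=\"subset\">\n");
        for (String[] sw : styleWeight) {
            xml.append("  <font-triplet name=\"").append(escape(family))
               .append("\" style=\"").append(sw[0])
               .append("\" weight=\"").append(sw[1]).append("\"/>\n");
        }
        return xml.append("</font>").toString();
    }

    public static void main(String[] args) {
        System.out.println(fontElement("SimSun", "/Users/furini/Library/Fonts/SimSun.ttf"));
    }
}
```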
Note that we don't need to register every possible font triplet; if one is missing, FOP will use the font registered for a "similar" one (for example, if we don't map the triplet "SimSun,italic,400", FOP will use the font mapped to "SimSun,normal,400", warning us about the font substitution).
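To make the substitution idea concrete, here is a toy Java model of triplet lookup with fallback. It is explicitly not FOP's font selection algorithm, which is more elaborate; the fallback order below is an assumption chosen to reproduce the two behaviours described in the text ("SimSun,italic,400" falling back to "SimSun,normal,400", and an unknown family falling back to "any,normal,400", which FOP served with Times-Roman in the warning quoted earlier). Weights use the numeric form seen in the warnings, where normal is 400 and bold is 700.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of font triplet lookup with fallback (NOT FOP's actual algorithm).
public class TripletFallback {

    private final Map<String, String> files = new HashMap<>();

    static String key(String family, String style, int weight) {
        return family + "," + style + "," + weight;
    }

    void register(String family, String style, int weight, String file) {
        files.put(key(family, style, weight), file);
    }

    // Returns the registered triplet used for the request, trying progressively
    // less similar candidates in an assumed order, or null if nothing matches.
    String resolve(String family, String style, int weight) {
        String[] candidates = {
            key(family, style, weight),    // exact match
            key(family, "normal", weight), // same family and weight, upright style
            key(family, "normal", 400),    // same family, regular weight
            key("any", "normal", 400)      // generic fallback
        };
        for (String candidate : candidates) {
            if (files.containsKey(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TripletFallback fonts = new TripletFallback();
        fonts.register("SimSun", "normal", 400, "/Users/furini/Library/Fonts/SimSun.ttf");
        fonts.register("any", "normal", 400, "Times-Roman"); // stands in for the Base-14 default
        String[][] requests = {
            {"SimSun", "normal", "400"}, {"SimSun", "italic", "400"}, {"Arial", "normal", "400"}
        };
        for (String[] r : requests) {
            int weight = Integer.parseInt(r[2]);
            String requested = key(r[0], r[1], weight);
            String used = fonts.resolve(r[0], r[1], weight);
            System.out.println(requested.equals(used)
                    ? requested + " found"
                    : requested + " substituted with " + used);
        }
    }
}
```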
We are not done yet: without the next and last step, nothing changes when we process our input file.
If we are calling FOP from the command line, we use the -c option to point to our configuration file, for example:
$ fop -c /path/to/our/fop.xconf input.fo input.pdf
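From Java code with FOP 1.x, we can use (see also FOP's site):
FopFactory fopFactory = FopFactory.newInstance();
fopFactory.setUserConfig(new File("/path/to/our/fop.xconf"));
In FOP 2.0 and later, setUserConfig is no longer available; the configuration file is passed when the factory is created instead, with FopFactory.newInstance(new File("/path/to/our/fop.xconf")).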
Now, at last, the PDF should correctly use the desired fonts and look as expected.
If instead FOP terminates abruptly with an error like this:
org.apache.fop.cli.Main startFOP
SEVERE: Exception org.apache.fop.apps.FOPException: Failed to resolve font with embed-url '/Users/furini/Library/Fonts/doesNotExist.ttf'
it means that FOP could not find the font file, and the font configuration needs to be checked again; typical causes are:
- a typo in the font URL
- insufficient permissions to access the font file
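Both causes can be checked before running FOP. This Java sketch (hypothetical class name, JDK APIs only) reads every embed-url from a configuration and reports paths that are missing or unreadable. It assumes embed-url values are plain filesystem paths like the one in the example; FOP can also resolve URL-style or relative values, which this sketch does not handle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CheckFontFiles {

    // Describes every embed-url that does not point to a readable file.
    static List<String> problems(String xconf) {
        try {
            NodeList urls = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//fonts/font/@embed-url",
                    new InputSource(new StringReader(xconf)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < urls.getLength(); i++) {
                String url = urls.item(i).getNodeValue();
                Path path = Paths.get(url);
                if (!Files.exists(path)) {
                    result.add("missing: " + url);      // e.g. a typo in the path
                } else if (!Files.isReadable(path)) {
                    result.add("not readable: " + url); // e.g. insufficient permissions
                }
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot read the configuration", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path font = Files.createTempFile("SimSun", ".ttf"); // stands in for a real font file
        String xconf = "<fop version='1.0'><renderers><renderer mime='application/pdf'><fonts>"
                + "<font embed-url='" + font + "'>"
                + "<font-triplet name='SimSun' style='normal' weight='normal'/></font>"
                + "<font embed-url='/Users/furini/Library/Fonts/doesNotExist.ttf'>"
                + "<font-triplet name='Missing' style='normal' weight='normal'/></font>"
                + "</fonts></renderer></renderers></fop>";
        problems(xconf).forEach(System.out::println);
        Files.delete(font);
    }
}
```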