This article describes how to handle Apache FOP displaying "###" instead of Chinese characters when using a font such as SimSun.

Problem description

I am maintaining a program that uses Apache FOP to print PDF documents. There have been a couple of complaints about Chinese characters coming out as "####". I found an existing thread about this problem and did some research on my side.

http://apache-fop.1065347.n5.nabble.com/Chinese-Fonts-td10789.html

I do have the uming.ttf font file installed on my system. Unlike the person in that thread, I am still getting "####".

Has anyone found a workaround that allows printing complex characters in a PDF document using Apache FOP?

Ryan

Solution

Three steps must be taken for Chinese characters to show correctly in a PDF file created with FOP (the same holds for any characters not available in the default font, and more generally whenever a non-default font is used).

Let us use this simple FO example to show the warnings produced by FOP when something is wrong:

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="one">
            <fo:region-body />
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="one">
        <fo:flow flow-name="xsl-region-body">
            <!-- a block of chinese text -->
            <fo:block>博洛尼亚大学中国学生的毕业论文</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Processing this input, FOP gives several warnings similar to this one:

org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Helvetica".
...

Without any explicit font-family indication in the FO file, FOP defaults to using Helvetica, which is one of the Base-14 fonts (fonts that are available everywhere, so there is no need to embed them).

Each font supports a set of characters, assigning a visible glyph to each of them; when a font does not support a character, the above warning is produced, and the PDF shows "#" instead of the missing glyph.

Step 1: set font-family in the FO file

If the default font doesn't support the characters of our text (or we simply want to use a different font), we must use the font-family property to state the desired one.

The value of font-family is inherited, so if we want to use the same font for the whole document we can set the property on the fo:page-sequence; if we need a special font just for some paragraphs or words, we can set font-family on the relevant fo:block or fo:inline.
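As a minimal sketch of both options, the property can be set once on fo:page-sequence and then changed for a single word with fo:inline. The font names and the sample text here are only placeholders, and a non-default font such as SimSun still has to be registered as described in Step 2 before it takes effect:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="one">
            <fo:region-body />
        </fo:simple-page-master>
    </fo:layout-master-set>
    <!-- font-family set here is inherited by every block in the sequence -->
    <fo:page-sequence master-reference="one" font-family="SimSun">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>博洛尼亚大学中国学生的毕业论文</fo:block>
            <!-- a single word switched to another font via fo:inline -->
            <fo:block>Title: <fo:inline font-family="Helvetica">Thesis</fo:inline></fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>
```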

So, our input becomes (using a font I have as an example):

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="one">
            <fo:region-body />
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="one">
        <fo:flow flow-name="xsl-region-body">
            <!-- a block of chinese text -->
            <fo:block font-family="SimSun">博洛尼亚大学中国学生的毕业论文</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

But now we get a new warning, in addition to the old ones!

org.apache.fop.events.LoggingEventListener processEvent
WARNING: Font "SimSun,normal,400" not found. Substituting with "any,normal,400".
org.apache.fop.events.LoggingEventListener processEvent
WARNING: Glyph "?" (0x535a) not available in font "Times-Roman".
...

FOP doesn't know how to map "SimSun" to a font file, so it defaults to a generic Base-14 font (Times-Roman) which does not support our Chinese characters, and the PDF still shows "#".

Step 2: configure font mapping in FOP's configuration file

Inside FOP's folder, the file conf/fop.xconf is an example configuration; we can directly edit it or make a copy to start from.

The configuration file is an XML file, and we have to add the font mappings inside /fop/renderers/renderer[@mime = 'application/pdf']/fonts/ (there is a renderer section for each possible output mime type, so check you are inserting your mapping in the right one):

<?xml version="1.0"?>
<fop version="1.0">
  ...
  <renderers>
    <renderer mime="application/pdf">
      ...
      <fonts>

        <!-- specific font mapping -->
        <font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun.ttf" embedding-mode="subset">
          <font-triplet name="SimSun" style="normal" weight="normal"/>
        </font>

        <!-- "bulk" font mapping -->
        <directory>/Users/furini/Library/Fonts</directory>

      </fonts>
      ...
    </renderer>
    ...
  </renderers>
</fop>

  • each font element points to a font file
  • each font-triplet entry identifies a combination of font-family + font-style (normal, italic, ...) + font-weight (normal, bold, ...) mapped to the font file in the parent font element
  • using directory elements it is also possible to automatically configure all the font files inside the indicated folders (but this takes some time if the folders contain a lot of fonts)
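For bulk registration, a configuration along the following lines might be used. This is only a sketch: the folder paths are placeholders, and the recursive attribute and the auto-detect element are taken from FOP's font configuration documentation rather than from the example above, so check them against the FOP version in use:

```xml
<?xml version="1.0"?>
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- register every font file found directly in this folder (placeholder path) -->
        <directory>/path/to/fonts</directory>
        <!-- also descend into sub-folders (placeholder path) -->
        <directory recursive="true">/path/to/more/fonts</directory>
        <!-- let FOP search the operating system's standard font locations -->
        <auto-detect/>
      </fonts>
    </renderer>
  </renderers>
</fop>
```

Fonts registered this way are referenced in the FO file by the family name stored in the font file itself, since no explicit font-triplet is given.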

If we have a complete file set with specific versions of the desired font (normal, italic, bold, light, bold italic, ...) we can map each file to the precise font triplet, thus producing a very sophisticated PDF.
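A sketch of such a one-file-per-variant mapping is shown below. The family name "MyFont" and the four file paths are hypothetical and stand in for whatever complete font family is actually available:

```xml
<?xml version="1.0"?>
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- hypothetical files: one font file per style/weight combination -->
        <font kerning="yes" embed-url="/path/to/fonts/MyFont-Regular.ttf" embedding-mode="subset">
          <font-triplet name="MyFont" style="normal" weight="normal"/>
        </font>
        <font kerning="yes" embed-url="/path/to/fonts/MyFont-Bold.ttf" embedding-mode="subset">
          <font-triplet name="MyFont" style="normal" weight="bold"/>
        </font>
        <font kerning="yes" embed-url="/path/to/fonts/MyFont-Italic.ttf" embedding-mode="subset">
          <font-triplet name="MyFont" style="italic" weight="normal"/>
        </font>
        <font kerning="yes" embed-url="/path/to/fonts/MyFont-BoldItalic.ttf" embedding-mode="subset">
          <font-triplet name="MyFont" style="italic" weight="bold"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```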

On the opposite end of the spectrum we can map all the triplets to the same font file, if it's all we have available: in the output all text will appear the same, even if in the FO file parts of it were marked as italic or bold.
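With a single font file, the same idea could be sketched by listing several font-triplet entries inside one font element. The path is the one from the example configuration above; the extra triplets are an illustration, not part of the original configuration:

```xml
<?xml version="1.0"?>
<fop version="1.0">
  <renderers>
    <renderer mime="application/pdf">
      <fonts>
        <!-- one font file answering for every style/weight combination -->
        <font kerning="yes" embed-url="/Users/furini/Library/Fonts/SimSun.ttf" embedding-mode="subset">
          <font-triplet name="SimSun" style="normal" weight="normal"/>
          <font-triplet name="SimSun" style="normal" weight="bold"/>
          <font-triplet name="SimSun" style="italic" weight="normal"/>
          <font-triplet name="SimSun" style="italic" weight="bold"/>
        </font>
      </fonts>
    </renderer>
  </renderers>
</fop>
```

Registering the triplets explicitly like this should avoid the substitution warnings, at the cost of bold and italic text looking identical to normal text.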

Note that we don't need to register all possible font triplets; if one is missing, FOP will use the font registered for a "similar" one (for example, if we don't map the triplet "SimSun,italic,400" FOP will use the font mapped to "SimSun,normal,400", warning us about the font substitution).

We are not done yet, as without the next and last step nothing changes when we process our input file.

Step 3: tell FOP to use the configuration file

If we are calling FOP from the command line, we use the -c option to point to our configuration file, for example:

$ fop -c /path/to/our/fop.xconf input.fo input.pdf

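From Java code we can use the following (see also FOP's site); note that this is the FOP 1.x API, while FOP 2.x instead takes the configuration file as an argument to FopFactory.newInstance: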

fopFactory.setUserConfig(new File("/path/to/our/fop.xconf"));

Now, at last, the PDF should correctly use the desired fonts and appear as expected.

If instead FOP terminates abruptly with an error like this:

org.apache.fop.cli.Main startFOP
SEVERE: Exception org.apache.fop.apps.FOPException: Failed to resolve font with embed-url '/Users/furini/Library/Fonts/doesNotExist.ttf'

it means that FOP could not find the font file, and the font configuration needs to be checked again; typical causes are:

  • a typo in the font URL
  • insufficient privileges to access the font file


05-28 13:26