Can't get Czech characters while generating a PDF

Problem description

I have a problem when adding characters such as "Č" or "Ć" while generating a PDF. I'm mostly using paragraphs for inserting some static text into my PDF report. Here is some sample code I used:

var document = new Document();
document.Open();
Paragraph p1 = new Paragraph("Testing of letters Č,Ć,Š,Ž,Đ", new Font(Font.FontFamily.HELVETICA, 10));
document.Add(p1);

The output I get when the PDF file is generated, looks like this: "Testing of letters ,,Š,Ž,Đ"

For some reason iTextSharp doesn't seem to recognize these letters such as "Č" and "Ć".

Recommended answer

Problems:

First of all, you don't seem to be talking about Cyrillic characters, but about central and eastern European languages that use Latin script. Take a look at the difference between code page 1250 and code page 1251 to understand what I mean. [NOTE: I have updated the question so that it talks about Czech characters instead of Cyrillic.]
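
To make the difference between the two code pages concrete, here is a minimal Java sketch that is not part of the original answer. It assumes the JDK charset names `windows-1250` and `windows-1251`, and the class and method names are only illustrative. It decodes the same byte value with both code pages:

```java
import java.nio.charset.Charset;

// Sketch: the same byte stands for different letters in different code pages.
class CodePageDemo {
    // Decodes a single byte value using the named charset.
    static String decode(int byteValue, String charsetName) {
        byte[] bytes = { (byte) byteValue };
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String central = decode(0xC8, "windows-1250");  // Latin capital C with caron
        String cyrillic = decode(0xC8, "windows-1251"); // Cyrillic capital I
        // Prints: 1250: U+010C, 1251: U+0418
        System.out.printf("1250: U+%04X, 1251: U+%04X%n",
                (int) central.charAt(0), (int) cyrillic.charAt(0));
    }
}
```

Byte 0xC8 is "Č" in code page 1250 but a Cyrillic letter in code page 1251, which is why the two must not be confused.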

Second observation. You are writing code that contains special characters:

"Testing of letters Č,Ć,Š,Ž,Đ"

That is a bad practice. Code files are stored as plain text and can be saved using different encodings. An accidental change of encoding (for instance, by uploading the file to a versioning system that uses a different encoding) can seriously damage the content of your file.
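
The kind of damage meant here can be reproduced with a few lines of Java. This is only a sketch under assumptions: the helper `saveAndReload` is hypothetical and simulates "saved as UTF-8, later read as code page 1250" in memory instead of touching real files.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: text written in one encoding and read back in another is damaged.
class EncodingSwitchDemo {
    // Encodes the text with one charset and decodes the bytes with another.
    static String saveAndReload(String text, Charset saveAs, Charset readAs) {
        return new String(text.getBytes(saveAs), readAs);
    }

    public static void main(String[] args) {
        String original = "\u010c,\u0106";
        Charset cp1250 = Charset.forName("windows-1250");
        String damaged = saveAndReload(original, StandardCharsets.UTF_8, cp1250);
        System.out.println("unchanged after mismatch: " + original.equals(damaged));
        System.out.println("chars before/after: " + original.length() + "/" + damaged.length());
    }
}
```

Each special character occupies two bytes in UTF-8, and each of those bytes is read as a separate, unrelated character in the single-byte code page.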

You should write code that doesn't contain special characters, but that uses a different notation. For instance:

"Testing of letters \u010c,\u0106,\u0160,\u017d,\u0110"

This will also make sure that the content doesn't get altered when compiling the code using a compiler that expects a different encoding.
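
If you have existing strings to convert, a small helper can produce the escaped notation for you. The following Java sketch is an illustration only; `UnicodeEscapes` and `toEscaped` are hypothetical names, not part of iText.

```java
// Sketch: rewrite non-ASCII characters in the escaped notation shown above.
class UnicodeEscapes {
    // Replaces every char above 127 with backslash-u plus four hex digits.
    static String toEscaped(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literal below is pure ASCII, so the file encoding cannot alter it.
        System.out.println(toEscaped("Testing of letters \u010c,\u0106,\u0160,\u017d,\u0110"));
    }
}
```

Running it prints the pure-ASCII form of the string, which can then be pasted into source code.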

Your third mistake is that you assume that Helvetica is a font that knows how to draw these glyphs. That is a false assumption. You should use a font file such as Arial.ttf (or pick any other font that knows how to draw those glyphs).

Your fourth mistake is that you do not embed the font. Suppose that you use a font you have on your local machine and that is able to draw the special glyphs, then you will be able to read the text on your local machine. However, somebody who receives your file, but doesn't have the font you used on his local machine may not be able to read the document correctly.

Your fifth mistake is that you didn't define an encoding when using the font (this is related to your second mistake, but it's different).

Solution:

I have written a small example called CzechExample that results in the following PDF: czech.pdf

I have added the same text twice, but using a different encoding:

public static final String FONT = "resources/fonts/FreeSans.ttf";

public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document();
    // Write to the path passed in as the method argument.
    PdfWriter.getInstance(document, new FileOutputStream(dest));
    document.open();
    // First paragraph: code page 1250, font embedded (third argument is true).
    Font f1 = FontFactory.getFont(FONT, "Cp1250", true);
    Paragraph p1 = new Paragraph("Testing of letters \u010c,\u0106,\u0160,\u017d,\u0110", f1);
    document.add(p1);
    // Second paragraph: Identity-H, font embedded.
    Font f2 = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, true);
    Paragraph p2 = new Paragraph("Testing of letters \u010c,\u0106,\u0160,\u017d,\u0110", f2);
    document.add(p2);
    document.close();
}

To avoid your third mistake, I used the font FreeSans.ttf instead of Helvetica. You can choose any other font as long as it supports the characters you want to use. To avoid your fourth mistake, I have set the embedded parameter to true.

As for your fifth mistake, I introduced two different approaches.

In the first case, I told iText to use code page 1250.

Font f1 = FontFactory.getFont(FONT, "Cp1250", true);

This will embed the font as a simple font into the PDF, meaning that each character in your String will be represented using a single byte. The advantage of this approach is simplicity; the disadvantage is that you shouldn't start mixing code pages. For instance: this won't work for Cyrillic glyphs.
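
The limitation can be checked outside of iText with the JDK's own charset support. The sketch below is an illustration under assumptions: it uses the JDK charset `windows-1250` as a stand-in for iText's "Cp1250", and `SingleByteDemo` and `fitsCp1250` are hypothetical names.

```java
import java.nio.charset.Charset;

// Sketch: code page 1250 stores one byte per character but has no Cyrillic.
class SingleByteDemo {
    static final Charset CP1250 = Charset.forName("windows-1250");

    // True if every character of the text has a byte in code page 1250.
    static boolean fitsCp1250(String text) {
        return CP1250.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String czech = "\u010c,\u0106,\u0160,\u017d,\u0110";
        String cyrillic = "\u0416"; // Cyrillic capital Zhe
        System.out.println(czech.getBytes(CP1250).length + " bytes for "
                + czech.length() + " chars");
        System.out.println("Czech fits: " + fitsCp1250(czech));
        System.out.println("Cyrillic fits: " + fitsCp1250(cyrillic));
    }
}
```

The five letters from the question all fit in code page 1250, one byte each, while the Cyrillic letter has no slot in that code page at all.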

In the second case, I told iText to use Unicode for horizontal writing:

Font f2 = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, true);

This will embed the font as a composite font into the PDF, meaning that each character in your String will be represented using more than one byte. The advantage of this approach is that it is the recommended approach in the newer PDF standards (e.g. PDF/A, PDF/UA), and that you can mix Cyrillic with Latin, Chinese with Japanese, etc... The disadvantage is that you create more bytes, but that effect is limited by the fact that content streams are compressed anyway.

When I decompress the content stream for the text in the sample PDF, I see the following PDF syntax:

As I explained, single bytes are used to store the text of the first line. Double bytes are used to store the text of the second line.
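
The size difference can be approximated with plain Java. This sketch is only an analogy: it uses UTF-16BE to stand in for "two bytes per character", whereas the bytes written into the PDF for a composite font are font-specific identifiers rather than UTF-16 values. `ByteCountDemo` and `byteCount` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: compare one-byte and two-byte storage of the same text.
class ByteCountDemo {
    // Number of bytes the text occupies in the given charset.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u010c,\u0106,\u0160,\u017d,\u0110";
        int single = byteCount(text, Charset.forName("windows-1250"));
        int dbl = byteCount(text, StandardCharsets.UTF_16BE);
        // Prints: single-byte: 9, double-byte: 18
        System.out.println("single-byte: " + single + ", double-byte: " + dbl);
    }
}
```

The text doubles in size before compression, which matches the trade-off described for the second approach.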

You may be surprised that these characters look OK on the outside (when looking at the text in Adobe Reader), but don't correspond with what you see on the inside (when looking at the second screen shot), but that's how it works.

Conclusion:

Many people think that creating PDF is trivial, and that tools for creating PDF should be a commodity. In reality, it's not always that simple ;-)
