ReportLab: working with Chinese/Unicode characters

Problem description



TL;DR: Is there some way of telling ReportLab to use a specific font, and fall back to another if glyphs for some characters are missing? Alternatively, do you know of a condensed TrueType font which contains the glyphs for all European languages, Hebrew, Russian, Chinese, Japanese and Arabic?

I've been creating reports with ReportLab, and have encountered problems with rendering strings containing Chinese characters. The font I've been using is DejaVu Sans Condensed, which does not contain the glyphs for Chinese (however, it does contain Cyrillic, Hebrew, Arabic and all sorts of umlauts for European language support, which makes it pretty versatile, and I need them all from time to time).

Chinese, however, is not supported by the font, and I've not been able to find a TrueType font which supports ALL languages and meets our graphic design requirements. As a temporary workaround, I made it so that reports for Chinese customers use an entirely different font, containing only English and Chinese glyphs, hoping that characters in other languages won't be present in the strings. However, this is, for obvious reasons, clunky and breaks the graphic design, since it's not DejaVu Sans, around which the whole look and feel has been designed.

So the question is, how would you deal with the need to support multiple languages in one document, and maintain usage of a specified font for each language? This is made more complicated by the fact that sometimes strings contain a mix of languages, so determining which ONE font should be used for each string is not an option.

Is there some way of telling ReportLab to use a specific font, and fall back to another if glyphs for some characters are missing? I found vague hints in the docs that it should be possible, although I may have misunderstood them.

Alternatively, do you know of a condensed TrueType font which contains the glyphs for all European languages, Hebrew, Russian, Chinese, Japanese and Arabic?

Thanks.

Solution

This question fascinated me all week, so now that it is the weekend I dived right into it and found a solution, which I called MultiFontParagraph. It is a normal Paragraph with one big difference: you can set the exact font fallback order.

For example, this random Japanese text I pulled off the internet used the following font fallback order: "Bauhaus", "Arial", "HanaMinA". The paragraph checks whether the first font has a glyph for the character; if so, it uses that font, and if not, it falls back to the next font. Currently the code isn't really efficient, as it places tags around each character. This can easily be fixed, but for clarity I didn't do it here.

Using the following code I created the above example:

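# Assumption: styles comes from ReportLab's sample stylesheet
from reportlab.lib.styles import getSampleStyleSheet
styles = getSampleStyleSheet()

# Raw strings keep the backslashes in the Windows font paths literal
foreign_string = u'6905\u897f\u963f\u79d1\u8857\uff0c\u5927\u53a6\uff03\u5927'
P = MultiFontParagraph(foreign_string, styles["Normal"],
                     [  ("Bauhaus", r"C:\Windows\Fonts\BAUHS93.TTF"),
                        ("Arial", r"C:\Windows\Fonts\arial.ttf"),
                        ("HanaMinA", r"C:\Windows\Fonts\HanaMinA.ttf")])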

The source of the MultiFontParagraph (git) is as follows:

from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.platypus import Paragraph


class MultiFontParagraph(Paragraph):
    # Created by B8Vrede for http://stackoverflow.com/questions/35172207/
    def __init__(self, text, style, fonts_locations):

        font_list = []
        for font_name, font_location in fonts_locations:
            # Load the font
            font = TTFont(font_name, font_location)

            # Get the char width of all known symbols
            font_widths = font.face.charWidths

            # Register the font so that it can be used
            pdfmetrics.registerFont(font)

            # Store the font and info in a list for lookup
            font_list.append((font_name, font_widths))

        # Set up the string to hold the new text
        new_text = u''

        # Loop through the string
        for char in text:

            # Loop through the fonts
            for font_name, font_widths in font_list:

                # Check whether this font knows the width of the character.
                # If so, it has a glyph for it, so use this font.
                # Note: a character with no glyph in any listed font is skipped.
                if ord(char) in font_widths:

                    # Set the working font for the current character
                    new_text += u'<font name="{}">{}</font>'.format(font_name, char)
                    break

        Paragraph.__init__(self, new_text, style)
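
As mentioned above, wrapping every single character in its own tag is wasteful. Below is a minimal sketch of one way to group consecutive characters that resolve to the same font into a single tag. The helper names pick_font and build_markup are hypothetical and not part of ReportLab. Two further choices here are assumptions that differ from the class above: the text is XML-escaped, and a character with no glyph in any font falls back to the first font instead of being dropped.

```python
from itertools import groupby
from xml.sax.saxutils import escape


def pick_font(char, font_list):
    """Return the name of the first font whose width table contains char."""
    for font_name, font_widths in font_list:
        if ord(char) in font_widths:
            return font_name
    # Assumption: use the first font rather than silently dropping the character
    return font_list[0][0]


def build_markup(text, font_list):
    """Wrap each run of characters that share a font in one <font> tag."""
    parts = []
    for font_name, run in groupby(text, key=lambda c: pick_font(c, font_list)):
        # Escape &, < and > so they are not read as paragraph markup
        run_text = escape(u''.join(run))
        parts.append(u'<font name="{}">{}</font>'.format(font_name, run_text))
    return u''.join(parts)
```

Inside MultiFontParagraph.__init__, the per-character loop could then presumably be replaced with new_text = build_markup(text, font_list), since font_list already holds (font_name, font.face.charWidths) pairs.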

