Problem description
I am using pdf.js. When I fetch the text, I get blocks with font info:
Object {
str: "blabla",
dir: "ltr",
width: 191.433141,
height: 12.546,
transform: Array[6],
fontName: "g_d0_f2"
}
Is it possible to get more information about g_d0_f2?
Recommended answer
Note that PDF.js getTextContent does not, and is not supposed to, match the glyphs in the PDF. The PDF 32000 specification defines two different algorithms, one for displaying text and one for extracting it. Even if you look up the font data in page.commonObjs, it may not be much help for displaying the extracted text content, because the glyph encodings do not match.
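As a rough illustration of the page.commonObjs lookup mentioned above, here is a minimal sketch. The helper name describeFonts is hypothetical, and the property read from the font object (name) is an assumption that may differ between pdf.js versions. The sketch calls getOperatorList first on the assumption that this is what makes pdf.js load the page's fonts into commonObjs.

```javascript
// Hypothetical helper: collects what pdf.js knows about each fontName
// that appears in a page's extracted text items.
async function describeFonts(page) {
  // Assumption: building the operator list resolves the page's fonts,
  // so that they become available in page.commonObjs.
  await page.getOperatorList();

  const { items, styles } = await page.getTextContent();

  // Unique font ids such as "g_d0_f2"; skip items without a fontName.
  const fontNames = [
    ...new Set(
      items.map((item) => item.fontName).filter((n) => typeof n === "string")
    ),
  ];

  return fontNames.map((fontName) => {
    const font = page.commonObjs.has(fontName)
      ? page.commonObjs.get(fontName)
      : null;
    const style = styles ? styles[fontName] : undefined;
    return {
      fontName,
      // Assumed property: the font name as stored in the PDF.
      realName: font && font.name ? font.name : null,
      // Fallback family pdf.js reports for the text layer.
      fontFamily: style && style.fontFamily ? style.fontFamily : null,
    };
  });
}
```

With a page obtained from pdfDocument.getPage(n), `await describeFonts(page)` would return one entry per internal font id. As the answer warns, this describes the font only; it does not make the extracted strings line up with the font's glyph encoding.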
The page's getTextContent performs text extraction, while getOperatorList returns the (glyph) display operators. See how the src/display/svg.js renderer displays glyphs.
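To see where the display side refers to a font, one option is to scan the operator list for font-selection operators. The sketch below is a hedged example: listFontSwitches is a hypothetical helper, and it assumes the operator list exposes parallel fnArray and argsArray arrays and that the arguments of a set-font operator are the internal font id followed by the size.

```javascript
// Hypothetical helper: lists every font-selection operator in an
// operator list returned by page.getOperatorList().
// setFontOp is the numeric code of the set-font operator, passed in
// by the caller so this function does not depend on the pdf.js build.
function listFontSwitches(opList, setFontOp) {
  const switches = [];
  opList.fnArray.forEach((fn, index) => {
    if (fn !== setFontOp) {
      return;
    }
    // Assumption: args are [internal font id, font size].
    const [fontName, size] = opList.argsArray[index];
    switches.push({ index, fontName, size });
  });
  return switches;
}
```

A possible call is `listFontSwitches(await page.getOperatorList(), pdfjsLib.OPS.setFont)`. Each returned fontName (for example g_d0_f2) can then be compared with the fontName values that getTextContent reports.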