UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' (writing to PDF)

Question

I am having an issue with Unicode in the contents of a variable when writing to a .pdf with Python.

It is outputting this error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013'

What does this basically mean?

I have tried taking that variable, whose contents include an 'em dash' (strictly speaking, '\u2013' is an en dash), and redefining it with .encode('utf-8'), as below:

Body = msg.Body

BodyC = Body.encode('utf-8')

Now I get the following error:

Traceback (most recent call last):
  File "script.py", line 37, in <module>
    pdf.cell(200, 10, txt="Bod: " + BodyC,  ln=4, align="C")
TypeError: can only concatenate str (not "bytes") to str
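
For context, this second error arises because str.encode returns a bytes object, and Python 3 does not concatenate str with bytes. The snippet below is a minimal sketch that reproduces it; the sample string is made up and stands in for msg.Body.

```python
# Made-up sample text standing in for msg.Body; it contains U+2013 (en dash).
body = "Office hours 10\u201311"

# str.encode() returns bytes, not str.
body_bytes = body.encode("utf-8")
print(type(body_bytes))

# Concatenating str and bytes raises the TypeError shown in the traceback.
try:
    label = "Bod: " + body_bytes
except TypeError as exc:
    message = str(exc)
    print(message)
```

So encoding to UTF-8 changes the type of the value without changing which characters the PDF library is able to write.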

Below is my full code. How can I simply fix the Unicode error in the contents of the 'Body' variable?

I am open to converting to utf-8 or western, anything outside of 'latin-1'. Any suggestions?

Full code:

from fpdf import FPDF
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(r"C:\User\language\python\Msg-To-PDF\test_msg.msg")

print (msg.SenderName)
print (msg.SenderEmailAddress)
print (msg.SentOn)
print (msg.To)
print (msg.CC)
print (msg.BCC)
print (msg.Subject)
print (msg.Body)

SenderName = msg.SenderName
SenderEmailAddress = msg.SenderEmailAddress
SentOn = msg.SentOn
To = msg.To
CC = msg.CC
BCC = msg.BCC
Subject = msg.Subject
Body = msg.Body
BodyC = Body.encode('utf-8')

pdf = FPDF()
pdf.add_page()

# pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
pdf.set_font("Helvetica", style = '', size = 11)
pdf.cell(200, 10, txt="From: " + SenderName, ln=1, align="C")
# pdf.cell(200, 10, border=SentOn, ln=1, align="C")
pdf.cell(200, 10, txt="To: " + To, ln=1, align="C")
pdf.cell(200, 10, txt="CC: " + CC, ln=1, align="C")
pdf.cell(200, 10, txt="BCC: " + BCC, ln=1, align="C")
pdf.cell(200, 10, txt="Subject: " + Subject, ln=1, align="C")
pdf.cell(200, 10, txt="Bod: " + BodyC,  ln=4, align="C")

pdf.output("Sample.pdf")

  • How can I change out of 'latin1'?
  • Is there a way to fix these issues globally?

Recommended answer

A workaround is to convert all text to latin-1 encoding before passing it on to the library. You can do that with the following command:

text2 = text.encode('latin-1', 'replace').decode('latin-1')

text2 will be free of any non-latin-1 characters. However, any character that latin-1 cannot represent will be replaced with '?'.
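
To keep common punctuation readable instead of turning it into '?', one option is to substitute ASCII look-alikes first and only then apply the round trip above. The sketch below is one possible approach, not part of the original answer: the helper name to_latin1_safe and the PUNCTUATION_FALLBACKS table are hypothetical, and the sample string is made up.

```python
# Hypothetical fallback table; extend it for the punctuation your messages contain.
PUNCTUATION_FALLBACKS = {
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
}


def to_latin1_safe(text):
    """Return a str that contains only characters latin-1 can encode."""
    # Swap known punctuation for ASCII look-alikes first.
    for char, fallback in PUNCTUATION_FALLBACKS.items():
        text = text.replace(char, fallback)
    # Anything still outside latin-1 becomes '?', as in the answer above.
    return text.encode("latin-1", "replace").decode("latin-1")


# Made-up sample: an en dash, a latin-1 character (e-acute) and a CJK character.
body = "Meeting 10\u201311 am, caf\u00e9 \u4e2d"
safe_body = to_latin1_safe(body)
print(safe_body)
```

In the question's script, the result of such a helper would be passed to pdf.cell in place of BodyC, and the same call can be applied to the other fields (Subject, To and so on) if they may contain non-latin-1 characters.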

