Problem description
I would like to include picture bytes in a JSON document, but I am struggling with an encoding issue:
import urllib
import json
data = urllib.urlopen('https://www.python.org/static/community_logos/python-logo-master-v3-TM-flattened.png').read()
json.dumps({'picture' : data})
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: invalid start byte
I don't know how to deal with this, since I am handling an image, so I am a bit confused by the encoding error. I am using Python 2.7. Can anyone help me? :)
Recommended answer
JSON data expects to handle Unicode text. Binary image data is not text, so when the json.dumps() function tries to decode the bytestring to unicode using UTF-8 (the default), that decoding fails.
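The failure can be reproduced without downloading anything. The following is a minimal sketch that assumes a hard-coded stand-in for the image (the 8-byte PNG file signature, which is not part of the original post) and decodes it by hand, which is the step json.dumps() attempts implicitly on Python 2:

```python
# Stand-in for the downloaded image (an assumption for illustration):
# every PNG file starts with these 8 signature bytes.
data = b'\x89PNG\r\n\x1a\n'

try:
    # The same UTF-8 decode that json.dumps() attempts on a Python 2 bytestring.
    data.decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    # 0x89 is not a valid first byte of any UTF-8 sequence.
    error = exc

print(error)
```

Any binary format is likely to hit an invalid sequence like this sooner or later, so the problem is not specific to PNG files.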
You'll have to wrap your binary data in a text-safe encoding first, such as Base64:
json.dumps({'picture' : data.encode('base64')})
Of course, this assumes that the receiver expects your data to be wrapped that way.
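To show what both sides of that agreement might look like, here is a sketch of a full round trip. It swaps in the base64 module in place of the Python 2 only 'base64' codec so that it also runs on Python 3, and it uses a short hard-coded bytestring as a stand-in for the image; both choices are assumptions for illustration, not part of the original answer.

```python
import base64
import json

# Stand-in for the image bytes (an assumption for illustration):
# the PNG signature followed by two arbitrary bytes.
data = b'\x89PNG\r\n\x1a\n\x00\xff'

# Sender: Base64-encode the bytes, then turn the ASCII result into text
# so that json.dumps() receives a text value rather than raw bytes.
payload = json.dumps({'picture': base64.b64encode(data).decode('ascii')})

# Receiver: parse the JSON, then reverse the Base64 step.
restored = base64.b64decode(json.loads(payload)['picture'])

assert restored == data
```

Note that base64.b64encode() produces a single unbroken string, whereas the Python 2 'base64' codec inserts a newline every 76 characters. Base64 decoders generally cope with either, but it is worth checking what the receiver expects.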
If your API endpoint has been so badly designed that it expects your image bytes to be passed in as text, the alternative is to pretend that your bytes really are text; if you first decode them as Latin-1, you can map those bytes straight to Unicode codepoints:
json.dumps({'picture' : data.decode('latin-1')})
With the data already a unicode object, the json library will then proceed to treat it as text. This does mean that it can replace non-ASCII codepoints with \uhhhh escapes.
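A small sketch of the Latin-1 route, again using a hard-coded stand-in for the image bytes (an assumption for illustration), shows both the escaping and how a receiver could recover the original bytes:

```python
import json

# Stand-in for the image bytes (an assumption for illustration).
data = b'\x89PNG\r\n\x1a\n\x00\xff'

# Latin-1 maps each byte 0x00-0xFF to the code point with the same number,
# so this decode cannot fail and loses no information.
text = data.decode('latin-1')
payload = json.dumps({'picture': text})

# With the default ensure_ascii=True, the non-ASCII code points are
# written as \uhhhh escapes in the JSON output.
print(payload)

# Receiver: parse the JSON and encode back to Latin-1 to get the bytes.
restored = json.loads(payload)['picture'].encode('latin-1')

assert restored == data
```

Each escaped byte takes six characters in the output, so for typical image data this route tends to be larger on the wire than Base64.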