pandas read_excel: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte


Problem description

I am trying to read an MS Excel file, version 2016. The file contains several sheets with data. It was downloaded from a database and opens correctly in MS Office. In the example below I changed the file name.

The file contains Russian and English words. Latin-1 encoding was most probably used, but encoding='latin-1' does not help.

import pandas as pd
with open('1.xlsx', 'r', encoding='utf8') as f:
        data = pd.read_excel(f)

Result:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte

Without encoding='utf8':

'charmap' codec can't decode byte 0x9d in position 622: character maps to <undefined>

P.S. The task is to process 52 files and merge the data in every sheet with the corresponding sheets in the other files. So please, no advice to handle the files by hand.

Recommended answer

Most probably the problem is in the Russian symbols.

Charmap is the default decoding method, used when no encoding is specified (on Windows this is typically the system code page).

As I see it, if utf-8 and latin-1 do not help, then try to read this file not as

pd.read_excel(f)

but as

pd.read_table(f)

or even just

f.readline()

in order to check which symbol raises the exception, and then delete that symbol or symbols.
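One way to run that check is to look at the raw bytes around the reported position rather than at decoded text. The sketch below is only an illustration, and the helper names `looks_like_xlsx` and `bytes_around` are hypothetical, not part of pandas. It relies on one fact about the format: a regular .xlsx workbook is a ZIP container, so its first bytes are the ZIP signature and the content is not text in any encoding.

```python
import zipfile

# Signature of a ZIP local file header; .xlsx workbooks are ZIP containers.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_xlsx(path):
    """Return True if the file starts with the ZIP signature and is a valid ZIP."""
    with open(path, "rb") as f:  # binary mode: no decoding is attempted
        head = f.read(4)
    return head == ZIP_MAGIC and zipfile.is_zipfile(path)


def bytes_around(path, position, width=8):
    """Return the raw bytes surrounding `position`, e.g. the offset from the error."""
    with open(path, "rb") as f:
        data = f.read(position + width)
    start = max(0, position - width)
    return data[start:position + width]
```

If `looks_like_xlsx('1.xlsx')` returns True, the offending byte is probably part of the compressed archive rather than a Russian character, and no `encoding` value will help. In that case it may be enough to pass the path directly, `pd.read_excel('1.xlsx')`, or a handle opened with `open('1.xlsx', 'rb')`, so that no text decoding happens before pandas sees the data.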
