xml.parsers.expat.ExpatError: not well-formed (invalid token)

Question

When I use xmltodict to load the XML file below, I get this error: xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 1

Here is my file:

<?xml version="1.0" encoding="utf-8"?>
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>

Code:

import xmltodict
with open('fileTEST.xml') as fd:
   xmltodict.parse(fd.read())

I am on Windows 10, using Python 3.6 and xmltodict 0.11.0.

If I use ElementTree, it works:

import xml.etree.ElementTree as ET

# Parse the same file with ElementTree and list every element.
tree = ET.ElementTree(file='fileTEST.xml')
for elem in tree.iter():
    print(elem.tag, elem.attrib)

mydocument {'has': 'an attribute'}
and {}
many {}
many {}
plus {'a': 'complex'}

Note: I might have run into a newline problem.
Note 2: I used Beyond Compare on two different files.
Parsing fails on the file that is encoded as UTF-8 with BOM, and works on the plain UTF-8 file.
The UTF-8 BOM is a sequence of bytes (EF BB BF) that allows the reader to identify a file as being encoded in UTF-8.
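
The BOM observation above suggests one plausible cause, sketched here with the standard library only. It assumes that open() without an encoding argument falls back to cp1252 on this Windows machine (an assumption, not something the question states), and it feeds the text to expat directly, encoded as UTF-8, to imitate what happens to a string handed to xmltodict. XML_BYTES, starts_with_utf8_bom and try_parse are illustrative names, not part of any library.

```python
import codecs
from xml.parsers import expat

# A small stand-in for fileTEST.xml saved as "UTF-8 with BOM" (hypothetical content).
XML_BYTES = codecs.BOM_UTF8 + (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<mydocument has="an attribute"/>'
)


def starts_with_utf8_bom(data):
    """Return True if the raw bytes begin with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def try_parse(text):
    """Parse text with expat; return the error message, or None on success."""
    parser = expat.ParserCreate()
    try:
        # Encode the string to UTF-8 bytes before parsing.
        parser.Parse(text.encode('utf-8'), True)
    except expat.ExpatError as exc:
        return str(exc)
    return None


# Assumption: the platform default encoding is cp1252, so the three BOM
# bytes are decoded as three ordinary characters in front of "<?xml".
mis_decoded = XML_BYTES.decode('cp1252')

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
clean = XML_BYTES.decode('utf-8-sig')

print(starts_with_utf8_bom(XML_BYTES))
print(repr(mis_decoded[:3]))
print(try_parse(mis_decoded))  # an error message
print(try_parse(clean))        # None
```

If this is what happens, the parser sees stray characters before the XML declaration, which would be consistent with an error reported at the very start of line 1.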

Recommended answer

I think you forgot to define the encoding type. I suggest parsing the XML file with ElementTree first and serializing it into a string variable, which you can then pass to xmltodict:

import xml.etree.ElementTree as ET
import xmltodict

# Parse the file with ElementTree, which handled this file in the question.
tree = ET.parse('your_data.xml')
xml_data = tree.getroot()
# Serialize the tree back to XML; change the encoding here if you need a different one.
xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')

# xmlstr is a bytes object; wrap the parsed result in a plain dict.
data_dict = dict(xmltodict.parse(xmlstr))
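
As a possible alternative that skips the ElementTree round trip, the file could be opened with the 'utf-8-sig' codec, which decodes UTF-8 and discards a leading BOM. The sketch below is an assumption-laden illustration rather than the answer's method: read_xml_text and element_names are hypothetical helpers, the sample file is written to a temporary path, and the standard-library expat parser stands in for xmltodict so that the example runs without third-party packages.

```python
import codecs
import os
import tempfile
from xml.parsers import expat


def read_xml_text(path):
    """Read a UTF-8 file as text, dropping a leading BOM if there is one."""
    with open(path, encoding='utf-8-sig') as fd:
        return fd.read()


def element_names(xml_text):
    """Return element names in document order, using expat directly."""
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names


# Hypothetical sample: a shortened version of the question's document.
sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<mydocument has="an attribute"><and><many>elements</many></and></mydocument>'
)

# Write the sample as "UTF-8 with BOM" to a temporary file.
with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + sample.encode('utf-8'))
    bom_path = tmp.name

try:
    text = read_xml_text(bom_path)
finally:
    os.remove(bom_path)

names = element_names(text)
print(names)  # ['mydocument', 'and', 'many']

# With xmltodict installed, the same text could be passed on:
# data = xmltodict.parse(text)
```

Opening the file in binary mode ('rb') and passing the raw bytes to the parser may also work, since the parser then does its own encoding detection, but that variant is not exercised here.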

