Parsing an XML file with UTF-8 encoding and byte strings using ElementTree

I have the following complete XML file (actual file downloadable here):

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE MedlineCitationSet PUBLIC "-//NLM//DTD Medline Citation, 1st January, 2014//EN" "http://www.nlm.nih.gov/databases/dtd/nlmmedlinecitationset_140101.dtd">
    <MedlineCitationSet>
      <MedlineCitation Owner="NLM" Status="In-Data-Review">
        <PMID Version="1">24560200</PMID>
        <Article PubModel="Print-Electronic">
          <Journal>
            <ISSN IssnType="Print">1166-7087</ISSN>
            <JournalIssue CitedMedium="Internet">
              <Volume>24</Volume>
              <Issue>3</Issue>
              <PubDate>
                <Year>2014</Year>
                <Month>Mar</Month>
              </PubDate>
            </JournalIssue>
            <Title>Progrès en urologie : journal de l'Association française d'urologie et de la Société française d'urologie</Title>
            <ISOAbbreviation>Prog. Urol.</ISOAbbreviation>
          </Journal>
          <ArticleTitle>[Multiparametric 3T MRI in the routine staging of prostate cancer].</ArticleTitle>
          <Pagination>
            <MedlinePgn>145-53</MedlinePgn>
          </Pagination>
          <Abstract>
            <AbstractText Label="RESULTS" NlmCategory="RESULTS">Five hundred and ninety-two octants were considered with 124 significant tumors (volume≥0.1cm(3)). The general ability of tumor detection had a sensitivity, specificity, PPV and NPV respectively to 72.3%, 87.4%, 83.2% and 78.5%. The estimate of the CC and ECE had a high negative predictive power with specificities and VPN respectively to 96.4% and 95.4% for CC, and 97.5 and 97.7% for ECE.</AbstractText>
            <CopyrightInformation>Copyright © 2013 Elsevier Masson SAS. All rights reserved.</CopyrightInformation>
          </Abstract>
        </Article>
      </MedlineCitation>
    </MedlineCitationSet>

What I want to do is simply parse the data and print the PMID and the journal title. This is the code that I have (Python 2):

    #!/usr/bin/env python
    import xml.etree.ElementTree as ET

    def parse_xml(xmlfile):
        """Print the PMID and journal title of every MedlineCitation in xmlfile."""
        tree = ET.parse(xmlfile)
        root = tree.getroot()
        for medcit in root.findall('MedlineCitation'):
            pmid = medcit.find('PMID').text
            title = medcit.find('Article/Journal/Title').text
            #year = medcit.find('Article/Journal/JournalIssue/PubDate/Year')
            #medlinedate = medcit.find('Article/Journal/JournalIssue/MedlineDate')
            print pmid, title

    if __name__ == '__main__':
        filename = "myxmlfile.xml"
        parse_xml(filename)

However, it gave me the following error message:

    24560200 Traceback (most recent call last):
      File "./parse_xml.py", line 41, in <module>
        parse_xml(fvar)
      File "./parse_xml.py", line 29, in parse_xml
        print pmid, title
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 5: ordinal not in range(128)

What's the correct way to parse and print it?

Recommended answer

This has been answered before. Use

    print pmid.encode('utf8'), title.encode('utf8')

instead of

    print pmid, title

The parsing itself is fine: the PMID is printed before the failure. The error comes from print, which tries to encode the Unicode title (it contains "è", u'\xe8') with the ASCII codec. Encoding the values to UTF-8 explicitly before printing avoids that implicit ASCII conversion.
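
The same idea (decode when parsing, encode explicitly when writing) can be sketched as a small self-contained script. This is only an illustration under stated assumptions: it targets Python 3 rather than the Python 2 used in the question, the helper names `extract_citations` and `format_line` are hypothetical, and `SAMPLE_XML` is a trimmed-down stand-in for the real file, not the file itself.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample modelled on the file in the question.
# It is encoded to UTF-8 bytes to mimic what is read from disk.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<MedlineCitationSet>'
    '<MedlineCitation Owner="NLM" Status="In-Data-Review">'
    '<PMID Version="1">24560200</PMID>'
    '<Article PubModel="Print-Electronic"><Journal>'
    '<Title>Progr\u00e8s en urologie</Title>'
    '</Journal></Article>'
    '</MedlineCitation>'
    '</MedlineCitationSet>'
).encode('utf-8')


def extract_citations(xml_bytes):
    """Return a list of (pmid, title) text pairs from MEDLINE-style XML bytes."""
    root = ET.fromstring(xml_bytes)  # the parser decodes using the declared encoding
    pairs = []
    for medcit in root.findall('MedlineCitation'):
        # findtext returns None when the element is missing
        pmid = medcit.findtext('PMID')
        title = medcit.findtext('Article/Journal/Title')
        pairs.append((pmid, title))
    return pairs


def format_line(pmid, title):
    """Encode one output line as UTF-8 bytes, independent of the terminal encoding."""
    return '{0} {1}\n'.format(pmid, title).encode('utf-8')


if __name__ == '__main__':
    # Write raw bytes so no implicit (possibly ASCII) encoding step is involved.
    out = getattr(sys.stdout, 'buffer', sys.stdout)
    for pmid, title in extract_citations(SAMPLE_XML):
        out.write(format_line(pmid, title))
```

Keeping the encoding step in one place at the output boundary means the rest of the code only ever handles text, which is the design choice behind the `.encode('utf8')` fix.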