Problem description
The code below takes a directory of XML files and parses them into a CSV file. This was only possible thanks to a user in this community. I have learned so much.
from xml.etree import ElementTree as ET
from collections import defaultdict
import csv
from pathlib import Path

directory = 'C:/Users/docs/FolderwithXMLs'

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)

    xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()

        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)

            for k, v in sn.attrib.items():
                row[k] = v

            for rn in sn.findall('.//Rational'):
                row['rational'] = rn.text

            for qu in sn.findall('.//Qualify'):
                row['qualify'] = qu.text

            for ds in sn.findall('.//Description'):
                row['description_txt'] = ds.text
                row['description_num'] = ds.attrib['num']

            for st in sn.findall('.//SetData'):
                for k, v in st.attrib.items():
                    row['set_data_' + str(k)] = v
                row_data = [row[i] for i in headers]
                writer.writerow(row_data)
                row = defaultdict(str)
The XML files, on the other hand, have a format like this:
<?xml version="1.0" encoding="utf-8"?>
<ProjectData>
  <FINAL>
    <START id="ID0001" service_code="0x5196">
      <Docs Docs_type="START">
        <Rational>225196</Rational>
        <Qualify>6251960000A0DE</Qualify>
      </Docs>
      <Description num="1213f2312">The parameter</Description>
      <DataFile dg="12" dg_id="let">
        <SetData value="32" />
      </DataFile>
    </START>
    <START id="DG0003" service_code="0x517B">
      <Docs Docs_type="START">
        <Rational>23423</Rational>
        <Qualify>342342</Qualify>
      </Docs>
      <Description num="3423423f3423">The third</Description>
      <DataFile dg="55" dg_id="big">
        <SetData x="E1" value="21259" />
        <SetData x="E2" value="02" />
      </DataFile>
    </START>
    <START id="ID0048" service_code="0x5198">
      <RawData rawdata_type="ASDS">
        <Rational>225198</Rational>
        <Qualify>343243324234234</Qualify>
      </RawData>
      <Description num="434234234">The forth</Description>
      <DataFile unit="21" unit_id="FEDS">
        <FileX unit="eg" discrete="false" axis_pts="19" name="Vsome" text_id="bx5" unit_id="GDFSD" />
        <SetData xin="5" xax="233" value="323" />
        <SetData xin="123" xax="77" value="555" />
        <SetData xin="17" xax="65" value="23" />
      </DataFile>
    </START>
  </FINAL>
</ProjectData>
The results look like the picture below.
Recently I have been trying to modify the code so that the results look similar to the picture below. Take id="ID0048" as an example: the code parses id and service_code only once, and if there are multiple SetData lines it creates a new row for each one, but it won't repeat id, service_code and the other fields on those rows. I am struggling to achieve something like the picture below.
Consider the special-purpose language XSLT, using Python's third-party module lxml, to directly transform XML to CSV output. Specifically, have XSLT pull from the lower level, SetData, and retrieve upper-level information with ancestor.
XSLT (save as .xsl file, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="delim">,</xsl:variable>

  <xsl:template match="/ProjectData">
    <!-- HEADERS -->
    <xsl:text>id,service_code,rational,qualify,description_num,description,</xsl:text>
    <xsl:text>data_file_dg,data_file_dg_id,data_file_unit,data_file_unit_id,</xsl:text>
    <xsl:text>set_data_x,set_data_xin,set_data_xax,set_data_value&#xa;</xsl:text>
    <!-- END HEADERS -->
    <xsl:apply-templates select="descendant::SetData"/>
  </xsl:template>

  <xsl:template match="SetData">
    <xsl:value-of select="concat(ancestor::START/@id, $delim,
                                 ancestor::START/@service_code, $delim,
                                 ancestor::START/*[1]/Rational, $delim,
                                 ancestor::START/*[1]/Qualify, $delim,
                                 ancestor::START/Description/@num, $delim,
                                 ancestor::START/Description, $delim,
                                 ancestor::START/DataFile/@dg, $delim,
                                 ancestor::START/DataFile/@dg_id, $delim,
                                 ancestor::START/DataFile/@unit, $delim,
                                 ancestor::START/DataFile/@unit_id, $delim,
                                 @x, $delim,
                                 @xin, $delim,
                                 @xax, $delim,
                                 @value)"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Python (no for loops or if/else logic)

import lxml.etree as et

# LOAD XML AND XSL FILES
xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')

# INITIALIZE TRANSFORMER
transform = et.XSLT(xsl)

# TRANSFORM INPUT
result = transform(xml)

print(str(result))
# id,service_code,rational,qualify,description_num,description,data_file_dg,data_file_dg_id,data_file_unit,data_file_unit_id,set_data_x,set_data_xin,set_data_xax,set_data_value
# ID0001,0x5196,225196,6251960000A0DE,1213f2312,The parameter,12,let,,,,,,32
# DG0003,0x517B,23423,342342,3423423f3423,The third,55,big,,,E1,,,21259
# DG0003,0x517B,23423,342342,3423423f3423,The third,55,big,,,E2,,,02
# ID0048,0x5198,225198,343243324234234,434234234,The forth,,,21,FEDS,,5,233,323
# ID0048,0x5198,225198,343243324234234,434234234,The forth,,,21,FEDS,,123,77,555
# ID0048,0x5198,225198,343243324234234,434234234,The forth,,,21,FEDS,,17,65,23

# SAVE RESULT TO CSV (text mode, because str(result) is a str, not bytes)
with open('Output.csv', 'w', newline='') as f:
    f.write(str(result))
To loop across a folder of XML files, simply integrate the above into a loop. The code below wraps all XML processing in a single function, builds a list of results via a list comprehension, and finally writes them to the CSV iteratively. NOTE: To get a single set of headers, write the headers only once to the CSV and remove them from the XSLT shown above.
import lxml.etree as et
from pathlib import Path

directory = 'C:/Users/docs/FolderwithXMLs'   # folder from the question

# LOAD XSL SCRIPT ONCE (WITH THE HEADER LINES REMOVED)
xsl = et.parse('Script.xsl')
transform = et.XSLT(xsl)                     # INITIALIZE TRANSFORMER ONCE

def proc_xml(xml_file):
    xml = et.parse(xml_file)                 # LOAD XML FILE
    result = transform(xml)                  # TRANSFORM INPUT
    return str(result)

xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
results = [proc_xml(x) for x in xml_files_list]

with open('Output.csv', 'w', newline='') as f:
    # HEADER ORDER MUST MATCH THE FIELD ORDER EMITTED BY Script.xsl
    f.write('id,service_code,rational,qualify,description_num,description,'
            'data_file_dg,data_file_dg_id,data_file_unit,data_file_unit_id,'
            'set_data_x,set_data_xin,set_data_xax,set_data_value\n')

    # SAVE RESULTS TO CSV
    for r in results:
        f.write(r)
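For comparison, here is a minimal sketch of the same one-row-per-SetData idea using only the standard library, in case installing lxml is not an option. It is not the approach from the answer above. The helper name start_rows, the HEADERS list and the inline sample string are assumptions made for illustration; the column names follow the question's code.

```python
import csv
import io
from xml.etree import ElementTree as ET

# Column names taken from the question's code (hypothetical ordering).
HEADERS = ['id', 'service_code', 'rational', 'qualify',
           'description_num', 'description_txt',
           'set_data_x', 'set_data_xin', 'set_data_xax', 'set_data_value']

def start_rows(root):
    """Yield one dict per SetData, repeating the parent START fields."""
    for sn in root.iter('START'):
        # Fields shared by every SetData row under this START node.
        base = dict(sn.attrib)
        base['rational'] = sn.findtext('.//Rational', default='')
        base['qualify'] = sn.findtext('.//Qualify', default='')
        ds = sn.find('.//Description')
        if ds is not None:
            base['description_txt'] = ds.text or ''
            base['description_num'] = ds.get('num', '')
        for st in sn.iter('SetData'):
            row = dict(base)  # copy, so the shared fields repeat on each row
            for k, v in st.attrib.items():
                row['set_data_' + k] = v
            yield row

# Trimmed-down sample in the shape of the question's XML.
sample = """<ProjectData><FINAL>
  <START id="ID0048" service_code="0x5198">
    <RawData rawdata_type="ASDS">
      <Rational>225198</Rational>
      <Qualify>343243324234234</Qualify>
    </RawData>
    <Description num="434234234">The forth</Description>
    <DataFile unit="21" unit_id="FEDS">
      <SetData xin="5" xax="233" value="323" />
      <SetData xin="123" xax="77" value="555" />
    </DataFile>
  </START>
</FINAL></ProjectData>"""

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=HEADERS, restval='',
                        extrasaction='ignore')
writer.writeheader()
writer.writerows(start_rows(ET.fromstring(sample)))
csv_text = out.getvalue()
print(csv_text)
```

The difference from the question's code is that the shared fields are copied for each SetData instead of the row being reset after every write. Note that in this sketch a START node without any SetData child produces no row at all. csv.DictWriter also quotes values that contain commas, which plain string concatenation does not.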