Problem description
I have this text file, 20150731100543_1.txt:
GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBGeneralConnectedState.0 = INTEGER: true(1)
GI-eSTB-MIB-NPH::eSTBGeneralPlatformID.0 = INTEGER: 2075
GI-eSTB-MIB-NPH::eSTBMoCAfrequency.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0
GI-eSTB-MIB-NPH::eSTBMoCANumberOfNodes.0 = INTEGER: 0
which I want to convert to XML like below (20150731100543_1.xml):
<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <GI-eSTB-MIB-NPH>
    <eSTBGeneralErrorCode.0>
      INTEGER: 0
    </eSTBGeneralErrorCode.0>
  </GI-eSTB-MIB-NPH>
  <GI-eSTB-MIB-NPH>
    <eSTBGeneralConnectedState.0>
      INTEGER: true(1)
    </eSTBGeneralConnectedState.0>
  </GI-eSTB-MIB-NPH>
  <GI-eSTB-MIB-NPH>
    <eSTBGeneralPlatformID.0>
      INTEGER: 2075
    </eSTBGeneralPlatformID.0>
  </GI-eSTB-MIB-NPH>
  <GI-eSTB-MIB-NPH>
    <eSTBMoCAfrequency.0>
      INTEGER: 0
    </eSTBMoCAfrequency.0>
  </GI-eSTB-MIB-NPH>
  <GI-eSTB-MIB-NPH>
    <eSTBMoCAMACAddress.0>
      STRING: 0:0:0:0:0:0
    </eSTBMoCAMACAddress.0>
  </GI-eSTB-MIB-NPH>
  <GI-eSTB-MIB-NPH>
    <eSTBMoCANumberOfNodes.0>
      INTEGER: 0
    </eSTBMoCANumberOfNodes.0>
  </GI-eSTB-MIB-NPH>
</doc>
I am able to get this done using the following script:
import sys
import time
import commands
from xml.etree.ElementTree import Element, SubElement
from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=" ", newl="\n", encoding="UTF-8")

if len(sys.argv) != 2:
    print "\nUsage: python script.py <IP>\n";
    exit(0)

filename_xml = '20150731100543_1.xml'#filename_xml = temp + ".xml"
print "xml filename is: %s\n" % filename_xml
xml = open(filename_xml, 'w+')
top = Element('doc')
with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])
        c = line.split()
        d = c[0].split(':')
        property = SubElement(child, d[2])
        property.text = c[2] + " " + c[3]
xml.write(prettify(top))
xml.close()
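I have three questions here:
- Is there any way (using toprettyxml() or otherwise) to change the generated XML so that the opening tag, the closing tag, and the text between them are on the same line?
- Can I also have the <GI-eSTB-MIB-NPH> tag only once, at the start and at the end, rather than around every element below it? (All the elements belong to the same tag.)
So if possible, the XML should be formatted like this: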
<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <GI-eSTB-MIB-NPH>
    <eSTBGeneralErrorCode.0>INTEGER: 0</eSTBGeneralErrorCode.0>
    <eSTBGeneralConnectedState.0>INTEGER: true(1)</eSTBGeneralConnectedState.0>
    <eSTBGeneralPlatformID.0>INTEGER: 2075</eSTBGeneralPlatformID.0>
    <eSTBMoCAfrequency.0>INTEGER: 0</eSTBMoCAfrequency.0>
    <eSTBMoCAMACAddress.0>STRING: 0:0:0:0:0:0</eSTBMoCAMACAddress.0>
    <eSTBMoCANumberOfNodes.0>INTEGER: 0</eSTBMoCANumberOfNodes.0>
  </GI-eSTB-MIB-NPH>
</doc>
I am trying for this because it would greatly reduce the number of lines in the XML.
The last and least important question is:
Is there a better way to get the substrings of each line than how I have done it using split()?
with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])
        c = line.split()
        d = c[0].split(':')
        property = SubElement(child, d[2])
        property.text = c[2] + " " + c[3]
Please forgive me for such a lengthy post.
Recommended answer
1 & 2: I use lxml's etree.tostring and I don't have any of these problems.
3: Multiple split operations can be replaced with a regex.
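To illustrate point 3 on a single line of input, here is a minimal sketch. The pattern and the helper name `parse_line` are assumptions made for this example, not part of the original script. It captures the MIB name, the object name, and the whole value in one match, so values with more or fewer than two words do not break it the way `c[2] + " " + c[3]` would.

```python
import re

# Hypothetical pattern: "<mib>::<name> = <value>", where value is the rest of the line.
LINE_RE = re.compile(r'^(?P<mib>[^:\s]+)::(?P<name>\S+)\s*=\s*(?P<value>.*)$')

def parse_line(line):
    """Return (mib, name, value) for one line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group('mib', 'name', 'value')

if __name__ == '__main__':
    print(parse_line('GI-eSTB-MIB-NPH::eSTBGeneralPlatformID.0 = INTEGER: 2075'))
```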
This should work fine:
from lxml import etree
import re

filename_xml = '20150731100543_1.xml'
root = etree.Element('doc')
node = etree.SubElement(root, 'GI-eSTB-MIB-NPH')

with open('20150731100543_1.txt') as f:
    text = f.read()

# get tag and value from each row
for tag, value in re.findall(r'GI-eSTB-MIB-NPH::(.*) = (.*)$', text, re.MULTILINE):
    # create child node
    etree.SubElement(node, tag).text = value

# tostring() returns bytes when an encoding is given, so write in binary mode
xml = etree.tostring(root, pretty_print=True, encoding='utf-8', xml_declaration=True)
with open(filename_xml, 'wb') as f:
    f.write(xml)
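If installing lxml is not an option, a similar result may be possible with the standard library alone. The following is a sketch under stated assumptions, not the answerer's method: it assumes Python 3.9 or newer (for `ElementTree.indent`), uses `str.partition` instead of a regex, and the function name `text_to_xml` is made up for this example. It creates one parent element per MIB name and keeps each leaf's text on the same line as its tags.

```python
import xml.etree.ElementTree as ET

def text_to_xml(text):
    """Build <doc> with one parent per MIB name and one child per object."""
    root = ET.Element('doc')
    parents = {}  # MIB name -> parent element, so each parent is created once
    for line in text.splitlines():
        key, eq, value = line.strip().partition(' = ')
        mib, colons, name = key.partition('::')
        if not eq or not colons:
            continue  # skip lines that do not look like "mib::name = value"
        parent = parents.get(mib)
        if parent is None:
            parent = parents[mib] = ET.SubElement(root, mib)
        ET.SubElement(parent, name).text = value
    # indent() (Python 3.9+) only adds whitespace between elements,
    # so leaf text stays on the same line as its opening and closing tags.
    ET.indent(root, space='  ')
    return ET.tostring(root, encoding='unicode')

if __name__ == '__main__':
    sample = (
        'GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0\n'
        'GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0\n'
    )
    print(text_to_xml(sample))
```

The returned string has no XML declaration; one can be written separately when saving the file.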