How to use toprettyxml() to put an XML tag and its text on the same line

Problem description

I have this text file, 20150731100543_1.txt:

GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBGeneralConnectedState.0 = INTEGER: true(1)
GI-eSTB-MIB-NPH::eSTBGeneralPlatformID.0 = INTEGER: 2075
GI-eSTB-MIB-NPH::eSTBMoCAfrequency.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0
GI-eSTB-MIB-NPH::eSTBMoCANumberOfNodes.0 = INTEGER: 0

I want to convert it to XML like this (20150731100543_1.xml):

<?xml version="1.0" encoding="UTF-8"?>
<doc>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralErrorCode.0>
            INTEGER: 0
        </eSTBGeneralErrorCode.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralConnectedState.0>
            INTEGER: true(1)
        </eSTBGeneralConnectedState.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralPlatformID.0>
            INTEGER: 2075
        </eSTBGeneralPlatformID.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBMoCAfrequency.0>
            INTEGER: 0
        </eSTBMoCAfrequency.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBMoCAMACAddress.0>
            STRING: 0:0:0:0:0:0
        </eSTBMoCAMACAddress.0>
    </GI-eSTB-MIB-NPH>
    <GI-eSTB-MIB-NPH>
        <eSTBMoCANumberOfNodes.0>
            INTEGER: 0
        </eSTBMoCANumberOfNodes.0>
    </GI-eSTB-MIB-NPH>
</doc>

I am able to get this done using the following script:

import sys
import time
import commands
from xml.etree.ElementTree import Element, SubElement
from xml.etree import ElementTree
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element.
    """
    rough_string = ElementTree.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="    ", newl="\n", encoding="UTF-8")

if len(sys.argv) != 2:
    print "\nUsage: python script.py <IP>\n";
    exit(0)
filename_xml = '20150731100543_1.xml'#filename_xml = temp + ".xml"
print "xml filename is: %s\n" % filename_xml
xml = open(filename_xml, 'w+')

top = Element('doc')

with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])

        c = line.split()
        d = c[0].split(':')
        property =  SubElement(child, d[2])

        property.text = c[2] + " " + c[3]

xml.write(prettify(top))

xml.close()

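I have three questions here:

  1. Is there any way (using toprettyxml() or something else) to change the generated XML so that the opening tag, the closing tag, and the text inside them are on the same line?
  2. Can I also have the GI-eSTB-MIB-NPH tag only once, at the start and at the end, rather than around every element beneath it? (All the elements belong to the same tag.)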

So, if possible, the XML should be formatted like this:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
    <GI-eSTB-MIB-NPH>
        <eSTBGeneralErrorCode.0>INTEGER: 0</eSTBGeneralErrorCode.0>
        <eSTBGeneralConnectedState.0>INTEGER: true(1)</eSTBGeneralConnectedState.0>
        <eSTBGeneralPlatformID.0>INTEGER: 2075</eSTBGeneralPlatformID.0>
        <eSTBMoCAfrequency.0>INTEGER: 0</eSTBMoCAfrequency.0>
        <eSTBMoCAMACAddress.0>STRING: 0:0:0:0:0:0</eSTBMoCAMACAddress.0>
        <eSTBMoCANumberOfNodes.0>INTEGER: 0</eSTBMoCANumberOfNodes.0>
    </GI-eSTB-MIB-NPH>
</doc>

I am aiming for this because it would greatly reduce the number of lines in the XML.

The last and least important question is:

  3. Is there a better way to get the substrings of each line than the way I have done it using split()?

with open('20150731100543_1.txt') as f:
    for line in f:
        b = line.split(':')
        child = SubElement(top, b[0])

        c = line.split()
        d = c[0].split(':')
        property =  SubElement(child, d[2])

        property.text = c[2] + " " + c[3]

Please forgive me for such a lengthy post.

Recommended answer

1 & 2: I use etree.tostring (from lxml) and I don't have any of these problems.

3: The multiple split operations can be replaced with a regex.
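
As a small illustration of point 3, here is a minimal sketch of parsing a single line with one regular expression instead of several split() calls. The pattern, the LINE_RE name, and the parse_line helper are assumptions made up for this example and are not part of the original answer. They assume every line has the shape "<MIB>::<object> = <value>".

```python
import re

# Hypothetical pattern for illustration: "<MIB>::<object> = <value>"
LINE_RE = re.compile(r'^(?P<mib>[^:]+)::(?P<name>\S+) = (?P<value>.*)$')


def parse_line(line):
    """Return (mib, name, value) for one line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group('mib'), match.group('name'), match.group('value')


print(parse_line('GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0'))
```

Because the value is captured as "everything after ' = '", it stays intact even when it contains colons or spaces, so there is no need to glue c[2] and c[3] back together.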

This should work fine:

from lxml import etree
import re

filename_xml = '20150731100543_1.xml'

root = etree.Element('doc')
# create the shared parent once, so it is not repeated for every row
node = etree.SubElement(root, 'GI-eSTB-MIB-NPH')
with open('20150731100543_1.txt') as f:
    text = f.read()

# get tag and value from each row
for tag, value in re.findall(r'GI-eSTB-MIB-NPH::(.*) = (.*$)', text, re.MULTILINE):
    # create child node
    etree.SubElement(node, tag).text = value

# tostring() returns bytes when an encoding is given
xml = etree.tostring(root, pretty_print=True, encoding='utf-8', xml_declaration=True)

# write in binary mode so the bytes are accepted on Python 3 as well
with open(filename_xml, 'wb') as f:
    f.write(xml)
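
If lxml is not available, a similar layout can probably be reached with the standard library alone. The sketch below is not part of the original answer. It assumes Python 3.9 or newer (for xml.etree.ElementTree.indent), and the SAMPLE text and build_xml helper are hypothetical names for this example. It creates one parent element per MIB prefix and leaves each leaf's text on the same line as its tags.

```python
import re
import xml.etree.ElementTree as ET

# Two sample rows taken from the question's input file
SAMPLE = """\
GI-eSTB-MIB-NPH::eSTBGeneralErrorCode.0 = INTEGER: 0
GI-eSTB-MIB-NPH::eSTBMoCAMACAddress.0 = STRING: 0:0:0:0:0:0
"""


def build_xml(text):
    """Group every object under a single element per MIB and return XML bytes."""
    root = ET.Element('doc')
    parents = {}  # MIB name -> its single parent element
    pattern = r'^([^:\s]+)::(\S+) = (.*)$'
    for mib, name, value in re.findall(pattern, text, re.MULTILINE):
        if mib not in parents:
            parents[mib] = ET.SubElement(root, mib)
        ET.SubElement(parents[mib], name).text = value
    # indent() only adds whitespace between elements; leaf text is untouched
    ET.indent(root, space='    ')  # assumption: Python 3.9+
    return ET.tostring(root, encoding='utf-8', xml_declaration=True)


print(build_xml(SAMPLE).decode('utf-8'))
```

The dictionary lookup is what keeps the parent tag from being repeated for every row, which is the same effect the lxml version gets by creating the node once before the loop.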
