Parsing an XML file to CSV with as little hard-coding as possible

Question

I would like to parse the XML below and extract its tags with as little hard-coding as possible, then convert the result to CSV.

I will need to hard-code these specific column names: 'InfoGroup', 'InfoRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn'.

InfoGroup is the text of each group's name tag, such as RecordingSystem, Ports, etc.

InfoRegister is the value of the name attribute on the elements inside the row tags, such as closedFileCount, processedFileCount, etc.

RegisterType is the name of the tag that carries that name attribute, such as usage, lwGauge, hwGauge, counter, etc.

Measures is simply the measures tag.

Description is simply the description tag.

GeneratedOn is located inside the generatedOn tag, with values such as sessmgr, rtpportal, etc.
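The column mapping described above can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration under assumptions, not the asker's code: the helper names local_name and fixed_columns are made up here, the "oms" prefix is an arbitrary label for the document's default namespace, and SAMPLE is a trimmed copy of the XML in the question.

```python
import xml.etree.ElementTree as ET

# "oms" is an arbitrary prefix chosen here for the default namespace.
NS = {"oms": "urn:nortel:namespaces:mcp:oms"}

SAMPLE = """<infoconfig xmlns="urn:nortel:namespaces:mcp:oms">
  <group>
    <name>Ports</name>
    <row>
      <package>com.nortelnetworks.ims.cap.mediaportal.host</package>
      <lwGauge name="connMapEntriesLWM">
        <measures>Lowest simultaneous port usage</measures>
      </lwGauge>
    </row>
  </group>
</infoconfig>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def fixed_columns(root):
    """Yield (InfoGroup, InfoRegister, RegisterType) for every register.

    A register is any direct child of <row> that carries a name attribute,
    so <package> and <class> are skipped without being listed by name.
    """
    for group in root.findall("oms:group", NS):
        info_group = group.findtext("oms:name", default="", namespaces=NS).strip()
        for register in group.findall("oms:row/*[@name]", NS):
            yield info_group, register.get("name"), local_name(register.tag)


rows = list(fixed_columns(ET.fromstring(SAMPLE)))
print(rows)
```

Selecting on the presence of the name attribute, rather than on a list of tag names, is what keeps the register types (usage, lwGauge, and so on) out of the code.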

If the XML contains any other tags, or new tags are added later, I would like them to be added to the CSV automatically.
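One way to get that behaviour, sketched here under assumptions, is to keep each register as a dictionary and build the CSV header from whatever keys turn up. The records below are hypothetical stand-ins for parsed registers, and build_header and to_csv are illustrative names, not part of any library.

```python
import csv
import io

FIXED = ["InfoGroup", "InfoRegister", "RegisterType"]

# Hypothetical records: the fixed columns plus whatever child tags
# each register happened to contain.
records = [
    {"InfoGroup": "Ports", "InfoRegister": "rtpMpPortUsage",
     "RegisterType": "usage", "measures": "BCP port usage"},
    {"InfoGroup": "RecordingSystem", "InfoRegister": "closedFileCount",
     "RegisterType": "usage", "measures": "closed file count",
     "notes": "Minor and major alarms"},
]


def build_header(records, fixed):
    """Fixed columns first, then every other key in first-seen order."""
    header = list(fixed)
    for record in records:
        for key in record:
            if key not in header:
                header.append(key)
    return header


def to_csv(records, fixed):
    """Return the records as CSV text, leaving missing cells empty."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=build_header(records, fixed), restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(records, FIXED))
```

With this shape, a tag such as notes that appears in only one register becomes a new column, and registers without it get an empty cell.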

My current implementation is basically all hard-coded, but I couldn't get it to work any other way. Please run the code with my XML to see what the CSV should actually look like.

<?xml version="1.0" encoding="UTF-8"?>

<infoconfig xmlns="urn:nortel:namespaces:mcp:oms" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:oms OMSchema.xsd" >

        <group>
                <name>RecordingSystem</name>
                <row>
                        <package>com.nortelnetworks.mcp.ne.base.recsystem.fw.system</package>
                        <class>RecSysFileOMRow</class>
                        <usage name="closedFileCount" hasThresholds="true">
                                <measures>
                                        closed file count
                                </measures>
                                <description>
                                        This register counts the number
                                        of closed files in the spool directory of a
                                        particular stream and a particular system.
                                        Files in the spool directory store the raw
                                        OAM records where they are sent to the
                                        Element Manager for formatting.
                                </description>
                                <notes>
                                        Minor and major alarms
                                        when the value of closedFileCount
                                        exceeds certain thresholds. Configure
                                        the threshold values for minor and major
                                        alarms for this OM through engineering
                                        parameters for minorBackLogCount and
                                        majorBackLogCount, respectively. These
                                        engineering parameters are grouped under
                                        the parameter group of Log, OM, and
                                        Accounting for the logs’ corresponding
                                        system.
                                </notes>
                        </usage>
                        <usage name="processedFileCount" hasThresholds="true">
                                <measures>
                                        Processed file count
                                </measures>
                                <description>
                                        The register counts the number
                                        of processed files in the spool directory of
                                        a particular stream and a particular system.
                                        Files in the spool directory store the raw
                                        OAM records and then send the records to
                                        the Element Manager for formatting.
                                </description>
                        </usage>
                </row>
                <documentation>
                        <description>
                                Rows of this OM group provide a count of the number of files contained
                                within the directory (which is the OM row key value).
                        </description>
                        <rowKey>
                                The full name of the directory containing the files counted by this row.
                        </rowKey>
                </documentation>
                <generatedOn>
                        <all/>
                </generatedOn>
        </group>
        <group traffic="true">
                <name>Ports</name>
                <row>
                        <package>com.nortelnetworks.ims.cap.mediaportal.host</package>
                        <class>PortsOMRow</class>
                        <usage name="rtpMpPortUsage">
                                <measures>
                                        BCP port usage
                                </measures>
                                <description>
                                        Meter showing number of ports in use.
                                </description>
                        </usage>
                        <lwGauge name="connMapEntriesLWM">
                                <measures>
                                        Lowest simultaneous port usage
                                </measures>
                                <description>
                                        Lowest number of
                                        simultaneous ports detected to be in
                                        use during the collection interval
                                </description>
                        </lwGauge>
                        <hwGauge name="connMapEntriesHWM">
                                <measures>
                                        Highest simultaneous port usage
                                </measures>
                                <description>
                                        Highest number of
                                        simultaneous ports detected to be in
                                        use during the collection interval.
                                </description>
                        </hwGauge>
                        <waterMark name="connMapEntries">
                                <measures>
                                        Connections map entries
                                </measures>
                                <description>
                                        Meter showing the number of connections in the host
                                        CPU connection map.
                                </description>
                                <bwg lwref="connMapEntriesLWM" hwref="connMapEntriesHWM"/>
                        </waterMark>
                        <counter name="portUsageSampleCnt">
                                <measures>
                                    Usage sample count
                                </measures>
                                <description>
                                    The number of 100-second samples taken during the
                                    collection interval contributing to the average report.
                                </description>
                        </counter>
                        <counter name="sampledRtpMpPortUsage">
                                <measures>
                                    In-use ports usage
                                </measures>
                                <description>
                                    Provides the sum of the in-use ports every 100 seconds.
                                </description>
                        </counter>
                        <precollector>
                                <package>com.nortelnetworks.ims.cap.mediaportal.host</package>
                                <class>PortsOMCenturyPrecollector</class>
                                <collector>centurySecond</collector>
                        </precollector>
                </row>
                <documentation>
                        <description>
                        </description>
                        <rowKey>
                        </rowKey>
                </documentation>
                <generatedOn>
                        <list>
                            <ne>sessmgr</ne>
                            <ne>rtpportal</ne>
                        </list>
                </generatedOn>
        </group>
        <group traffic="true">
            <name>SASIPPBXTrunkGroupCallMgmt</name>
            <row>
                <package>com.nortelnetworks.ims.cap.svc.sippbx.fsm</package>
                <class>StandAloneSipPbxTrunkGroupOMRow</class>
                <hwGauge name="callAttemptsHighForOrigination">
                    <measures></measures>
                    <description></description>
                </hwGauge>
                <waterMark name="callAttemptsForOrigination">
                    <measures> Number of Call attempts </measures>
                    <description>> This counter will keep track of incoming call attempts of Trunk Group  to or from a SIPPBX node </description>
                    <bwg lwref="callAttemptsLowForOrigination" hwref="callAttemptsHighForOrigination"/>
                </waterMark>
                <lwGauge name="callAttemptsLowForOrigination">
                    <measures></measures>
                    <description></description>
                </lwGauge>
                <hwGauge name="callAttemptsHighForTermination">
                    <measures></measures>
                    <description></description>
                </hwGauge>
                <waterMark name="callAttemptsForTermination">
                    <measures> Number of Call attempts </measures>
                    <description>> This counter will keep track of outgoing call attempts of Trunk Group  to or from a SIPPBX node </description>
                    <bwg lwref="callAttemptsLowForTermination" hwref="callAttemptsHighForTermination"/>
                </waterMark>
                <lwGauge name="callAttemptsLowForTermination">
                    <measures></measures>
                    <description></description>
                </lwGauge>
                <hwGauge name="activeCallsHighForOrigination">
                    <measures></measures>
                    <description></description>
                </hwGauge>
                <waterMark name="activeCallsForOrigination">
                    <measures> Number of Incoming Active calls </measures>
                    <description>> This counter will keep track of incoming active call of Trunk Group  to or from a SIPPBX node </description>
                    <bwg lwref="activeCallsLowForOrigination" hwref="activeCallsHighForOrigination"/>
                </waterMark>
                <lwGauge name="activeCallsLowForOrigination">
                    <measures></measures>
                    <description></description>
                </lwGauge>
                <hwGauge name="activeCallsHighForTermination">
                    <measures></measures>
                    <description></description>
                </hwGauge>
                <waterMark name="activeCallsForTermination">
                    <measures> Number of Outgoing Active calls </measures>
                    <description>> This counter will keep track of outgoing call active call of Trunk Group  to or from a SIPPBX node </description>
                    <bwg lwref="activeCallsLowForTermination" hwref="activeCallsHighForTermination"/>
                </waterMark>
                <lwGauge name="activeCallsLowForTermination">
                    <measures></measures>
                    <description></description>
                </lwGauge>
                <counter name="deniedCallsDueToCapacityForOrigination">
                    <measures>Number of Denied Calls due to capacity </measures>
                    <description>This counter will keep track denied for incoming call attempts of Trunk Group  to or from a SIPPBX node </description>
                </counter>
                <counter name="deniedCallsDueToCapacityForTermination">
                    <measures>Number of Denied Calls due to capacity </measures>
                    <description>This counter will keep track denied for outgoing call attempts of Trunk Group  to or from a SIPPBX node </description>
                </counter>
                <counter name="failoverRouteCallAttempts">
                    <measures>Number of FailOverRoute Call  attempts </measures>
                    <description>This counter will keep track of FailOverRoute Call attempts of Trunk Group  for a SIPPBX node </description>
                </counter>
            </row>
            <documentation>
                <description></description>
                <rowKey></rowKey>
            </documentation>
            <generatedOn>
                <list>
                    <ne>sessmgr</ne>
                </list>
            </generatedOn>
        </group>

</infoconfig>


from bs4 import BeautifulSoup
import re
import csv


def extract_data_from_report3():
    # Note: 'lxml' is an HTML parser and lowercases tag names, so
    # RegisterType comes out as e.g. 'lwgauge' rather than 'lwGauge'.
    with open('infoconfig.xml', 'r', encoding='utf-8') as xmlfile:
        soup = BeautifulSoup(xmlfile, 'lxml')

    with open('data2.csv', 'w', newline='', encoding='utf-8') as f_out:
        writer = csv.writer(f_out)
        writer.writerow(['InfoGroup:InfoRegister', 'InfoGroup', 'InfoRegister', 'RegisterType', 'Measures', 'Description', 'GeneratedOn'])

        # Every element inside <row> that has a name attribute is a register.
        for item in soup.select('row [name]'):
            group_name = item.find_previous('name').text
            # Fall back to an empty string when there is no <description>.
            desc = getattr(item.find('description'), 'text', '')
            desc = re.sub(r'\s{2,}', ' ', desc)
            generatedOn = ','.join(ne.get_text(strip=True) for ne in item.find_parent('group').select('ne'))

            writer.writerow([group_name + ':' + item['name'], group_name, item['name'], item.name, item.find('measures').get_text(strip=True), desc, generatedOn])

    print("File successfully converted to CSV")


extract_data_from_report3()
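A note on the text clean-up in the code above: as posted in the question, str(desc) turns a missing description into the literal string 'None', and the pattern \s{2,} only collapses runs of two or more whitespace characters, so a lone newline or a leading space survives. A small helper along the following lines is one possible alternative; clean_text is a name made up for this sketch.

```python
def clean_text(text):
    """Collapse every run of whitespace to one space and trim the ends.

    None (a missing tag) becomes an empty string instead of 'None'.
    """
    return " ".join((text or "").split())


raw = """
        Meter showing the number of connections in the host
        CPU connection map.
"""
print(clean_text(raw))
print(repr(clean_text(None)))
```

str.split() with no arguments splits on any whitespace and discards empty pieces, so no regular expression is needed.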

[Screenshot of the problem]

Any help would be greatly appreciated.

Recommended answer

I still don't understand the rules for the other new tags you mentioned, but I have rewritten the code according to your current logic. We can discuss further on this basis to reach the result you want.

from simplified_scrapy import SimplifiedDoc, utils


def extract_data_from_report3():
    # Fixed columns; tags found inside each register are appended later.
    header = [
        'InfoGroup:InfoRegister', 'InfoGroup', 'InfoRegister', 'RegisterType', 'GeneratedOn'
    ]
    fixed_count = len(header)
    datas = []
    doc = SimplifiedDoc(utils.getFileContent('infoconfig.xml'))
    groups = doc.selects('group')
    for group in groups:
        name = group.select('name>text()')
        # <generatedOn> holds either an empty tag such as <all/>
        # or a <list> of <ne> names.
        gen_node = group.select('generatedOn').child
        if not gen_node.child:
            generatedOn = gen_node.tag
        else:
            generatedOn = ','.join(gen_node.selects('ne>text()'))

        RegisterTypes = group.row.children.containsReg(
            '.+', attr='name')  # The nodes with a name attribute.
        for registerType in RegisterTypes:
            extr = {}
            for c in registerType.children:
                # Any tag not seen before becomes a new CSV column.
                if c['tag'] not in header:
                    header.append(c['tag'])
                extr[c['tag']] = c.text

            datas.append([
                '{}:{}'.format(name, registerType['name']), name,
                registerType['name'], registerType['tag'], generatedOn, extr])

    rows = [header]
    for data in datas:
        row = data[:-1]
        extr = data[-1]
        # Fill the dynamic columns in header order; missing tags give None.
        for i in range(fixed_count, len(header)):
            row.append(extr.get(header[i]))

        rows.append(row)

    utils.save2csv('data.csv', rows, newline='')


extract_data_from_report3()
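The GeneratedOn handling in the answer distinguishes two shapes: an empty child such as <all/>, reported by its tag name, and a <list> of <ne> entries, reported as a comma-separated string. The same idea can be sketched with the standard library for readers who do not have simplified_scrapy installed. This is an assumption-laden illustration rather than the answerer's method: generated_on is a made-up helper name, the "oms" prefix is arbitrary, and SAMPLE is a trimmed copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# "oms" is an arbitrary prefix chosen here for the default namespace.
NS = {"oms": "urn:nortel:namespaces:mcp:oms"}


def generated_on(group):
    """Return 'all' for <all/>, or the comma-joined <ne> names for <list>."""
    node = group.find("oms:generatedOn", NS)
    if node is None or len(node) == 0:
        return ""
    first = node[0]
    names = [ne.text.strip() for ne in first.findall("oms:ne", NS) if ne.text]
    # An empty child such as <all/> is reported by its tag name.
    return ",".join(names) if names else first.tag.rsplit("}", 1)[-1]


SAMPLE = """<infoconfig xmlns="urn:nortel:namespaces:mcp:oms">
  <group><name>RecordingSystem</name><generatedOn><all/></generatedOn></group>
  <group><name>Ports</name><generatedOn><list><ne>sessmgr</ne><ne>rtpportal</ne></list></generatedOn></group>
</infoconfig>"""

root = ET.fromstring(SAMPLE)
values = [generated_on(g) for g in root.findall("oms:group", NS)]
print(values)
```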

