Parse all XML files in a directory and all its subdirectories

Problem description

I am new to Python, though I have some experience with Delphi. I am trying to write a script that finds all XML files in a directory (including all of its subdirectories), parses them, and saves some data (numbers) from each one to a simple txt file. After that, I go through that txt file to create a second txt file containing only the unique numbers from the first one.

I created this script:

import os
from xml.dom import minidom

#for testing purposes
directory = os.getcwd()

print("Procházím aktuální adresář, hledám XML soubory...")
print("Procházím XML soubory, hledám IČP provádějícího...")

with open ('ICP_all.txt', 'w') as SeznamICP_all:
    for root, dirs, files in os.walk(directory):
        for file in files:
            if (file.endswith('.xml')):
                xmldoc = minidom.parse(file)
                itemlist = xmldoc.getElementsByTagName('is')
                SeznamICP_all.write(itemlist[0].attributes['icp'].value + '\n')

print("Vytvářím list unikátních IČP...")

with open ('ICP_distinct.txt','w') as distinct:
    UnikatniICP = []
    with open ('ICP_all.txt','r') as SeznamICP_all:
        for line in SeznamICP_all:
            if line not in UnikatniICP:
                UnikatniICP.append(line)
                distinct.write(line)

print('Počet unikátních IČP:' + str(len(UnikatniICP)))
input('Pro ukončení stiskni libovolnou klávesu...')

It works as intended until there is a subdirectory; in that case I get this error:

FileNotFoundError: [Errno 2] No such file or directory: 'RNN38987.xml'

That happens because the file is in a subdirectory, not in the directory that contains the Python script. I tried to fix it by using pathlib to get the absolute path of each file, but that produces a different error. Here is the script:

import os
from xml.dom import minidom
from pathlib import Path

#for testing purposes
directory = os.getcwd()

print("Procházím aktuální adresář, hledám XML soubory...")
print("Procházím XML soubory, hledám IČP provádějícího...")

with open ('ICP_all.txt', 'w') as SeznamICP_all:
    for root, dirs, files in os.walk(directory):
        for file in files:
            if (file.endswith('.xml')):
                soubor = Path(file).resolve()
                print(soubor)
                xmldoc = minidom.parse(soubor)
                itemlist = xmldoc.getElementsByTagName('is')
                SeznamICP_all.write(itemlist[0].attributes['icp'].value + '\n')

print("Vytvářím list unikátních IČP...")

with open ('ICP_distinct.txt','w') as distinct:
    UnikatniICP = []
    with open ('ICP_all.txt','r') as SeznamICP_all:
        for line in SeznamICP_all:
            if line not in UnikatniICP:
                UnikatniICP.append(line)
                distinct.write(line)

print('Počet unikátních IČP:' + str(len(UnikatniICP)))
input('Pro ukončení stiskni libovolnou klávesu...')

I don't really understand the error I am getting now, and Google is not helping either. Here is the whole log:

Procházím aktuální adresář, hledám XML soubory...
Procházím XML soubory, hledám IČP provádějícího...
C:\2_Programming\Python\IČP FINDER\src\20150225_1815_2561_1.xml
Traceback (most recent call last):
  File "C:\2_Programming\Python\IČP FINDER\src\ICP Finder.py", line 17, in <module>
    xmldoc = minidom.parse(soubor)
  File "C:\2_Programming\Python\Interpreter\lib\xml\dom\minidom.py", line 1958, in parse
    return expatbuilder.parse(file)
  File "C:\2_Programming\Python\Interpreter\lib\xml\dom\expatbuilder.py", line 913, in parse
    result = builder.parseFile(file)
  File "C:\2_Programming\Python\Interpreter\lib\xml\dom\expatbuilder.py", line 204, in parseFile
    buffer = file.read(16*1024)
AttributeError: 'WindowsPath' object has no attribute 'read'

Can you please help me out?

Solution

The pattern you are looking for is:

with open ('ICP_all.txt', 'w') as SeznamICP_all:
    for root, dirs, files in os.walk(directory):
        for file in files:
            if (file.endswith('.xml')):
                xmldoc = minidom.parse(os.path.join(root, file))
                itemlist = xmldoc.getElementsByTagName('is')
                SeznamICP_all.write(itemlist[0].attributes['icp'].value + '\n')

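On each iteration of the outer for loop, root is the directory currently being walked, and dirs and files hold the names of the subdirectories and files inside it. Those are bare names without any path, so each file name has to be joined with root before it can be opened; otherwise it is looked up relative to the current working directory, which is why files in subdirectories were not found.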
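
For completeness, here is a minimal sketch of the whole task built on that pattern. The function names (collect_icp_values, unique_in_order, write_lines) are made up for illustration, and skipping files without an `is` element or an `icp` attribute is an assumption; the original script would raise an error on such files instead.

```python
import os
from xml.dom import minidom


def collect_icp_values(directory):
    """Return the icp attribute of the first <is> element of every XML file under directory."""
    values = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.endswith('.xml'):
                # root is the directory currently being walked, so joining it
                # with the bare file name gives a path that can be opened.
                path = os.path.join(root, name)
                xmldoc = minidom.parse(path)
                items = xmldoc.getElementsByTagName('is')
                # Assumption: silently skip files that have no <is> element
                # or whose first <is> element has no icp attribute.
                if items and items[0].hasAttribute('icp'):
                    values.append(items[0].getAttribute('icp'))
    return values


def unique_in_order(values):
    """Drop duplicates while keeping the order of first appearance."""
    return list(dict.fromkeys(values))


def write_lines(path, lines):
    """Write one value per line to a text file."""
    with open(path, 'w', encoding='utf-8') as handle:
        for line in lines:
            handle.write(line + '\n')
```

To reproduce the two output files from the question, call `all_icp = collect_icp_values(os.getcwd())`, then `write_lines('ICP_all.txt', all_icp)` and `write_lines('ICP_distinct.txt', unique_in_order(all_icp))`. Collecting the values in memory first avoids reading ICP_all.txt back in, and dict.fromkeys keeps the lookup fast where the list-based `not in` check slows down as the list grows. If you prefer pathlib, `Path(directory).rglob('*.xml')` also yields full paths; since the traceback above shows minidom.parse trying to call read() on the WindowsPath object, wrapping each path in str() before parsing is the cautious choice.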
