My text file contains the following, and I need to extract DataSourceName and FileName from it into a simple CSV.
Data structure
<DataSourceDefinitionSet> <TABFileDataSourceDefinition id="id1" readOnly="false"> <DataSourceName>AirportLayout</DataSourceName> <FileName>\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB</FileName> </TABFileDataSourceDefinition> <TABFileDataSourceDefinition id="id2" readOnly="false"> <DataSourceName>Asset_Toilets</DataSourceName> <FileName>\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB</FileName> </TABFileDataSourceDefinition> <TABFileDataSourceDefinition id="id3" readOnly="false"> <DataSourceName>BaseLayer_Text</DataSourceName> <FileName>\\GIS\GIS\Corporate Services\Information Services\BaseLayer_Text.TAB</FileName> </TABFileDataSourceDefinition>
Code
import re
filename='CRC_Public_Features.mws'
input_file = open(filename)
count=0
for line in input_file:
    line = line.rstrip()
    if re.search('<FileName>', line) :
        # '<Filename>' differs in case from '<FileName>', so nothing is replaced here
        line=line.replace('<Filename>','')
        count+=1
        print(str(count)+','+line)
Output
>>>
*** Remote Interpreter Reinitialized ***
>>>
1, <FileName>\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB</FileName>
2, <FileName>\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB</FileName>
3,
What I want
1,AirportLayout,\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB
and so on.
I tried the following regex, but got no result.
'.([^] *)'
What can I do? I need both the DataSourceName and the FileName on each output line.
===== Final code used, based on the accepted answer
import re
filename='CRC_Public_Features.mws'
data = open(filename).read()
count=0
#for line in infile:
#data=line
# values[0] holds every DataSourceName, values[1] holds every FileName
values = [re.findall(first+"(.*?)"+second, data) for first, second in [("<{}>".format(b), "</{}>".format(b)) for b in ["DataSourceName","FileName"]]]
ids = [re.search(r"\d+", i).group(0) for i in re.findall('id="(.*?)"', data)]
final_values = [ids[0]] + [i[0] for i in values]
DataSourceName=values[0]
FileName=values[1]
total=len(FileName)
# write the header once, then append one row per file name
with open("Output.csv", "w") as text_file:
    text_file.write("ID,DataSourceName,FileName,MWS\n")
for item in FileName:
    print(str(count+1)+","+str(DataSourceName[count])+","+str(FileName[count]))
    with open("Output.csv", "a") as text_file:
        text_file.write(str(count+1)+","+str(DataSourceName[count])+","+str(FileName[count])+","+str(filename)+"\n")
    count+=1
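The loop above reopens Output.csv for every row and joins the fields by hand, which would produce a malformed row if a path ever contained a comma. A minimal sketch of the same output using the standard `csv` module and a single stream, written for Python 3. The helper name `write_rows` and the two sample lists are hypothetical and only stand in for `values[0]` and `values[1]`:

```python
import csv
import io

def write_rows(out, names, files, source):
    """Write a header plus one CSV row per (name, file) pair to the open text stream `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "DataSourceName", "FileName", "MWS"])
    # enumerate(..., start=1) replaces the manual counter.
    for number, (name, path) in enumerate(zip(names, files), start=1):
        writer.writerow([number, name, path, source])

# Sample values copied from the data structure in the question.
names = ["AirportLayout", "Asset_Toilets"]
files = [
    r"\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB",
    r"\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB",
]

buffer = io.StringIO()  # stands in for open("Output.csv", "w", newline="")
write_rows(buffer, names, files, "CRC_Public_Features.mws")
print(buffer.getvalue())
```

To write to disk instead, the same function could be called inside `with open("Output.csv", "w", newline="") as text_file:`.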
Best answer
You can try the following:
import re
filename='CRC_Public_Features.mws'
data = open(filename).read()
values = [re.findall(first+"(.*?)"+second, data) for first, second in [("<{}>".format(b), "</{}>".format(b)) for b in ["DataSourceName","FileName"]]]
ids = [re.search(r"\d+", i).group(0) for i in re.findall('id="(.*?)"', data)]
final_values = [ids[0]] + [i[0] for i in values]
Output:
['1', 'AirportLayout', '\\GIS\\GIS\\Corporate Services\\Information Services\\AirportLayout.TAB']
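The list above holds only the first record, because `ids[0]` and `i[0]` each pick the first match. A minimal sketch of collecting every record with one pattern per block, assuming each `TABFileDataSourceDefinition` element contains exactly one `DataSourceName` followed by one `FileName`, and that ids have the form `idN`. The names `RECORD_RE` and `extract_records` are hypothetical and not part of the original answer:

```python
import re

# One match per record: the numeric part of the id attribute,
# then DataSourceName, then FileName.
# re.DOTALL lets ".*?" span line breaks inside a block.
RECORD_RE = re.compile(
    r'<TABFileDataSourceDefinition\s+id="id(\d+)"[^>]*>.*?'
    r'<DataSourceName>(.*?)</DataSourceName>.*?'
    r'<FileName>(.*?)</FileName>',
    re.DOTALL,
)

def extract_records(data):
    """Return a list of (id, data_source_name, file_name) tuples."""
    return RECORD_RE.findall(data)

# Two records copied from the data structure in the question.
sample = (
    '<TABFileDataSourceDefinition id="id1" readOnly="false"> '
    '<DataSourceName>AirportLayout</DataSourceName> '
    r'<FileName>\\GIS\GIS\Corporate Services\Information Services\AirportLayout.TAB</FileName> '
    '</TABFileDataSourceDefinition> '
    '<TABFileDataSourceDefinition id="id2" readOnly="false"> '
    '<DataSourceName>Asset_Toilets</DataSourceName> '
    r'<FileName>\\gis\gis\CITY WORKS\Infrastructure Management\Asset_Toilets.TAB</FileName> '
    '</TABFileDataSourceDefinition>'
)

for record in extract_records(sample):
    print(",".join(record))
```

If the file is well-formed XML, a real XML parser would likely be more robust than regular expressions, but that depends on the actual contents of the .mws file.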