Problem description
Esteemed colleagues, I have raw data in the format shown below. Each record is normally three lines, starting with dn:, followed by ftpuser and description. In some records the third line, description, is missing, so only the first two lines are present. I am using a multiline regex to match these records: the file contents are read into my data variable and passed to re.findall. I then loop over matchObj and pick out only the fields I need by index from the new_str list.
Below is the data file:
dn: uid=ac002,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: file transfer|12/31/2010|file transfer
dn: uid=ab02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: disabled_5Mar07
description: Remedy Tkt 01239399 regg move
dn: uid=mela,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: ROYALS|none|customer account
dn: uid=aa01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
dn: uid=aa02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=aa03,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=bb01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
dn: uid=bb02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=bb03,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=bb05,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
dn: uid=ab01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description:: VGVzdGluZyA=
dn: [email protected],ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
description: REG-JP|7-31-05|REG-JP
Below is the code I tried. The problem is that it only picks up records that have all three lines (dn:, ftpuser, description). Records that have only two lines (dn:, ftpuser) are not retrieved. I would like to know how to include those records in the same output, appending Description: null wherever the description is missing.
#!/usr/bin/python3
# ./dataparse.py
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
import re

# Restore default SIGPIPE handling so piping the output (e.g. to head) exits quietly.
signal(SIGPIPE, SIG_DFL)

with open('test2', 'r') as f:
    # Read the whole file at once; iterating over f first would leave nothing for read().
    data = f.read()

# Matches only records that have all three lines (dn, ftpuser, description).
regex = r"dn:(.*?)\nftpuser: (.*)\ndescription:* (.*)"
matchObj = re.findall(regex, data)
for index in matchObj:
    #print(index)
    index_str = ' '.join(index)
    new_str = re.sub(r'[=,]', ' ', index_str)
    new_str = new_str.split()
    print("{0:<30}{1:<20}{2:<50}".format(new_str[1], new_str[8], new_str[9]))
Resulting output:
$ ./dataparse.py
ab02 disabled_5Mar07 Remedy
mela Y ROYALS|none|customer
ab01 Y VGVzdGluZyA
[email protected] T REG-JP|7-31-05|REG-JP
As a Python beginner, I would appreciate any help or suggestions.
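Simply make the description line optional in your regex pattern by wrapping it in an optional non-capturing group. re.findall then returns an empty string for that group whenever the line is missing, which you can replace with null. Change the pattern to: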
r"dn:(.*?)\nftpuser: (.*)\n(?:description:* (.*))?"
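To show how the optional group can feed the null placeholder the question asks for, here is a minimal sketch. It is not the exact pattern above: it assumes every dn line begins with uid=, it captures the uid directly instead of splitting on = and , afterwards, and it moves the newline inside the optional group so that a final two-line record with no trailing newline still matches. The names RECORD and parse_records and the abbreviated sample data are illustrative only.

```python
import re

# Abbreviated sample in the same shape as the question's data file.
data = """dn: uid=ab02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: disabled_5Mar07
description: Remedy Tkt 01239399 regg move
dn: uid=aa01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
dn: uid=ab01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description:: VGVzdGluZyA=
"""

# The description line (including its leading newline) sits in an optional
# non-capturing group, so two-line records still match.
# Assumption: every dn value starts with "uid=".
RECORD = re.compile(
    r"^dn: uid=([^,]+),.*\nftpuser: (.*)(?:\ndescription:+ (.*))?",
    re.MULTILINE,
)


def parse_records(text, missing="null"):
    """Return (uid, ftpuser, description) tuples, filling absent descriptions."""
    rows = []
    for uid, ftpuser, description in RECORD.findall(text):
        # findall yields '' for a group that did not take part in the match.
        rows.append((uid, ftpuser, description or missing))
    return rows


for uid, ftpuser, description in parse_records(data):
    print("{0:<30}{1:<20}{2:<50}".format(uid, ftpuser, description))
```

Because the whole description is kept as a single field, multi-word values such as "Remedy Tkt 01239399 regg move" are no longer cut off at the first space, which is what indexing new_str[9] after split() does in the original script.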