Search for a pattern in a text file and insert the value Null if the pattern is missing

Problem description

Esteemed colleagues, I have a raw data file in the format shown below. Each record normally consists of three lines: a line starting with dn:, followed by ftpuser and description. In some records the third line, description, is missing and only the first two lines are present. I am using a multiline regex to match these records: the file contents are read into my data variable and passed to re.findall. I then loop over matchObj and pick out only the fields I need, by index, from the new_str list.

Below is the data file:

dn: uid=ac002,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: file transfer|12/31/2010|file transfer

dn: uid=ab02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: disabled_5Mar07
description: Remedy Tkt 01239399 regg move

dn: uid=mela,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: ROYALS|none|customer account

dn: uid=aa01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T

dn: uid=aa02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y

dn: uid=aa03,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y

dn: uid=bb01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T

dn: uid=bb02,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y

dn: uid=bb03,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y

dn: uid=bb05,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y

dn: uid=ab01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description:: VGVzdGluZyA=

dn: [email protected],ou=ftpusers,ou=applications,o=regg.com
ftpuser: T
description: REG-JP|7-31-05|REG-JP

Below is the code I tried. The problem is that it only picks up records that have all three lines (dn:, ftpuser, description); it fails to retrieve records that have only two lines (dn:, ftpuser). I would like to know how to include those records in the same output as well, appending Description: null wherever the description is missing.

#!/usr/bin/python3
# ./dataparse.py
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
import re
with open('test2', 'r') as f:
    for line in f:
        line = line.strip()
        data = f.read()
        regex = (r"dn:(.*?)\nftpuser: (.*)\ndescription:* (.*)")
        matchObj = re.findall(regex, data)
        for index in matchObj:
            #print(index)
            index_str = ' '.join(index)
            new_str = re.sub(r'[=,]', ' ', index_str)
            new_str = new_str.split()
            print("{0:<30}{1:<20}{2:<50}".format(new_str[1],new_str[8],new_str[9]))

Resulting output:

$ ./dataparse.py
ab02                          disabled_5Mar07     Remedy
mela                          Y                   ROYALS|none|customer
ab01                          Y                   VGVzdGluZyA
[email protected]                   T                   REG-JP|7-31-05|REG-JP

As a Python beginner, I would appreciate any help or suggestions.

Solution

Simply make the description line optional in your regex pattern by wrapping it in an optional non-capturing group. Change the pattern to:

r"dn:(.*?)\nftpuser: (.*)\n(?:description:* (.*))?"
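To show how that pattern might be used end to end, here is a minimal sketch. It is not the asker's script: the SAMPLE string, the parse_records helper and its missing parameter are names assumed for illustration, and the sample is a three-record excerpt in the same layout as the data file. When the optional group does not take part in a match, re.findall returns an empty string for it, so the sketch substitutes "null" in that case.

```python
import re

# Assumed excerpt in the same layout as the question's data file.
SAMPLE = """\
dn: uid=ac002,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description: file transfer|12/31/2010|file transfer

dn: uid=aa01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: T

dn: uid=ab01,ou=ftpusers,ou=applications,o=regg.com
ftpuser: Y
description:: VGVzdGluZyA=
"""

# Pattern from the answer: the description line is an optional group.
PATTERN = re.compile(r"dn:(.*?)\nftpuser: (.*)\n(?:description:* (.*))?")


def parse_records(data, missing="null"):
    """Return (uid, ftpuser, description) tuples (hypothetical helper).

    An absent description comes back from findall as '' and is
    replaced by the `missing` placeholder.
    """
    rows = []
    for dn, ftpuser, description in PATTERN.findall(data):
        # " uid=ac002,ou=ftpusers,..." -> "ac002"
        uid = dn.strip().split(",")[0].split("=", 1)[1]
        rows.append((uid, ftpuser, description or missing))
    return rows


if __name__ == "__main__":
    # For a real file, read it once: data = open('test2').read()
    for row in parse_records(SAMPLE):
        print("{0:<30}{1:<20}{2:<50}".format(*row))
```

Unpacking the three groups directly avoids the join/split step from the question; with an empty description, split() would drop the empty field and new_str[9] would raise an IndexError. Two further caveats: the pattern expects a newline after the ftpuser line, so a two-line record at the very end of a file with no trailing newline would not match. Also, the for line in f loop in the question appears to consume the first dn: line before f.read() is called, which would explain why ac002 is missing from the output shown; reading the file once with f.read() avoids that.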
