Match a pattern and save it to a variable using Python

Problem description

I have an output file containing thousands of lines of information. Every so often the output file contains information of the following form:

Input orientation:
...
content
...
Distance matrix (angstroms):

I now want to save the content to a variable for subsequent formatting. I am also only interested in the last occurrence of this pattern in my file. I have a solution that does this with sed and awk, but that leaves me having multiple files to carry out one job. This should be doable with Python, but I have no idea where to start reading to learn this.

EDIT: I have been reading up on regular expressions, and believe it or not I have made some progress! I first read in the file line by line, then reverse the list, and then join all the strings that make up that list. I end up with one big, multiline string. Next I use the re module with the regex r'Distance matrix(.*?)Input orientation', which I think means the following: my first pattern is "Distance matrix", then a subpattern matching zero or more of any character, but lazily (stopping at the first place where the rest of the pattern can match), and then my last pattern, "Input orientation".

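import re

with open(inputfile, "r") as input_file:
        input_file_lines = input_file.readlines()
        reverse_lines = input_file_lines[::-1]  # last line of the file comes first
        string = ''.join(reverse_lines)

        # Lazy match from the first "Distance matrix" in the reversed text to the
        # next "Input orientation"; the captured lines are still in reverse order.
        match = re.search(r'Distance matrix(.*?)Input orientation', string, re.DOTALL).group(1)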

Sample data file for testing:

Item               Value     Threshold  Converged?
             Maximum Force            0.005032     0.000450     NO
             RMS     Force            0.001066     0.000300     NO
             Maximum Displacement     0.027438     0.001800     NO
             RMS     Displacement     0.007282     0.001200     NO
             Predicted change in Energy=-8.909077D-05
             GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad

                                      Input orientation:
             ---------------------------------------------------------------------
             Center     Atomic      Atomic             Coordinates (Angstroms)
             Number     Number       Type             X           Y           Z
             ---------------------------------------------------------------------
                  1          6           0        Incorrect    Incorrect    Incorrect
                  2          1           0        Incorrect    Incorrect    Incorrect
                  3          1           0        Incorrect    Incorrect    Incorrect
                  4          1           0        Incorrect    Incorrect    Incorrect
                  5         17           0        Incorrect    Incorrect    Incorrect
                  6          9           0        Incorrect    Incorrect    Incorrect
             ---------------------------------------------------------------------
                                Distance matrix (angstroms):
                                1          2          3          4          5
                 1  C    0.000000
                 2  H    1.080163   0.000000
                 3  H    1.080326   1.809416   0.000000
                 4  H    1.080621   1.810236   1.810685   0.000000
                 5  Cl   1.962171   2.470702   2.468769   2.465270   0.000000
                 6  F    2.390537   2.343910   2.357275   2.380515   4.352568
                                6
                 6  F    0.000000

                                          Input orientation:
                 ---------------------------------------------------------------------
                 Center     Atomic      Atomic             Coordinates (Angstroms)
                 Number     Number       Type             X           Y           Z
                 ---------------------------------------------------------------------
                      1          6           0        Correct    Correct     Correct
                      2          1           0        Correct    Correct     Correct
                      3          1           0        Correct    Correct     Correct
                      4          1           0        Correct    Correct     Correct
                      5         17           0        Correct    Correct     Correct
                      6          9           0        Correct    Correct     Correct
                 ---------------------------------------------------------------------
                                    Distance matrix (angstroms):
                                    1          2          3          4          5
                     1  C    0.000000
                     2  H    1.080516   0.000000
                     3  H    1.080587   1.801890   0.000000
                     4  H    1.080473   1.801427   1.801478   0.000000
                     5  Cl   1.936014   2.458132   2.459437   2.460630   0.000000
                     6  F    2.414588   2.368281   2.365651   2.355690   4.350586

Recommended answer

Regex isn't necessary here. All you need is good ol' indexing. Python strings have index and rindex methods that take a substring, find it in the text, and return the index of the first character of the match. Reading the Python documentation on strings should get you familiar with slicing them. The program could look something like this:

with open(input_file) as f:
    s = f.read()  # reads the file as one big string

last_block = s[s.rindex('Input'):s.rindex('Distance')]

The last line of that code searches backwards from the end of the file, since we used rindex, stops at the first occurrence of 'Input' it meets (that is, the last one in the file), and records that position as an integer. It then does the same with 'Distance'. Finally, it uses those integers to return only the portion of the string that lies between them. In the case of your example file it would return:

                                      Input orientation:
             ---------------------------------------------------------------------
             Center     Atomic      Atomic             Coordinates (Angstroms)
             Number     Number       Type             X           Y           Z
             ---------------------------------------------------------------------
                  1          6           0        Correct    Correct     Correct
                  2          1           0        Correct    Correct     Correct
                  3          1           0        Correct    Correct     Correct
                  4          1           0        Correct    Correct     Correct
                  5         17           0        Correct    Correct     Correct
                  6          9           0        Correct    Correct     Correct
             ---------------------------------------------------------------------
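The slicing step can be tried on a small in-memory string before pointing it at a real output file. The sketch below is only an illustration: the `sample` text and the `last_block` helper name are made up here and are not part of the original program.

```python
def last_block(text, start="Input orientation:", end="Distance matrix"):
    """Return the text from the last `start` marker up to the last `end` marker."""
    begin = text.rindex(start)  # position of the last start marker
    stop = text.rindex(end)     # position of the last end marker
    return text[begin:stop]


# Hypothetical miniature of the output file, containing two blocks.
sample = (
    "Input orientation:\n"
    "1 6 0 Incorrect\n"
    "Distance matrix (angstroms):\n"
    "1 C 0.000000\n"
    "Input orientation:\n"
    "1 6 0 Correct\n"
    "Distance matrix (angstroms):\n"
    "1 C 0.000000\n"
)

block = last_block(sample)
print(block)
```

If a file happens to end with an 'Input orientation:' block that has no 'Distance matrix' after it, the start position is larger than the stop position and the slice comes back empty, so a real script may want to check for that case.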

If you don't want the 'Input orientation:' header, you can simply add an offset to the result of rindex('Input') until you get the desired result. That could look like s[s.rindex('Input') + 19:s.rindex('Distance')], for instance (19 skips the 18 characters of 'Input orientation:' plus the newline that follows it).
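Instead of counting characters by hand, the offset can be derived from the text itself. The following is a minimal sketch, assuming the header always sits on a line of its own; the `last_block_body` helper and the `sample` string are hypothetical names used only for this illustration.

```python
def last_block_body(text, start="Input orientation:", end="Distance matrix"):
    """Return the last block without its header line."""
    begin = text.rindex(start)
    # Move to the character right after the newline that ends the header line.
    body_start = text.index("\n", begin) + 1
    return text[body_start:text.rindex(end)]


# Hypothetical one-block sample.
sample = (
    "Input orientation:\n"
    "1 6 0 Correct\n"
    "Distance matrix (angstroms):\n"
)

body = last_block_body(sample)
print(body)

# The hand-counted offset is the header length plus one newline.
offset = len("Input orientation:") + 1
print(offset)
```

Searching for the newline keeps working if the header text is ever changed, whereas a fixed number has to be recounted.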

It is also important to note that index and rindex raise a ValueError if the substring is not found. If that is not desired, you can use find and rfind, which return -1 instead.
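One caveat with find and rfind is that -1 is itself a valid position in a slice (it means the last character), so the result should be checked before slicing. A small sketch of that check follows; the `find_last_block` name and the choice to return None are assumptions made for this example.

```python
def find_last_block(text, start="Input orientation:", end="Distance matrix"):
    """Return the last block, or None if either marker is missing."""
    begin = text.rfind(start)
    stop = text.rfind(end)
    if begin == -1 or stop == -1:
        # rfind signals "not found" with -1 instead of raising ValueError.
        return None
    return text[begin:stop]


missing = find_last_block("no markers in this text")
found = find_last_block("Input orientation:\n1 6 0 Correct\nDistance matrix")
print(missing)
print(found)
```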
