I want to group the lines of a file by their first two words (and then rearrange and print them)

This is what I tried:

    lines = file.readlines()
    for i, line in enumerate(lines):
        word1 = line.split()[0]
        word2 = line.split()[1]
        # Compare with the previous and the next line (this breaks on the first and last line)
        if word1 == lines[i+1].split()[0] and word1 == lines[i-1].split()[0]:
            if word2 == lines[i-1].split()[1] and word2 == lines[i+1].split()[1]:
                print line
        else:
            print "***new block of lines \n***"


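However, this is a very poor solution: it does not work for the first or last line, and it does not work well in general. A better solution would be appreciated.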

Best answer

If you want to group consecutive lines that share the same first two words, this is a use case for itertools.groupby, for example:

from itertools import groupby

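with open('somefile') as fin:
    # Pair each non-blank line with its key: up to the first two words
    lines = ((line.split(None, 2)[:2], line) for line in fin if line.strip())
    for k, g in groupby(lines, lambda L: L[0]):
        # Keep only the original line text for this group
        lines = [el[1] for el in g]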


Here, k is the grouping key (up to the first two words), and lines is the list of lines in the file that share that key.

Example input for somefile:

one two three four five
one two five six seven
three four something
three four something else
one two start of new one two block


Result of print k, lines:

['one', 'two'] ['one two three four five\n', 'one two five six seven\n']
['three', 'four'] ['three four something\n', 'three four something else\n']
['one', 'two'] ['one two start of new one two block\n']
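
Note that groupby only merges adjacent lines, which is why ['one', 'two'] shows up twice in the output above. If the goal is instead to collect every line with the same first two words regardless of where it sits in the file, one possible approach (not part of the original answer) is a dictionary keyed on those words. The sketch below assumes Python 3.7+, where dicts preserve insertion order; group_all and sample are hypothetical names used only for illustration.

```python
from collections import defaultdict


def group_all(lines):
    """Group every non-blank line by its first two words, adjacent or not."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Use a tuple as the key, because dict keys must be hashable
        key = tuple(line.split(None, 2)[:2])
        groups[key].append(line)
    return dict(groups)


# Hypothetical in-memory stand-in for the contents of somefile
sample = [
    "one two three four five\n",
    "one two five six seven\n",
    "three four something\n",
    "three four something else\n",
    "one two start of new one two block\n",
]

result = group_all(sample)
for key, members in result.items():
    print(key, members)
```

With this variant, all three 'one two' lines end up in a single group, at the cost of holding the whole file in memory.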


To exclude the first two words from each line, use:

with open('somefile') as fin:
    lines = (line.split(None, 2) for line in fin if line.strip())
    for k, g in groupby(lines, lambda L: L[:2]):
        lines = [el[2] for el in g]
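
One caveat with the snippet above: a line with only one or two words has no third element after split(None, 2), so el[2] would raise an IndexError. A minimal sketch of one way to guard against that, written for Python 3, follows. The names group_remainders and split_line are hypothetical, and falling back to an empty remainder (and stripping the trailing newline) are assumptions rather than part of the original answer.

```python
from itertools import groupby


def group_remainders(lines):
    """Yield (key, remainders) for each run of consecutive lines
    that share their first two words."""

    def split_line(line):
        parts = line.split(None, 2)
        key = tuple(parts[:2])
        # Assumption: a line without a third word gets an empty remainder
        rest = parts[2] if len(parts) > 2 else ""
        return key, rest.rstrip("\n")

    pairs = (split_line(line) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, [rest for _, rest in group]


# Hypothetical input containing one line with only two words
sample = ["one two three four\n", "one two\n", "three four something\n"]
for key, remainders in group_remainders(sample):
    print(key, remainders)
```

The same function also accepts an open file object in place of the list, since it only iterates over its argument.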

Regarding "python - Grouping lines by their first two words", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28759687/
