I want to group the lines of a file by their first two words (and then rearrange and print them).
This is what I tried:
    lines = file.readlines()
    i = 0
    for line in lines:
        word1 = line.split()[0]
        word2 = line.split()[1]
        # Compare the first word with the neighbouring lines
        if word1 == lines[i+1].split()[0] and word1 == lines[i-1].split()[0]:
            # Compare the second word with the neighbouring lines
            if word2 == lines[i-1].split()[1] and word2 == lines[i+1].split()[1]:
                print(line)
        else:
            print("***new block of lines \n***")
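However, this is a very poor solution: it does not handle the first or last line, and it does not work well in general. A better solution would be appreciated.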
Best answer
If you want to group consecutive lines of the file that share the same first two words, this is a use case for itertools.groupby, for example:
    from itertools import groupby

    with open('somefile') as fin:
        # Pair each non-blank line with its first two words (the grouping key)
        lines = ((line.split(None, 2)[:2], line) for line in fin if line.strip())
        for k, g in groupby(lines, lambda L: L[0]):
            lines = [el[1] for el in g]
Here, k is the grouping key (up to the first two words), and lines will be the lines of the file that share that key.
Example somefile input:
    one two three four five
    one two five six seven
    three four something
    three four something else
    one two start of new one two block
Result of print(k, lines) inside the loop:
    ['one', 'two'] ['one two three four five\n', 'one two five six seven\n']
    ['three', 'four'] ['three four something\n', 'three four something else\n']
    ['one', 'two'] ['one two start of new one two block\n']
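Note that groupby only merges adjacent lines, which is why the key ['one', 'two'] appears twice in the result. The following is a minimal sketch, not part of the original answer, that contrasts consecutive grouping with merging every line that shares a key regardless of position. The helper names first_two_words, group_consecutive and group_all are hypothetical, and io.StringIO stands in for the example file so the snippet runs on its own.

```python
from collections import defaultdict
from io import StringIO
from itertools import groupby


def first_two_words(line):
    """Return the first two whitespace-separated words of a line as a tuple."""
    return tuple(line.split(None, 2)[:2])


def group_consecutive(lines):
    """Group adjacent non-blank lines that share their first two words."""
    non_blank = (line for line in lines if line.strip())
    return [(key, list(group)) for key, group in groupby(non_blank, key=first_two_words)]


def group_all(lines):
    """Merge every non-blank line sharing its first two words, adjacent or not."""
    merged = defaultdict(list)
    for line in lines:
        if line.strip():
            merged[first_two_words(line)].append(line)
    return dict(merged)


# Stand-in for the example file (assumption: same content as 'somefile' above).
sample = StringIO(
    "one two three four five\n"
    "one two five six seven\n"
    "three four something\n"
    "three four something else\n"
    "one two start of new one two block\n"
)
sample_lines = sample.readlines()

consecutive = group_consecutive(sample_lines)  # three blocks, 'one two' twice
merged = group_all(sample_lines)               # two keys, 'one two' holds 3 lines

# Print each consecutive block with a separator, as the question intended.
for key, block in consecutive:
    print("*** new block of lines ***")
    print("".join(block), end="")
```

If the goal is to rearrange the file so that all lines with the same first two words end up together, the dictionary variant is probably the closer fit; groupby is the better choice when block boundaries in the original order matter.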
To exclude the first two words from each line, use:
    with open('somefile') as fin:
        # Split each non-blank line into [word1, word2, rest of line]
        lines = (line.split(None, 2) for line in fin if line.strip())
        for k, g in groupby(lines, lambda L: L[:2]):
            lines = [el[2] for el in g]
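One caveat with this variant: a line that contains only one or two words produces a list with fewer than three elements, so el[2] raises an IndexError. The sketch below is one possible way to guard against that, assuming an empty remainder is acceptable for such lines. The helper name split_key_and_rest is hypothetical, and io.StringIO again stands in for the file.

```python
from io import StringIO
from itertools import groupby


def split_key_and_rest(line):
    """Split a line into (first two words, remainder); remainder may be ''."""
    parts = line.split(None, 2)
    rest = parts[2] if len(parts) > 2 else ""
    return parts[:2], rest


# Hypothetical input that includes a line with only two words.
sample = StringIO(
    "one two three four five\n"
    "one two\n"
    "three four something\n"
)

pairs = (split_key_and_rest(line) for line in sample if line.strip())
result = [
    (key, [rest for _, rest in group])
    for key, group in groupby(pairs, key=lambda pair: pair[0])
]
print(result)
```

Here the two-word line stays in the ['one', 'two'] group and contributes an empty string instead of stopping the loop with an exception.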
Regarding python - grouping lines by their first two words, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28759687/