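I have an .xls file with 4 rows and several columns. I saved it as a tab-delimited .txt file, shown below.
The first column is the important one; the strings in it are separated by commas (,).
Sample data can be found here: https://gist.github.com/anonymous/92a95026f9869790f209dc9ce8f55a59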
A,B A13 This is India
AFD,DNGS,SGDH 3TR This is how it is
NHYG,QHD,lkd,uyete TRD Where to go
AFD,TTT YTR What to do
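For each row, I want to combine the elements of the first column pairwise, repeating the rest of the row whenever a row yields more than one pair.
This is what I am looking for: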
A B A13 This is India
AFD DNGS 3TR This is how it is
AFD SGDH 3TR This is how it is
DNGS SGDH 3TR This is how it is
NHYG QHD TRD Where to go
NHYG lkd TRD Where to go
NHYG uyete TRD Where to go
QHD lkd TRD Where to go
QHD uyete TRD Where to go
lkd uyete TRD Where to go
AFD TTT YTR What to do
Let's call my first data set Data.
I tried reading it line by line:
import itertools

lines = open("data.txt").readlines()
for line in lines:
    myrows = line.split(",")
    out_list = []
    for i in range(1, len(myrows)+1):
        out_list.extend(itertools.combinations(lines, i))
Best answer
I think your idea of using itertools.combinations() is right, but you only need to run it over the elements of the first column, not over the whole line.
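As a quick illustration of that point, here is a minimal sketch that applies itertools.combinations() to the first column of one sample row. The variable names first_column, elements and pairs are made up for this example and do not appear in the question's code.

```python
import itertools

# First column of the third sample row (hypothetical variable name).
first_column = "NHYG,QHD,lkd,uyete"

# Split the column into its elements, then take every unordered pair.
elements = first_column.split(",")
pairs = list(itertools.combinations(elements, 2))

for left, right in pairs:
    print(left, right)
```

Four elements give six pairs, which matches the six TRD rows in the expected output.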
Here is my solution:
import io
import itertools

# Sample data; columns are separated by tabs (written as \t escapes here).
data = """"A,B "\tA13\tThis is India
"AFD,DNGS,SGDH "\t3TR\tThis is how it is
"NHYG,QHD,lkd,uyete"\tTRD\tWhere to go
"AFD,TTT"\tYTR\tWhat to do"""

for line in io.StringIO(data):
    # extract the first part (e1) and the rest of the line (e2)
    e1, e2 = line.rstrip('\n').split('\t', 1)
    # remove extra "" and whitespace, then split into a list of elements
    es = e1.replace('"', '').strip().split(',')
    for i in itertools.combinations(es, 2):  # iterate over all combinations of 2 elements
        print('{}\t{}'.format('\t'.join(i), e2))
Result:
A B A13 This is India
AFD DNGS 3TR This is how it is
AFD SGDH 3TR This is how it is
DNGS SGDH 3TR This is how it is
NHYG QHD TRD Where to go
NHYG lkd TRD Where to go
NHYG uyete TRD Where to go
QHD lkd TRD Where to go
QHD uyete TRD Where to go
lkd uyete TRD Where to go
AFD TTT YTR What to do
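The number of output rows can be sanity-checked with a little arithmetic: a first column with n elements produces n * (n - 1) / 2 pairs. The sketch below only assumes the element counts of the four sample rows (2, 3, 4 and 2); the variable names are illustrative.

```python
# Number of comma-separated elements in the first column of each sample row.
element_counts = [2, 3, 4, 2]

# A row with n elements expands to n * (n - 1) / 2 rows (n choose 2).
rows_per_input = [n * (n - 1) // 2 for n in element_counts]
total_rows = sum(rows_per_input)

print(rows_per_input, total_rows)
```

This gives 1, 3, 6 and 1 rows per input row, 11 in total, which is the number of lines in the result above.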
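Edit
Here is a modified version.
Note that wrapping f.readlines() in enumerate() also returns the index of the current line.
import itertools

with open('data.txt') as f:
    header = f.readline()  # the first line is treated as a header and copied as-is
    with open('result.txt', 'w') as w:
        w.write(header)
        for n, line in enumerate(f.readlines()):
            # drop the trailing newline so the last field does not carry it
            elems = line.rstrip('\n').split('\t')
            e0 = elems[0].split(',')
            e0 = [e.replace('"', '').strip() for e in e0]
            for pairs in itertools.combinations(e0, 2):
                w.write('{:d}\t{}\t{}\n'.format(n+1, '\t'.join(pairs), '\t'.join(elems[1:])))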
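As an alternative to stripping the quotes by hand, the standard csv module can parse the tab-delimited file and remove the surrounding quotes itself. The following is only a sketch under assumptions: the helper name expand_pairs and the in-memory sample are invented for illustration, and the sample assumes the columns are separated by single tabs.

```python
import csv
import io
import itertools


def expand_pairs(rows):
    """Yield one output row per pair of elements in each row's first column."""
    for row in rows:
        if not row:
            continue  # skip blank lines
        elements = [e.strip() for e in row[0].split(",")]
        for pair in itertools.combinations(elements, 2):
            yield list(pair) + row[1:]


# In-memory stand-in for a tab-delimited file (contents assumed for the demo).
sample = io.StringIO('"A,B "\tA13\tThis is India\n"AFD,TTT"\tYTR\tWhat to do\n')
reader = csv.reader(sample, delimiter="\t")
result = list(expand_pairs(reader))
print(result)
```

To read a real file, the io.StringIO object could be replaced by open('data.txt', newline=''). Note that a row whose first column holds a single element produces no output rows, because there is no pair to form.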
Regarding "python - How to combine pairwise each row", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41207261/