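I have an .xls file with 4 rows and several columns. I saved it as a tab-delimited .txt file, as shown below.
The first column is the important one: its strings are separated by commas.
Sample data can be found here: https://gist.github.com/anonymous/92a95026f9869790f209dc9ce8f55a59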

A,B                   A13   This is India
AFD,DNGS,SGDH         3TR   This is how it is
NHYG,QHD,lkd,uyete    TRD   Where to go
AFD,TTT               YTR   What to do

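For each row, I want to combine the first-column strings into pairs. When a row yields more than one pair, the other columns should be repeated for every pair.
This is what I'm looking for: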
A       B        A13    This is India
AFD    DNGS      3TR    This is how it is
AFD    SGDH      3TR    This is how it is
DNGS   SGDH      3TR    This is how it is
NHYG    QHD      TRD    Where to go
NHYG    lkd      TRD    Where to go
NHYG    uyete    TRD    Where to go
QHD     lkd      TRD    Where to go
QHD     uyete    TRD    Where to go
lkd     uyete    TRD    Where to go
AFD     TTT      YTR    What to do
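As a quick check on the expected output: a row whose first column holds k comma-separated names yields C(k, 2) = k(k-1)/2 pairs, so the four sample rows (k = 2, 3, 4, 2) give 1 + 3 + 6 + 1 = 11 output rows, which matches the 11 lines listed above. The small sketch below hard-codes the first column of the sample data and confirms this with math.comb (Python 3.8+).

```python
import itertools
import math

# First-column values of the four sample rows (copied from the data above)
first_column = ["A,B", "AFD,DNGS,SGDH", "NHYG,QHD,lkd,uyete", "AFD,TTT"]

counts = []
for cell in first_column:
    names = cell.split(",")
    pairs = list(itertools.combinations(names, 2))
    # the number of pairs equals "k choose 2", where k is the number of names
    assert len(pairs) == math.comb(len(names), 2)
    counts.append(len(pairs))

print(counts, sum(counts))  # → [1, 3, 6, 1] 11
```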

Let's call the input data above Data.
I tried reading the file line by line:
import itertools


lines = open("data.txt").readlines()
for line in lines:
    myrows = line.split(",")
out_list = []
for i in range(1, len(myrows)+1):
    out_list.extend(itertools.combinations(lines, i))
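A small sketch of why this attempt doesn't produce the pairs: itertools.combinations(lines, i) combines whole lines with each other, whereas the pairs are wanted within a single row's first column. The sketch assumes tab-separated columns and uses two hard-coded sample lines (quotes omitted for brevity).

```python
import itertools

# Two sample lines, tab-separated as in the .txt export
lines = ["A,B\tA13\tThis is India\n", "AFD,TTT\tYTR\tWhat to do\n"]

# Combining the lines themselves pairs whole rows with each other
whole_rows = list(itertools.combinations(lines, 2))
print(len(whole_rows))  # → 1 (one pair of complete lines)

# Combining the comma-separated names of ONE row's first column gives the wanted pairs
first_field = lines[0].split("\t", 1)[0]
print(list(itertools.combinations(first_field.split(","), 2)))  # → [('A', 'B')]
```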

Best answer

I think you have the right idea with itertools.combinations(), but you only need to run it on the elements of the first column, not on the whole line.
Here is my solution:

import io
import itertools

# Sample data: columns are tab-separated; the first column is quoted
data = (
    '"A,B     "\tA13\tThis is India\n'
    '"AFD,DNGS,SGDH   "\t3TR\tThis is how it is\n'
    '"NHYG,QHD,lkd,uyete"\tTRD\tWhere to go\n'
    '"AFD,TTT"\tYTR\tWhat to do\n'
)

for line in io.StringIO(data):
    e1, e2 = line.rstrip('\n').split('\t', 1)  # extract the first part (e1) and the rest of the line (e2)
    es = e1.replace('"', '').strip().split(',')  # remove extra "" and whitespace,
                                                 # then split into the individual elements
    for i in itertools.combinations(es, 2):  # iterate over all combinations of 2 elements
        print('{}\t{}'.format('\t'.join(i), e2))

Result:
A   B   A13 This is India
AFD DNGS    3TR This is how it is
AFD SGDH    3TR This is how it is
DNGS    SGDH    3TR This is how it is
NHYG    QHD TRD Where to go
NHYG    lkd TRD Where to go
NHYG    uyete   TRD Where to go
QHD lkd TRD Where to go
QHD uyete   TRD Where to go
lkd uyete   TRD Where to go
AFD TTT YTR What to do
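An alternative sketch, not the answerer's code: Python's csv module can parse the tab-delimited export directly with delimiter="\t", and its default quotechar '"' removes the quotes Excel puts around fields that contain commas, so only the padding spaces still need stripping. It uses an in-memory copy of the sample data in place of the real file.

```python
import csv
import io
import itertools

# In-memory copy of the sample data; for a real file use open("data.txt", newline="")
data = (
    '"A,B     "\tA13\tThis is India\n'
    '"AFD,DNGS,SGDH   "\t3TR\tThis is how it is\n'
    '"NHYG,QHD,lkd,uyete"\tTRD\tWhere to go\n'
    '"AFD,TTT"\tYTR\tWhat to do\n'
)

out_rows = []
for row in csv.reader(io.StringIO(data), delimiter="\t"):
    names = [n.strip() for n in row[0].split(",")]  # csv has already removed the quotes
    for pair in itertools.combinations(names, 2):
        out_rows.append(list(pair) + row[1:])  # repeat the remaining columns for each pair

for r in out_rows:
    print("\t".join(r))
```

Letting csv handle the quoting avoids the manual replace('"', '') step, which would also delete quote characters that legitimately appear inside a field.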

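Edit
Here is a modified version. It reads data.txt, copies its header line to result.txt, and writes each pair prefixed with the number of the data row it came from.
Note that enumerate() returns the index of the current line together with the line itself.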
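import itertools

with open('data.txt') as f, open('result.txt', 'w') as w:
    header = f.readline()
    w.write(header)  # copy the header line unchanged
    for n, line in enumerate(f, start=1):  # n = 1-based index of the data row
        elems = line.rstrip('\n').split('\t')  # drop the newline so it isn't written twice
        # split the first column on commas and remove quotes/padding around each element
        e0 = [e.replace('"', '').strip() for e in elems[0].split(',')]
        for pairs in itertools.combinations(e0, 2):
            w.write('{:d}\t{}\t{}\n'.format(n, '\t'.join(pairs), '\t'.join(elems[1:])))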
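The edited version always consumes the first line as a header, but the sample data in the question has no header row. A sketch that makes this explicit, assuming the caller knows whether a header exists; expand_pairs and its has_header parameter are hypothetical names chosen for illustration.

```python
import io
import itertools


def expand_pairs(lines, has_header=False):
    """Yield output lines: data-row number, one pair from column 1, then the other columns.

    Hypothetical helper; has_header says whether the first line is a header to copy through.
    """
    lines = iter(lines)
    if has_header:
        header = next(lines, None)
        if header is not None:
            yield header  # pass the header through unchanged
    for n, line in enumerate(lines, start=1):  # n = 1-based index of the data row
        elems = line.rstrip("\n").split("\t")
        names = [e.replace('"', "").strip() for e in elems[0].split(",")]
        for pair in itertools.combinations(names, 2):
            yield "{:d}\t{}\t{}\n".format(n, "\t".join(pair), "\t".join(elems[1:]))


sample = io.StringIO('"A,B"\tA13\tThis is India\n"AFD,TTT"\tYTR\tWhat to do\n')
result = list(expand_pairs(sample))
print("".join(result), end="")
```

With real files, w.writelines(expand_pairs(f, has_header=True)) inside the same with block as above would write the result.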

For python - how to combine pairs in each row, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41207261/

10-09 18:39