So I have two lists of data that look like this (shortened):

[[1.0, 1403603100],
 [0.0, 1403603400],
 [2.0, 1403603700],
 [0.0, 1403604000],
 [None, 1403604300]]

[[1.0, 1403603100],
 [0.0, 1403603400],
 [1.0, 1403603700],
 [None, 1403604000],
 [5.0, 1403604300]]

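What I want to do is merge them, summing the first element (the counter) of each pair of rows, or setting the result to 0.0 if either counter value is None. So the example above would become: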
[[2.0, 1403603100],
[0.0, 1403603400],
[3.0, 1403603700],
[0.0, 1403604000],
[0.0, 1403604300]]

This is what I've come up with so far; apologies if it's a bit clumsy.
def emit_datum(datapoints):
    for datum in datapoints:
        yield datum

def merge_data(data_set1, data_set2):

    assert len(data_set1) == len(data_set2)
    data_length = len(data_set1)

    data_gen1 = emit_datum(data_set1)
    data_gen2 = emit_datum(data_set2)

    merged_data = []

    for _ in range(data_length):

        # next() works on Python 2.6+ and Python 3; the .next() method is Python 2 only
        datum1 = next(data_gen1)
        datum2 = next(data_gen2)

        if datum1[0] is None or datum2[0] is None:
            merged_data.append([0.0, datum1[1]])
            continue

        count = datum1[0] + datum2[0]
        merged_data.append([count, datum1[1]])

    return merged_data

I can only hope/assume there's something clever I could do with itertools or collections?

Best answer

If the merged value should be 0.0 whenever either of the two values is None, all you need is a simple loop.

l1 = [[1.0, 1403603100],
[0.0, 1403603400],
[2.0, 1403603700],
[0.0, 1403604000],
[None, 1403604300]]

l2 = [[1.0, 1403603100],
[0.0, 1403603400],
[1.0, 1403603700],
[None, 1403604000],
[5.0, 1403604300]]

final = []
assert len(l1) == len(l2)
for x, y in zip(l1, l2):
    if x[0] is None or y[0] is None:
        # Either counter is missing: emit 0.0 in a new row instead of overwriting y in l2
        final.append([0.0, x[-1]])
    else:
        final.append([x[0] + y[0], x[-1]])
print(final)

[[2.0, 1403603100], [0.0, 1403603400], [3.0, 1403603700], [0.0, 1403604000], [0.0, 1403604300]]
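The same rule can also be written as a single list comprehension that builds a fresh row for every pair and leaves both input lists untouched. This is only a minimal sketch: the function name `merge_counts` is hypothetical (it does not appear in the question), and it assumes every row is a two-element `[count, timestamp]` list and that the timestamps of the two lists line up row by row.

```python
def merge_counts(data_set1, data_set2):
    """Sum the counters of two equal-length lists of [count, timestamp] rows.

    If either counter in a pair is None, the merged counter is 0.0.
    Neither input list is modified. (Hypothetical helper, not from the question.)
    """
    if len(data_set1) != len(data_set2):
        raise ValueError("data sets must have the same length")
    return [
        # The timestamp is taken from the first data set; the second is assumed to match.
        [0.0 if c1 is None or c2 is None else c1 + c2, ts]
        for (c1, ts), (c2, _) in zip(data_set1, data_set2)
    ]


l1 = [[1.0, 1403603100], [0.0, 1403603400], [2.0, 1403603700],
      [0.0, 1403604000], [None, 1403604300]]
l2 = [[1.0, 1403603100], [0.0, 1403603400], [1.0, 1403603700],
      [None, 1403604000], [5.0, 1403604300]]

print(merge_counts(l1, l2))
```

Raising ValueError instead of using assert keeps the length check active even when Python runs with -O, which strips assert statements.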


In [51]: %timeit merge_data(l1,l2)
100000 loops, best of 3: 5.76 µs per loop


In [52]: %%timeit
   ....: final = []
   ....: assert len(l1)==len(l2)
   ....: for x, y in zip(l1, l2):
   ....:     if x[0] is  None or y[0] is None:
   ....:         y[0] = 0.0
   ....:         final.append(y)
   ....:     else:
   ....:         final.append([x[0] + y[0], x[-1]])
   ....:
100000 loops, best of 3: 2.64 µs per loop
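The comparison above can be reproduced outside IPython with the standard timeit module. The following is a rough sketch, not the exact code that was timed: `merge_with_iterators` and `merge_with_zip` are hypothetical names standing in for the question's generator-based function and the zip loop, and both build new rows so that repeated runs always see the same input. Absolute timings depend on the machine and the Python version.

```python
import timeit

l1 = [[1.0, 1403603100], [0.0, 1403603400], [2.0, 1403603700],
      [0.0, 1403604000], [None, 1403604300]]
l2 = [[1.0, 1403603100], [0.0, 1403603400], [1.0, 1403603700],
      [None, 1403604000], [5.0, 1403604300]]


def merge_with_iterators(a, b):
    # Same idea as the question's merge_data: advance two iterators by hand.
    it1, it2 = iter(a), iter(b)
    merged = []
    for _ in range(len(a)):
        d1, d2 = next(it1), next(it2)
        if d1[0] is None or d2[0] is None:
            merged.append([0.0, d1[1]])
        else:
            merged.append([d1[0] + d2[0], d1[1]])
    return merged


def merge_with_zip(a, b):
    # Same idea as the answer: let zip pair the rows up.
    merged = []
    for x, y in zip(a, b):
        if x[0] is None or y[0] is None:
            merged.append([0.0, x[1]])
        else:
            merged.append([x[0] + y[0], x[1]])
    return merged


runs = 100000
for func in (merge_with_iterators, merge_with_zip):
    # Best of 3 repeats, reported per call in microseconds.
    best = min(timeit.repeat(lambda: func(l1, l2), repeat=3, number=runs))
    print("%s: %.2f us per loop" % (func.__name__, best / runs * 1e6))
```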

Regarding "python - How to efficiently merge these two datasets?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24428048/
