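I have a three-column file. The first and second columns are the start and end times, and the third column is a label. When two or more consecutive rows have the same label in the third column, I want to merge them into one row that spans from the first row's start time to the last row's end time.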
Input 1:
0.000000 0.551875 x
0.551875 0.586875 x
0.586875 0.676188 t
0.676188 0.721875 t
0.721875 0.821250 t
0.821250 0.872063 p
0.872063 0.968625 q
0.968625 1.112250 q
Input 2:
0.000000 0.551875 x
0.551875 0.586875 x
0.586875 0.676188 t
0.676188 0.721875 t
0.721875 0.821250 t
0.821250 0.872063 p
0.872063 0.968625 q
0.968625 1.112250 q
1.112250 1.212250 x
1.212250 1.500000 x
Input 3:
0.000000 0.551875 x
0.551875 0.586875 x
0.586875 0.676188 t
0.676188 0.721875 t
0.721875 0.821250 t
0.821250 0.872063 oo
0.872063 0.968625 q
0.968625 1.112250 q
1.112250 1.212250 x
1.212250 1.500000 x
Output:
0.000000 0.586875 x
0.586875 0.821250 t
0.821250 0.872063 p
0.872063 1.112250 q
1.112250 1.500000 x
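The answer below works on in-memory lists of [start, end, label]. A minimal sketch of turning the three-column text into such rows, assuming whitespace-separated columns, one row per line, and start/end times that parse as decimals; parseRows is a hypothetical helper name, not part of the original answer:

```groovy
// Hypothetical helper: parse "start end label" lines into [start, end, label] rows.
// Assumes whitespace-separated columns; blank lines are skipped.
def parseRows(String text) {
    text.readLines()
        .findAll { it.trim() }                          // drop blank lines
        .collect { line ->
            def (start, end, label) = line.trim().split(/\s+/)
            // BigDecimal keeps the six decimal places exactly as written
            [new BigDecimal(start), new BigDecimal(end), label]
        }
}

// The first three rows of Input 1.
def text = '''
0.000000 0.551875 x
0.551875 0.586875 x
0.586875 0.676188 t
'''

def rows = parseRows(text)
println rows
```

BigDecimal is used here because Groovy's own decimal literals (as in the answer's inputs lists) are BigDecimal too, so parsed rows and literal rows compare equal.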

Best answer

In Groovy, given:

def inputs = [
    [0.000000, 0.551875, 'x'],
    [0.551875, 0.586875, 'x'],
    [0.586875, 0.676188, 't'],
    [0.676188, 0.721875, 't'],
    [0.721875, 0.821250, 't'],
    [0.821250, 0.872063, 'p'],
    [0.872063, 0.968625, 'q'],
    [0.968625, 1.112250, 'q']
]

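Just group the rows by the third element of each list, then for each group create a list containing:
the first element of the group's first list (its start time)
the second element of the group's last list (its end time)
the key the group was formed on (the label)
That gives: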
def outputs = inputs.groupBy { it[2] }.collect { key, items ->
    [items[0][0], items[-1][1], key]
}

The result is:
[[0.000000, 0.586875, 'x'],
 [0.586875, 0.821250, 't'],
 [0.821250, 0.872063, 'p'],
 [0.872063, 1.112250, 'q']]
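Note that groupBy collects every row with the same label regardless of position, so this only gives the desired result when each label appears in a single contiguous run. A quick check against Input 2 from the question, where x appears both at the start and at the end:

```groovy
// Input 2 from the question: 'x' occurs in two separate runs.
def inputs = [
    [0.000000, 0.551875, 'x'],
    [0.551875, 0.586875, 'x'],
    [0.586875, 0.676188, 't'],
    [0.676188, 0.721875, 't'],
    [0.721875, 0.821250, 't'],
    [0.821250, 0.872063, 'p'],
    [0.872063, 0.968625, 'q'],
    [0.968625, 1.112250, 'q'],
    [1.112250, 1.212250, 'x'],
    [1.212250, 1.500000, 'x']
]

// groupBy puts all four 'x' rows into one group...
def outputs = inputs.groupBy { it[2] }.collect { key, items ->
    [items[0][0], items[-1][1], key]
}

// ...so the two 'x' runs collapse into one span from 0.000000 to 1.500000,
// and only four rows come back instead of the expected five.
println outputs
```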

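Gaps
If the same label can appear in separate runs that must stay separate (like the two x runs in Input 2), groupBy will merge them into one. Instead, walk the rows in order with inject and only extend the current run when the label is unchanged: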
def inputs = [[0.000000, 0.551875, 'x'],
              [0.551875, 0.586875, 'x'],
              [0.586875, 0.676188, 't'],
              [0.676188, 0.721875, 't'],
              [0.721875, 0.821250, 't'],
              [0.821250, 0.872063, 'p'],
              [0.872063, 0.968625, 'q'],
              [0.968625, 1.112250, 'q'],
              [1.112250, 1.551875, 'x'],
              [1.551875, 2.000000, 'x']]

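def outputs = inputs.inject([]) { accum, line ->
    if(accum && accum[-1][2] == line[2]) {
        // same label as the current run: extend its end time
        accum[-1][1] = line[1]
    }
    else {
        // new label: start a new run with a copy, so inputs is not modified
        accum << new ArrayList(line)
    }
    accum
}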

gives:
[[0.000000, 0.586875, 'x'],
 [0.586875, 0.821250, 't'],
 [0.821250, 0.872063, 'p'],
 [0.872063, 1.112250, 'q'],
 [1.112250, 2.000000, 'x']]
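To get back to the three-column file format from the question, the same inject approach can be wrapped in a function and each merged row joined with spaces. A sketch, assuming space-separated output with one row per line; mergeRuns is a hypothetical name, and the first row of each run is copied so the caller's rows are left untouched:

```groovy
// Hypothetical helper: merge consecutive rows that share a label.
// The first row of each run is copied so the caller's rows are not modified.
def mergeRuns(List rows) {
    rows.inject([]) { accum, row ->
        if (accum && accum[-1][2] == row[2]) {
            accum[-1][1] = row[1]          // extend the current run's end time
        } else {
            accum << new ArrayList(row)    // start a new run with a copy
        }
        accum
    }
}

// Input 2 from the question.
def inputs = [
    [0.000000, 0.551875, 'x'],
    [0.551875, 0.586875, 'x'],
    [0.586875, 0.676188, 't'],
    [0.676188, 0.721875, 't'],
    [0.721875, 0.821250, 't'],
    [0.821250, 0.872063, 'p'],
    [0.872063, 0.968625, 'q'],
    [0.968625, 1.112250, 'q'],
    [1.112250, 1.212250, 'x'],
    [1.212250, 1.500000, 'x']
]

def merged = mergeRuns(inputs)

// Write the rows back out as "start end label" lines.
def text = merged.collect { it.join(' ') }.join('\n')
println text
```

Unlike groupBy, this keeps the leading and trailing x runs as two rows, matching the Output in the question.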

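Wildcards
If some labels (such as oo in Input 3) should be absorbed into the preceding run instead of starting a new one, pass them in as wildcards: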
def inputs = [[0.000000, 0.551875, 'x'],
              [0.551875, 0.586875, 'x'],
              [0.586875, 0.676188, 't'],
              [0.676188, 0.721875, 't'],
              [0.721875, 0.821250, 't'],
              [0.821250, 0.872063, 'oo'],
              [0.872063, 0.968625, 'q'],
              [0.968625, 1.112250, 'q'],
              [1.112250, 1.551875, 'x'],
              [1.551875, 2.000000, 'x']]

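def coalesce(List inputs, String... wildcards) {
    inputs.inject([]) { accum, line ->
        // extend the current run if the label matches or is a wildcard
        if(accum &&
           (accum[-1][2] == line[2] || wildcards.contains(line[2]))) {
            accum[-1][1] = line[1]
        }
        else {
            // otherwise start a new run with a copy, leaving inputs unmodified
            accum << new ArrayList(line)
        }
        accum
    }
}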

Then:
def outputs = coalesce(inputs, 'oo')

gives:
[[0.000000, 0.586875, 'x'],
 [0.586875, 0.872063, 't'],
 [0.872063, 1.112250, 'q'],
 [1.112250, 2.000000, 'x']]

and
def outputs = coalesce(inputs, 'oo', 'q')

gives:
[[0.000000, 0.586875, 'x'],
 [0.586875, 1.112250, 't'],
 [1.112250, 2.000000, 'x']]
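Applied to Input 3 from the question, treating oo as a wildcard absorbs that row into the preceding t run. The function is repeated here so the sketch runs on its own, with a copy-on-append change so that inputs is left untouched:

```groovy
// Merge consecutive rows with the same label; rows whose label is one of
// the wildcards are absorbed into the preceding run.
def coalesce(List inputs, String... wildcards) {
    inputs.inject([]) { accum, line ->
        if (accum &&
            (accum[-1][2] == line[2] || wildcards.contains(line[2]))) {
            accum[-1][1] = line[1]            // extend the current run
        } else {
            accum << new ArrayList(line)      // start a new run with a copy
        }
        accum
    }
}

// Input 3 from the question.
def inputs = [
    [0.000000, 0.551875, 'x'],
    [0.551875, 0.586875, 'x'],
    [0.586875, 0.676188, 't'],
    [0.676188, 0.721875, 't'],
    [0.721875, 0.821250, 't'],
    [0.821250, 0.872063, 'oo'],
    [0.872063, 0.968625, 'q'],
    [0.968625, 1.112250, 'q'],
    [1.112250, 1.212250, 'x'],
    [1.212250, 1.500000, 'x']
]

def outputs = coalesce(inputs, 'oo')
println outputs
```

A wildcard row at the very start has no preceding run to join, so it is kept as its own row.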
