I have data in the following format:

group_name  group_item_fetch
topic_name  fast_events_breaking
topic_lag  0
topic_name  item_fetch_prod_stage
topic_lag  0
topic_name  related_item_re
topic_lag  1018713
group_name fast_processing_events
topic_name item_fetch_processed
topic_lag 109323

How can I produce an output file in the following format?
group_name,topic_name,topic_lag
group_item_fetch,fast_events_breaking,0
"",item_fetch_prod_stage,0
"",related_item_re,1018713
fast_processing_events,item_fetch_processed,109323

Best answer

Using Python 2.7.12 on Ubuntu 16.04, I wrote this script. It reads the input from standard input, prints the result, and also saves it to out.txt:

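from __future__ import print_function  # lets the same script run on Python 2.7 and 3
import sys

initial_values = []
output = []

# Read whitespace-separated "key value" pairs from stdin, skipping blank lines
# (a blank line would otherwise make value[0] raise IndexError)
for line in sys.stdin:
    fields = line.split()
    if fields:
        initial_values.append(fields)

is_previous_group = False
for index, value in enumerate(initial_values):
    if value[0] == 'group_name':
        # A group line is followed by its first topic_name and topic_lag lines
        output.append([
            value[1],
            initial_values[index + 1][1],
            initial_values[index + 2][1],
        ])
        is_previous_group = True
    elif value[0] == 'topic_name':
        # The first topic of a group was already emitted with the group line;
        # every later topic gets an empty ("") group name
        if not is_previous_group:
            output.append([
                '""',
                value[1],
                initial_values[index + 1][1],
            ])
        is_previous_group = False

# Print each row and write it to out.txt; "with" closes the file afterwards
header = 'group_name,topic_name,topic_lag'
with open('out.txt', 'w') as out_file:
    print(header)
    out_file.write(header + '\n')
    for value in output:
        row = ','.join(value)
        print(row)
        out_file.write(row + '\n')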
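An alternative sketch, assuming Python 3 and that every topic_name line is eventually followed by its topic_lag line: instead of looking ahead by index, read the lines in a single pass, remember the current group and topic, and emit a row each time a topic_lag line arrives. The helper name to_rows is hypothetical.

```python
import sys


def to_rows(lines):
    """Convert 'key value' lines into CSV rows (to_rows is a hypothetical name).

    The group name is written only on the first row of each group;
    later rows in the same group get a literal "" instead.
    """
    rows = ['group_name,topic_name,topic_lag']
    group = '""'   # placeholder until a group_name line is seen
    topic = None
    for line in lines:
        fields = line.split()  # handles both single and double spaces
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        key, value = fields
        if key == 'group_name':
            group = value
        elif key == 'topic_name':
            topic = value
        elif key == 'topic_lag' and topic is not None:
            rows.append(','.join([group, topic, value]))
            group = '""'  # following topics in this group get ""
            topic = None
    return rows


if __name__ == '__main__' and len(sys.argv) > 1:
    # Read the input file named on the command line, print and save the rows
    with open(sys.argv[1]) as in_file:
        rows = to_rows(in_file)
    with open('out.txt', 'w') as out_file:
        for row in rows:
            print(row)
            out_file.write(row + '\n')
```

Run it as python filter_rows_v2.py in.txt (a hypothetical filename); it prints the same rows and writes them to out.txt. Because it never indexes ahead, a trailing topic_name without a topic_lag is simply ignored instead of raising IndexError.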

I put the input in a file named in.txt:
group_name  group_item_fetch
topic_name  fast_events_breaking
topic_lag  0
topic_name  item_fetch_prod_stage
topic_lag  0
topic_name  related_item_re
topic_lag  1018713
group_name fast_processing_events
topic_name item_fetch_processed
topic_lag 109323

Then, in the terminal, I piped it into the script with cat (I named the Python script "filter_rows.py"):
cat in.txt | python filter_rows.py

The result is exactly what you asked for:
group_name,topic_name,topic_lag
group_item_fetch,fast_events_breaking,0
"",item_fetch_prod_stage,0
"",related_item_re,1018713
fast_processing_events,item_fetch_processed,109323

Done! ;)

10-06 00:54