I have data in the following format:
group_name group_item_fetch
topic_name fast_events_breaking
topic_lag 0
topic_name item_fetch_prod_stage
topic_lag 0
topic_name related_item_re
topic_lag 1018713
group_name fast_processing_events
topic_name item_fetch_processed
topic_lag 109323
How can I get an output file in the following format?
group_name,topic_name,topic_lag
group_item_fetch,fast_events_breaking,0
"",item_fetch_prod_stage,0
"",related_item_re,1018713
fast_processing_events,item_fetch_processed,109323
Best answer
Using Python 2.7.12 on Ubuntu 16.04, I wrote this code. It reads the file from standard input, prints the result, and saves it to out.txt:
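import sys

initial_values = []
output = []

# Read "key value" pairs from stdin, skipping blank lines.
for line in sys.stdin:
    if line.strip():
        initial_values.append(line.split())

# True while the topic that directly follows a group_name line is pending.
is_previous_group = False
for index, value in enumerate(initial_values):
    if value[0] == 'group_name':
        # Emit the group together with its first topic and that topic's lag.
        output.append([
            value[1],
            initial_values[index + 1][1],
            initial_values[index + 2][1]
        ])
        is_previous_group = True
    elif value[0] == 'topic_name':
        # The first topic was already emitted with its group; later topics
        # get an empty ("") group column.
        if not is_previous_group:
            output.append([
                '""',
                value[1],
                initial_values[index + 1][1]
            ])
        is_previous_group = False

# Print each row and save it to out.txt.
# Single-argument print(...) behaves the same in Python 2.7 and Python 3.
with open('out.txt', 'w') as out_file:
    print('group_name,topic_name,topic_lag')
    out_file.write('group_name,topic_name,topic_lag\n')
    for value in output:
        print(','.join(value))
        out_file.write(','.join(value) + '\n')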
I put the input in a file named in.txt, for example:
group_name group_item_fetch
topic_name fast_events_breaking
topic_lag 0
topic_name item_fetch_prod_stage
topic_lag 0
topic_name related_item_re
topic_lag 1018713
group_name fast_processing_events
topic_name item_fetch_processed
topic_lag 109323
Then, in the terminal, I used cat and a pipe (I named the Python code "filter_rows.py"):
cat in.txt | python filter_rows.py
The result is exactly what you asked for:
group_name,topic_name,topic_lag
group_item_fetch,fast_events_breaking,0
"",item_fetch_prod_stage,0
"",related_item_re,1018713
fast_processing_events,item_fetch_processed,109323
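As a cross-check, here is a minimal Python 3 sketch of the same transformation, written as a single pass that remembers the current group instead of looking ahead by index. The names rows_from_pairs, to_csv, EMPTY_GROUP, and SAMPLE are made up for illustration. It assumes every input line is exactly one key and one value separated by whitespace, and that values contain no commas.

```python
EMPTY_GROUP = '""'  # placeholder printed when the group column is blank


def rows_from_pairs(lines):
    """Yield [group, topic, lag] rows from 'key value' lines."""
    group = EMPTY_GROUP  # group name to show on the next emitted row
    topic = None         # topic waiting for its topic_lag line
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        if key == "group_name":
            group = value
        elif key == "topic_name":
            topic = value
        elif key == "topic_lag" and topic is not None:
            yield [group, topic, value]
            # Only the first topic of a group carries the group name.
            group = EMPTY_GROUP
            topic = None


def to_csv(lines):
    """Return the header plus one comma-joined line per row."""
    header = "group_name,topic_name,topic_lag"
    body = [",".join(row) for row in rows_from_pairs(lines)]
    return "\n".join([header] + body) + "\n"


# Inline copy of the sample input from the question.
SAMPLE = """\
group_name group_item_fetch
topic_name fast_events_breaking
topic_lag 0
topic_name item_fetch_prod_stage
topic_lag 0
topic_name related_item_re
topic_lag 1018713
group_name fast_processing_events
topic_name item_fetch_processed
topic_lag 109323
"""

print(to_csv(SAMPLE.splitlines()), end="")
```

To process in.txt instead of the inline sample, an open file can be passed directly, for example to_csv(open('in.txt')), since iterating a text file yields its lines.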
Done! ;)
10-06 00:54