Creating a CSV with headers from a log file in Python


Problem description

Every row of my log file contains some information, as shown below:

Info1:NewOrder|key:123 |Info3:10|Info5:abc
Info3:10|Info1:OldOrder| key:456| Info6:xyz
Info1:NewOrder|key:007

I want to convert it to a CSV like the one below (if I give key, Info1 and Info3 as the required headers):

key,Info1,Info3
123,NewOrder,10
456,OldOrder,10
007,NewOrder,

Earlier I used awk to extract field values, but the logging can change the order in which the info fields and the key are printed in a row, so I cannot be sure that Info3 will always be in a particular column. Every time the logging changed, the script had to be changed too.

I then intend to load the CSV into a pandas dataframe, so a Python solution would be better. This is mostly a data-cleaning task: generating a CSV from a log file.

This is what I used after reading the answers:

import csv
import sys

# Read the whole log file, keeping the newlines so it can be split into rows
with open(sys.argv[1], 'r') as myLogfile:
    log = myLogfile.read()

requested_columns = ["OrderID", "TimeStamp", "ErrorCode"]

def wrangle(string, requested_columns):
    # One dict per non-empty line; each "name:value" field is split on its first colon only
    data = [dict(element.strip().split(":", 1) for element in row.split("|") if ":" in element)
            for row in string.splitlines() if row.strip()]
    # Missing columns come back as None, which csv.writer writes as an empty field
    body = [[row.get(column) for column in requested_columns] for row in data]
    return [requested_columns] + body

outpath = sys.argv[2]
with open(outpath, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerows(wrangle(log, requested_columns))

Sample log file: https://ideone.com/cny805

Recommended answer

You could use a csv reader with a | delimiter to get started, then split each field on : to build a per-row dictionary, as follows:

import csv

# Python 3: open in text mode with newline='' so the csv module handles line endings
with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    cols = ["OrderID", "TimeStamp", "ErrorCode"]
    csv_output.writerow(cols)

    for row in csv.reader(f_input, delimiter='|'):
        # Remove any entries that do not have a colon
        row = [c for c in row if ':' in c]
        # Convert remaining columns into a dictionary, splitting on the first colon only
        entries = {k.strip(): v.strip() for k, v in (c.split(':', 1) for c in row)}
        csv_output.writerow([entries.get(c, "") for c in cols])

This gives you the output file:

OrderID,TimeStamp,ErrorCode
3000000,1488948188555841641,
3000000,1488948188556444675,0
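
The same idea can be tried on the three sample lines from the question. The following is only a sketch under a few assumptions: the helper names parse_line and log_to_csv are made up for illustration, and csv.DictWriter is swapped in for the csv.writer used above so that missing headers are filled with an empty string (restval) and unrequested fields such as Info5 are dropped (extrasaction="ignore"). Output goes to an in-memory buffer rather than a file.

```python
import csv
import io

def parse_line(line):
    """Turn 'a:1|b:2' into {'a': '1', 'b': '2'}, skipping fields without a colon."""
    record = {}
    for field in line.split("|"):
        if ":" in field:
            # Split on the first colon only, so values may contain colons
            key, value = field.split(":", 1)
            record[key.strip()] = value.strip()
    return record

def log_to_csv(lines, headers, out):
    """Write one CSV row per non-empty log line, using only the requested headers."""
    writer = csv.DictWriter(out, fieldnames=headers, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if line.strip():
            writer.writerow(parse_line(line))

# Sample rows copied from the question; field order differs between rows
sample = [
    "Info1:NewOrder|key:123 |Info3:10|Info5:abc",
    "Info3:10|Info1:OldOrder| key:456| Info6:xyz",
    "Info1:NewOrder|key:007",
]

buffer = io.StringIO()
log_to_csv(sample, ["key", "Info1", "Info3"], buffer)
print(buffer.getvalue())
```

Because every row is looked up by field name, the order of the fields in the log line does not matter, which is the problem the awk script ran into.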


To read the data directly into a Pandas dataframe:

import pandas as pd
import csv

cols = ["OrderID", "TimeStamp", "ErrorCode"]
data = []

# Python 3: open in text mode with newline='' so the csv module handles line endings
with open('input.csv', newline='') as f_input:
    for row in csv.reader(f_input, delimiter='|'):
        # Remove any entries that do not have a colon
        row = [c for c in row if ':' in c]
        # Convert remaining columns into a dictionary, splitting on the first colon only
        entries = {k.strip(): v.strip() for k, v in (c.split(':', 1) for c in row)}
        data.append([entries.get(c, "") for c in cols])

df = pd.DataFrame(data, columns=cols)
print(df)

Giving you:

   OrderID            TimeStamp ErrorCode
0  3000000  1488948188555841641
1  3000000  1488948188556444675         0

