Parsing an Apache log file


Problem description


I just started learning Python and would like to read an Apache log file and put parts of each line into different lists.

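A line from the file:

172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"

According to the Apache website, each line follows a documented log format.

I'm able to open the file and read it as it is, but I don't know how to read it in that format so that I can put each part in a list.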

Solution

This is a job for regular expressions.

For example:

import re

# Sample line from the log file
line = '172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"'
# Groups: client IP, timestamp, request line, status code, referer, user agent
regex = r'([\d.]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

print(re.match(regex, line).groups())

The output would be a tuple with 6 pieces of information from the line (specifically, the groups within parentheses in that pattern):

('172.16.0.3', '25/Sep/2002:14:04:19 +0200', 'GET / HTTP/1.1', '401', '', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827')
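Since the question asks for each part to end up in its own list, here is a minimal sketch of one way to extend the idea. It is not the only approach. The names `LOG_PATTERN` and `parse_lines` and the group names are my own choices for illustration. The pattern is also slightly looser than the one above: it assumes the second, third and size fields are single tokens without spaces, so that values other than `-` are accepted.

```python
import re

# Hypothetical pattern: a looser variant of the regex above, with named groups.
# Assumes ident, user and size are single whitespace-free tokens.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>.*?)\] "(?P<request>.*?)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>.*?)" "(?P<agent>.*?)"'
)


def parse_lines(lines):
    """Return a dict mapping each field name to a list of values, one per parsed line."""
    columns = {name: [] for name in LOG_PATTERN.groupindex}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        for name, value in match.groupdict().items():
            columns[name].append(value)
    return columns


sample = [
    '172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" '
    '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"',
]
columns = parse_lines(sample)
print(columns['host'])
print(columns['status'])
```

Because `parse_lines` accepts any iterable of strings, an open file object can be passed to it directly, for example inside `with open(path) as f:` where `path` points at your own log file.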
