Problem description
A line from the file:

172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"
According to the Apache website, the format is:
I'm able to open the file and read it as-is, but I don't know how to read it in that format so that I can put each part in a list.
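A quick way to see why this is harder than it looks: splitting the sample line on whitespace does not give one token per field, because the bracketed timestamp and the quoted request and user-agent fields all contain spaces. The sketch below only illustrates that problem on the line above; the variable names are hypothetical.

```python
# The sample line from the Apache access log above
line = ('172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - '
        '"" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"')

# Naive approach: split on any whitespace
parts = line.split()

# The timestamp is cut into two tokens
print(parts[3], parts[4])
# The quoted request line becomes three separate tokens
print(parts[5:8])
# 19 tokens for what is logically 9 fields
print(len(parts))
```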
Solution
This is a job for regular expressions.
For example:
```python
import re

line = '172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"'
# Groups: client IP, timestamp, request line, status code, referer, user agent
regex = r'([\d.]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'
print(re.match(regex, line).groups())
```
The output would be a tuple with 6 pieces of information from the line (specifically, the groups within parentheses in that pattern):
('172.16.0.3', '25/Sep/2002:14:04:19 +0200', 'GET / HTTP/1.1', '401', '', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827')
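The pattern above hard-codes the ident, user and byte-count fields as `-`, so a line with an authenticated user or a non-empty response body would not match. A minimal sketch of a more general version follows, assuming the log lines use Apache's Combined Log Format (which the sample line appears to follow). It uses named groups and returns one list per line, which is what the question asks for. The names `LOG_PATTERN` and `parse_log` are hypothetical, not part of any library.

```python
import re

# Assumed layout: Apache "combined" log format
#   host ident user [time] "request" status bytes "referer" "user-agent"
# ident, user and bytes are captured rather than hard-coded as '-'.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]*)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log(lines):
    """Return one list of fields per matching line; skip lines that don't match."""
    rows = []
    for raw in lines:
        match = LOG_PATTERN.match(raw.rstrip("\n"))
        if match:
            rows.append(list(match.groups()))
    return rows


# In-memory sample: the line from the question plus one malformed line
sample = [
    '172.16.0.3 - - [25/Sep/2002:14:04:19 +0200] "GET / HTTP/1.1" 401 - "" '
    '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827"\n',
    "this line is not in the expected format\n",
]

rows = parse_log(sample)
for row in rows:
    print(row)
```

To read a real log, pass the open file object instead, e.g. `with open("access.log") as f: rows = parse_log(f)`, where `access.log` is a placeholder path. If you need a field by name rather than by position, `match.groupdict()` returns a dictionary keyed by the group names. One limitation of this sketch: `[^"]*` stops at the first double quote, so a request or user agent that contains an escaped quote (`\"`) will typically fail to match and that line is silently skipped.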