AWK说明
简单来说awk就是把文件逐行的读入,以指定的分隔符将每行分割,分割后的部分再进行各种分析处理。
先看下AWK的命令的说明
AWK使用
看下网站access.log。
tail -f /home/wwwlogs/access.log
148.70.179.32 - - [15/Nov/2019:05:46:28 +0800] "POST /wp-cron.php?doing_wp_cron=1573767987.5338680744171142578125 HTTP/1.1" 200 31 "http://www.test.com.cn/wp-cron.php?doing_wp_cron=1573767987.5338680744171142578125" "WordPress/5.0.7; http://www.test.com.cn"
220.181.108.143 - - [15/Nov/2019:05:46:28 +0800] "GET / HTTP/1.1" 200 5596 "-" "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
111.206.198.18 - - [15/Nov/2019:05:46:28 +0800] "GET /wp-includes/css/dist/block-library/style.min.css?ver=5.0.7 HTTP/1.1" 200 25658 "http://www.test.com.cn/" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)"
打印下访问日志的IP列表
awk -F" " '{print $1}' /home/wwwlogs/access.log
148.70.179.32
91.228.8.210
1.119.148.54
121.51.40.28
1.119.148.54
1.119.148.54
给IP加上文件名行列号
awk -F" " '{print FILENAME"|"NR"|"NF"|"$0}' /home/wwwlogs/access.log
/home/wwwlogs/access.log|9979|12|150.109.77.71
/home/wwwlogs/access.log|9980|20|150.109.77.71
/home/wwwlogs/access.log|9981|20|150.109.77.71
/home/wwwlogs/access.log|9982|22|156.220.107.221
/home/wwwlogs/access.log|9983|22|138.204.135.251
/home/wwwlogs/access.log|9984|13|148.70.179.32
/home/wwwlogs/access.log|9985|18|148.70.243.161
/home/wwwlogs/access.log|9986|18|148.70.243.161
/home/wwwlogs/access.log|9987|18|148.70.243.161
/home/wwwlogs/access.log|9988|12|201.174.10.7
/home/wwwlogs/access.log|9989|13|148.70.179.32
/home/wwwlogs/access.log|9990|23|220.181.108.143
/home/wwwlogs/access.log|9991|31|111.206.198.18
/home/wwwlogs/access.log|10000|13|170.238.36.20
打印访问日志的HTTP状态码
awk -F" " '{print $9}' /home/wwwlogs/access.log
404
404
404
301
200
200
200
301
200
301
301
200
200
301
200
404
来几个复杂点的,打印下状态码分布,并按照大小排序
···
wk -F" " '{print $9}' /home/wwwlogs/access.log | sort | uniq -c | sort -nr
4939 404
4497 200
332 301
120 499
36 400
32 "-"
18 166
16 403
9 405
···
也可以看下IP访问TOP分布,分析是否有IP爬取网站
awk -F" " '{print $1}' /home/wwwlogs/access.log | sort | uniq -c | sort -nr | head -50
913 111.231.201.221
912 140.143.147.236
908 106.13.83.26
906 54.179.142.122
668 185.234.217.115
664 148.70.179.32
275 125.76.225.11
240 123.151.144.37
110 61.241.50.63
108 101.89.19.140
102 59.36.132.240
69 182.254.52.17
42 61.162.214.195
39 183.192.179.16
39 148.70.46.47
38 14.18.182.223
38 103.119.45.49
27 58.251.121.186
26 68.183.147.213
26 59.36.119.227
26 51.83.234.51
24 144.91.94.150
通过日志计算下每天访问的流量,预估将来需要的带宽
awk -F" " 'BEGIN {sum=0} {sum=sum+$10} END {print sum/1024/1024"M"}' /home/wwwlogs/access.log
38.7885M
AWK的BEGIN END 说明下,这个很好理解
BEGIN{ 执行前的语句 }
{处理每一行执行的语句}
END {处理完所有的行后要执行的语句 }
awk在系统日常维护中应该是使用最多的命令了。也特别简单,觉大多数场景下通过AWK分析access日志就能得到想要的分析结果。