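Notes:
The MariaDB audit log is the log written by MariaDB's audit plugin.
The goal is to split each log entry into tab-separated fields.
The fluentd configuration file is given below: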
<system>
log_level error
</system>

<source>
@type tail
path /data/mysql_audit/*
limit_recently_modified 86400
open_on_every_update true
tag mysql_audit
read_from_head true
pos_file /tmp/fluentd.pos
<parse>
@type multiline
format_firstline /^\d{8}/
format1 /^(?<Date>\d{8}) (?<Hour>\d{2}):(?<Min>\d{2}):(?<Sec>\d{2}),(?<host>[^,]+),(?<user>[^,]+),(?<ip>[^,]+),(?<connid>[^,]+),(?<queryid>[^,]+),(?<action>[^,]+),(?<db>[^,]+),(?<message>.*),(?<retcode>\d+)$/
</parse>
</source>

<filter mysql_audit>
@type grep
<regexp>
key action
pattern QUERY
</regexp>
<exclude>
key user
pattern lagou_status
</exclude>
<exclude>
key db
pattern information_schema
</exclude>
</filter>

<filter mysql_audit>
@type record_transformer
enable_ruby
<record>
message ${record["message"].gsub(/\s+/, ' ')}
</record>
</filter>

<match mysql_audit>
#@type stdout
@type webhdfs
host oss-hadoop-namenode-bjc-002
path /mysql_audit/${Date}/${host}_${Hour}
append true
compress gzip
<format>
@type csv
fields Date,Hour,Min,Sec,host,user,ip,action,db,message,retcode
delimiter \t
</format>
<buffer host,Date,Hour>
@type memory
flush_interval 20s
</buffer>
</match>
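Fluentd is written in Ruby, so the pipeline above can be traced in plain Ruby: parse with the format1 pattern, apply the grep rules, collapse whitespace in message, then join the selected fields with tabs. This is only a rough sketch. The sample entry and the helper names (parse_audit_line, keep?, normalize, to_tsv) are made up for illustration, the /m flag is an assumption so that "." also matches newlines in multi-line statements, and the real csv formatter may also quote fields, which is left out here.

```ruby
# Pattern copied from format1; /m lets "." span newlines in multi-line statements.
AUDIT_LINE = /^(?<Date>\d{8}) (?<Hour>\d{2}):(?<Min>\d{2}):(?<Sec>\d{2}),(?<host>[^,]+),(?<user>[^,]+),(?<ip>[^,]+),(?<connid>[^,]+),(?<queryid>[^,]+),(?<action>[^,]+),(?<db>[^,]+),(?<message>.*),(?<retcode>\d+)$/m

# Output columns, in the same order as the "fields" setting.
FIELDS = %w[Date Hour Min Sec host user ip action db message retcode].freeze

# <parse>: return a hash of named captures, or nil when the line does not match.
def parse_audit_line(line)
  m = AUDIT_LINE.match(line)
  m && m.named_captures
end

# grep filter: keep QUERY events, drop the excluded user and database.
def keep?(record)
  return false unless record["action"] =~ /QUERY/
  return false if record["user"] =~ /lagou_status/
  return false if record["db"] =~ /information_schema/
  true
end

# record_transformer: collapse every run of whitespace into one space.
def normalize(record)
  record.merge("message" => record["message"].gsub(/\s+/, " "))
end

# Simplified formatter: join the selected fields with tabs (no quoting).
def to_tsv(record)
  FIELDS.map { |f| record[f] }.join("\t")
end

# Hypothetical audit entry with a newline, a tab and a comma inside the query.
SAMPLE = "20240101 12:34:56,db01,app,10.0.0.5,42,1001,QUERY,shop,'SELECT *\n  FROM orders,\titems',0"

record = parse_audit_line(SAMPLE)
puts to_tsv(normalize(record)) if record && keep?(record)
```

Because message is matched greedily, commas inside the SQL text stay in message and only the last comma-separated number is taken as retcode.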
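fluentd uses far less memory than logstash.
Parsing the same logs, logstash used 700 MB while fluentd used 35 MB.
CPU usage, however, was about the same: on machines with heavy log volume the CPU reached 100%.
It seems that regex filtering of logs is very CPU-intensive.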
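If the regex really is the bottleneck, one thing to try is splitting each entry on commas instead. The sketch below is hypothetical and was not part of the original setup: split_audit_line is a made-up helper, the sample entry is invented, and whether it is actually cheaper has not been measured here. Using it in fluentd would also require a custom parser plugin. Unlike the regex, it accepts empty fields and does not check that retcode is numeric.

```ruby
# Split one audit entry on commas instead of matching a regular expression.
# Returns a hash with the same keys as the regex captures, or nil on failure.
def split_audit_line(line)
  # The first eight commas delimit fixed fields; the rest is "message,retcode".
  parts = line.chomp.split(",", 9)
  return nil unless parts.length == 9

  timestamp, host, user, ip, connid, queryid, action, db, rest = parts

  # The message may contain commas, so cut at the LAST comma only.
  message, sep, retcode = rest.rpartition(",")
  return nil if sep.empty?

  date, time = timestamp.split(" ", 2)
  return nil unless time
  hour, min, sec = time.split(":", 3)

  {
    "Date" => date, "Hour" => hour, "Min" => min, "Sec" => sec,
    "host" => host, "user" => user, "ip" => ip,
    "connid" => connid, "queryid" => queryid,
    "action" => action, "db" => db,
    "message" => message, "retcode" => retcode
  }
end

# Hypothetical audit entry, for illustration only.
sample = "20240101 12:34:56,db01,app,10.0.0.5,42,1001,QUERY,shop,'SELECT a, b FROM t',0"
record = split_audit_line(sample)
puts record["message"] if record
```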