I need help integrating a regular expression with Go.
I want to parse log files and created a regular expression that looks fine on https://regex101.com/r/p4mbiS/1/
A log line looks like this:
57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"
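The regular expression is: (?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"
The named groups should produce the results shown on regex101.com. regex101.com generates Go code that does not work for me. I tried to improve it, but without success.
The Go code only returns the whole string, not the groups.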
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"`)
	var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
	if len(re.FindStringIndex(str)) > 0 {
		fmt.Println(re.FindString(str), "found at index", re.FindStringIndex(str)[0])
	}
}
The fiddle is here: https://play.golang.org/p/e0_8PM-Nv6i
Best answer
You need to use .FindStringSubmatch, because you defined capturing groups and need to extract their values:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
	var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
	// FindStringSubmatch returns the whole match at index 0 and the capture groups at indices 1..n.
	match := re.FindStringSubmatch(str)
	if match == nil {
		// Guard against indexing a nil slice when the line does not match.
		fmt.Println("no match")
		return
	}
	fmt.Printf("IP: %s\nLocal Time: %s\nRequest: %s\nRef: %s\nAgent: %s", match[1], match[2], match[3], match[4], match[5])
}
Output:
IP: 57.157.87.86
Local Time: 06/Feb/2020:00:11:04 +0100
Request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
Ref: https://www.somewebsite.com/more/andheresomemore/
Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
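Since the question is about named groups, here is a minimal sketch of how the captures could be looked up by name rather than by position. It relies on re.SubexpNames() from the standard regexp package; the helper namedGroups, the variable logRe, and the shortened sample line in main are hypothetical names and data introduced only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// logRe is the named-group pattern from the answer above.
var logRe = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)

// namedGroups is a hypothetical helper: it returns the named captures of the
// first match as a map, or nil when the line does not match.
func namedGroups(re *regexp.Regexp, line string) map[string]string {
	match := re.FindStringSubmatch(line)
	if match == nil {
		return nil
	}
	groups := make(map[string]string)
	// SubexpNames is index-aligned with match; index 0 (the whole match)
	// and unnamed groups have an empty name and are skipped.
	for i, name := range re.SubexpNames() {
		if name != "" {
			groups[name] = match[i]
		}
	}
	return groups
}

func main() {
	// Sample line shortened from the log line in the question (assumption).
	line := `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
	groups := namedGroups(logRe, line)
	if groups == nil {
		fmt.Println("no match")
		return
	}
	fmt.Println("IP:", groups["ip"])
	fmt.Println("Local Time:", groups["localtime"])
	fmt.Println("Ref:", groups["ref"])
}
```

If only one group is needed, re.SubexpIndex("ip") (available since Go 1.15) returns its index into the match slice directly.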
Note that you then do not need named capturing groups; numbered capturing groups are enough:
^(\S+)[\s-]+\[([^][]*)]\s+"GET\s+/\?([^"]+)"[^"]+"([^"]+)"\s+"([^"]+)"$
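As a rough sketch of how this numbered pattern might be applied in Go, the example below copies the submatches into a small struct. The names numberedRe, logEntry, and parseEntry are hypothetical, and the sample line in main is shortened from the log line in the question. Because the pattern is anchored with ^ and $, it only matches a complete line.

```go
package main

import (
	"fmt"
	"regexp"
)

// numberedRe is the numbered-group pattern quoted above.
var numberedRe = regexp.MustCompile(`^(\S+)[\s-]+\[([^][]*)]\s+"GET\s+/\?([^"]+)"[^"]+"([^"]+)"\s+"([^"]+)"$`)

// logEntry is a hypothetical container for the five captured fields.
type logEntry struct {
	IP        string
	LocalTime string
	Request   string
	Ref       string
	Agent     string
}

// parseEntry is a hypothetical helper: it reports false when the line
// does not match the pattern from start to end.
func parseEntry(line string) (logEntry, bool) {
	m := numberedRe.FindStringSubmatch(line)
	if m == nil {
		return logEntry{}, false
	}
	// m[0] is the whole match; groups are numbered by their opening parenthesis.
	return logEntry{
		IP:        m[1],
		LocalTime: m[2],
		Request:   m[3],
		Ref:       m[4],
		Agent:     m[5],
	}, true
}

func main() {
	// Sample line shortened from the log line in the question (assumption).
	line := `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
	entry, ok := parseEntry(line)
	if !ok {
		fmt.Println("no match")
		return
	}
	fmt.Printf("%+v\n", entry)
}
```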
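See this regex demo. It is not a good idea to use .+? so often in a pattern, because it decreases performance, so I replaced those dot patterns with negated character classes and tried to make the pattern a bit more verbose.
Source: a similar question on Stack Overflow, "regex - Named groups in a regular expression in Golang": https://stackoverflow.com/questions/60109288/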