What is the best way to extract inner substrings from a string in Go?
Input:
"Hello <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
Output:
"this is paragraph \n
this is paragraph 2"
Does Go's strings package (or another library) already do something like this?
package main

import (
	"fmt"
	"strings"
)

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
	newString := getInnerStrings("<p>", "</p>", longString)
	fmt.Println(newString)
	// output: this is paragraph \n
	//         this is paragraph 2
}

func getInnerStrings(start, end, str string) string {
	// Brain Freeze
	// Regex?
	// Bytes Loop?
}
Thanks
Best answer
Don't use regular expressions to try to parse HTML. Use a fully capable HTML tokenizer and parser.
I recommend reading this article on Coding Horror.
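As a rough illustration of the tokenizer approach, here is a minimal sketch. It makes some assumptions: a dedicated HTML package such as golang.org/x/net/html is the more complete option, but to stay dependency-free this sketch swaps in the standard library's encoding/xml decoder in non-strict mode, which its documentation describes as able to handle typical HTML. The helper name innerText is hypothetical, and the input is assumed to have balanced tags like the example in the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// innerText is a hypothetical helper: it walks the token stream of src and
// collects the trimmed text found inside each top-level <tag> element.
func innerText(tag, src string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	// Non-strict settings suggested by the encoding/xml docs for HTML-like input.
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity

	var out []string
	var buf strings.Builder
	depth := 0 // how many <tag> elements are currently open
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == tag {
				depth++
			}
		case xml.EndElement:
			if t.Name.Local == tag && depth > 0 {
				depth--
				if depth == 0 {
					out = append(out, strings.TrimSpace(buf.String()))
					buf.Reset()
				}
			}
		case xml.CharData:
			// Keep text only while inside the requested element.
			if depth > 0 {
				buf.Write(t)
			}
		}
	}
}

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
	parts, err := innerText("p", longString)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(strings.Join(parts, "\n"))
}
```

Note that the helper takes the bare element name ("p") rather than the literal "<p>" and "</p>" delimiters from the question, because a tokenizer matches elements and not raw substrings. For real-world HTML with unclosed or misnested tags, a full HTML parser is likely the safer choice.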
Source: the Stack Overflow question "regex - Extracting text content from HTML in Golang": https://stackoverflow.com/questions/21000277/