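I want to filter the footnotes out of a LaTeX document with a bash script. A footnote may look like either of the following two examples: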
Some text with a short footnote.\footnote{Some \textbf{explanation}.}
Some text with a longer footnote.%
\footnote{Lorem ipsum dolor
sit amet, etc. etc. etc. \emph{along \emph{multiple} lines}
but all lines increased indent from the start.}
What remains should be:
Some text with a short footnote.
Some text with a longer footnote.%
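I don't care about superfluous whitespace.
Since a regular expression cannot match balanced braces, I assume I cannot use sed for this. Is it possible with awk or some other tool?

Best answer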
Use GNU awk: a multi-character RS splits the input at every \footnote, and an empty FS splits each record into single characters:
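$ cat tst.awk
# Split the input at every \footnote; FS="" makes each character its own field.
BEGIN { RS="[\\\\]footnote"; ORS=""; FS="" }
# Every record after the first starts with the footnote's {...} argument.
NR>1 {
    braceCnt=0
    for (charPos=1; charPos<=NF; charPos++) {
        if ($charPos == "{") { ++braceCnt }
        if ($charPos == "}") { --braceCnt }
        if (braceCnt == 0) { break }    # matching closing brace reached
    }
    $0 = substr($0,charPos+1)           # drop the argument, keep the rest
}
{ print }
$ awk -f tst.awk file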
Some text with a short footnote.
Some text with a longer footnote.%
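
For an awk without regex RS or an empty FS, the same brace-counting idea can be sketched with only POSIX string functions. This is a hedged variant, not the method from the answer above: the function name strip_footnotes is made up for illustration, and it assumes the argument follows \footnote immediately (no optional [..] argument) and that braces inside LaTeX comments do not occur. Searching for the literal string \footnote{ leaves commands such as \footnotesize untouched, and the character after any backslash is skipped so that \{ and \} are not counted.

```shell
#!/bin/sh
# strip_footnotes (hypothetical name): read LaTeX on stdin, write it to
# stdout with every \footnote{...} removed. Uses only POSIX awk features.
strip_footnotes() {
    awk '
    # Slurp the whole input so that a footnote may span several lines.
    { text = text $0 "\n" }
    END {
        key = "\\footnote{"              # literal string, not a regex
        out = ""
        while ((start = index(text, key)) > 0) {
            out  = out substr(text, 1, start - 1)     # keep text before it
            text = substr(text, start + length(key))  # now inside the braces
            depth = 1
            n = length(text)
            for (i = 1; i <= n && depth > 0; i++) {
                c = substr(text, i, 1)
                if (c == "\\")     { i++ }      # skip the escaped character
                else if (c == "{") { depth++ }
                else if (c == "}") { depth-- }
            }
            text = substr(text, i)       # i is one past the closing brace
        }
        printf "%s", out text
    }'
}
```

It would be called as strip_footnotes < file. The whole file is held in memory and rescanned with substr, which is fine for ordinary documents but slower than the record-based GNU awk script on very large inputs.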