I want to use a bash script to filter footnotes out of a LaTeX document. The footnotes look like one of the following two examples:

Some text with a short footnote.\footnote{Some \textbf{explanation}.}

Some text with a longer footnote.%
  \footnote{Lorem ipsum dolor
     sit amet, etc. etc. etc. \emph{along \emph{multiple} lines}
     but all lines increased indent from the start.}

What should remain is:
Some text with a short footnote.

Some text with a longer footnote.%

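I don't care about extra whitespace.
Since a regular expression can't match balanced (nested) braces, I assume sed is out. Is this possible with awk or some other tool?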
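To illustrate the limitation, here is a naive sed substitution (an illustrative attempt, not taken from the question) that deletes from \footnote{ up to the first }. On the first example it stops at the inner brace of \textbf{explanation} and leaves ".}" behind:

```shell
# Naive attempt: delete "\footnote{" up to the FIRST "}".
# [^}]* cannot know about nested braces, so it stops too early.
line='Some text with a short footnote.\footnote{Some \textbf{explanation}.}'
naive=$(printf '%s\n' "$line" | sed 's/\\footnote{[^}]*}//')
printf '%s\n' "$naive"   # prints: Some text with a short footnote..}
# sed also reads one line at a time, so a footnote that spans several
# lines (the second example) is never seen as a single unit.
```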

Best answer

Use GNU awk for a multi-char RS, and a null FS to split each record into individual characters:

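$ cat tst.awk
# Split the input at every "\footnote", so each record after the first
# begins with that footnote's {...} argument; FS="" makes each char a field.
BEGIN { RS="[\\\\]footnote"; ORS=""; FS="" }
NR>1 {
    # Walk the characters, tracking brace depth, until the braces balance.
    braceCnt=0
    for (charPos=1; charPos<=NF; charPos++) {
        if ($charPos == "{") { ++braceCnt }
        if ($charPos == "}") { --braceCnt }
        if (braceCnt == 0)   { break }
    }
    # Keep only what follows the footnote's closing brace.
    $0 = substr($0,charPos+1)
}
# Print records back-to-back (ORS is empty); the "\footnote" separators are not re-emitted.
{ print }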

$ awk -f tst.awk  file
Some text with a short footnote.

Some text with a longer footnote.%
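If GNU awk isn't available, the same brace-counting idea can be sketched for any POSIX awk (multi-char RS and FS="" are GNU extensions) by reading the whole file into one string and scanning it with index() and substr(). This is not part of the original answer: the function name strip_footnotes is hypothetical, and the sketch additionally assumes that a backslash escapes the next character (so \{ and \} are not counted) and that a \footnote not followed by { (e.g. \footnotemark) should be left alone.

```shell
# Hypothetical helper: remove every \footnote{...} (nested braces allowed)
# from stdin and write the rest to stdout, using only POSIX awk features.
strip_footnotes() {
    awk '
    # Slurp the whole input: a footnote may span several lines.
    { text = text $0 "\n" }
    END {
        tag = "\\footnote"                  # awk turns \\ into one backslash
        out = ""
        while ((start = index(text, tag)) > 0) {
            out = out substr(text, 1, start - 1)   # keep text before the tag
            rest = substr(text, start + length(tag))
            # Only strip it if "{" is the next non-blank character;
            # otherwise (e.g. \footnotemark) keep the tag and move on.
            if (rest !~ /^[ \t\n]*[{]/) {
                out = out tag
                text = rest
                continue
            }
            depth = 0
            n = length(rest)
            for (i = 1; i <= n; i++) {
                c = substr(rest, i, 1)
                if (c == "\\") { i++; continue }   # skip escaped char, e.g. \{ or \}
                if (c == "{") depth++
                else if (c == "}" && --depth == 0) break
            }
            # Resume after the closing brace (if the braces never balance,
            # the rest of the input is dropped).
            text = substr(rest, i + 1)
        }
        printf "%s", out text
    }'
}
```

Usage: strip_footnotes < file.tex. Optional arguments such as \footnote[2]{...} are not handled by this sketch; those footnotes are left in place.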

09-28 14:38