html - Easiest way to strip html/xml <tags> from a single line of output

I have output from grep that I'm trying to clean up. It looks like this:

<words>Http://www.path.com/words</words>

I've tried using…

sed 's/<.*>//'

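…to remove the tags, but that wipes out the whole line. I'm not sure why that happens, since every "<" is closed by a ">".
What's the easiest way to do this?
Thanks!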

Best answer

Try the following sed expression:

sed 's/<.*>\(.*\)<\/.*>/\1/'

A quick breakdown of the expression:

<.*>   - Match the first tag
\(.*\) - Match and save the text between the tags
<\/.*> - Match the end tag making sure to escape the / character
\1     - Output the result of the first saved match
       -   (the text that is matched between \( and \))
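
To see why the original attempt wipes the line, and to try the expression above, here is a small sketch. It assumes a POSIX shell and a sed that uses basic regular expressions. The function names (strip_greedy, strip_pair, strip_all) are made up for illustration, and the last one, a `[^>]*` pattern with the g flag, is an alternative that does not come from the answer.

```shell
#!/bin/sh
# Sample line from the question.
line='<words>Http://www.path.com/words</words>'

# The original attempt: .* is greedy, so <.*> runs from the first '<'
# to the last '>' on the line and swallows everything in between.
strip_greedy() { sed 's/<.*>//'; }

# The expression from the answer: keep only the text between the tags.
strip_pair() { sed 's/<.*>\(.*\)<\/.*>/\1/'; }

# Alternative (an assumption, not from the answer): [^>]* cannot cross
# a '>', so each tag is matched on its own and removed.
strip_all() { sed 's/<[^>]*>//g'; }

printf '%s\n' "$line" | strip_greedy   # prints an empty line
printf '%s\n' "$line" | strip_pair     # prints Http://www.path.com/words
printf '%s\n' "$line" | strip_all      # prints Http://www.path.com/words
```

The answer's expression expects exactly one opening and one closing tag on the line. The `[^>]*` variant makes no such assumption, but it is still plain text matching and not an HTML/XML parser.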

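A little more on back-references
A question came up in the comments that should probably be addressed for completeness.
\( and \) are sed's back-reference markers. They save part of the matched expression for later use.
For example, suppose we have the input string:

This has (parens) in it. Also, we can use parens like thisparens with back-references.

We build an expression:

sed 's/.*(\(.*\)).*\1\(.*\)\1.*/\1 \2/'

Which gives us:

parens  like this

(The saved \2 begins with the space before "like", so together with the space in the replacement there are two spaces.)
So what happened? Let's break the expression down.
Expression breakdown: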

sed s/ - Start of the sed substitute (s) command; the match expression follows.
.*     - Match any character to start (as well as nothing).
(      - Match a literal left parenthesis character.
\(.*\) - Match any characters and save them as back-reference \1. In this case it matches everything between the first opening and last closing parenthesis in the input.
)      - Match a literal right parenthesis character.
.*     - Match any characters (as well as nothing), as before.
\1     - Match the text saved in the first back-reference. In our sample this is filled in with `parens`.
\(.*\) - Match any characters and save them as back-reference \2.
\1     - Match the first back-reference (`parens`) again.
/      - End of the match expression. Signals transition to the output expression.
\1 \2  - Print our two back-references.
/      - End of output expression.

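As we can see, the back-reference captured between the parentheses (( and )) is substituted back into the match expression so that it can match the string parens.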
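
The back-reference example can be tried directly in a shell; a minimal sketch follows. It assumes the expression is wrapped in single quotes so the shell leaves the parentheses and backslashes alone, and the variable names input and result are only illustrative.

```shell
#!/bin/sh
input='This has (parens) in it. Also, we can use parens like thisparens with back-references.'

# \1 captures the text inside the literal ( ) and is then reused inside
# the pattern itself to locate the two later occurrences of that text.
# \2 captures whatever sits between those two occurrences.
result=$(printf '%s\n' "$input" | sed 's/.*(\(.*\)).*\1\(.*\)\1.*/\1 \2/')

# Brackets make the spacing visible.
printf '[%s]\n' "$result"   # prints [parens  like this]
```

Note the two spaces in the output: \2 already starts with the space before "like", and the replacement adds one more between \1 and \2. Using \1\2 as the replacement would give a single space.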