This question already has answers here: RegEx match open tags except XHTML self-contained tags (34 answers). Closed 3 years ago.
                    
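I want to use awk to remove every HTML tag matched by this regex, /[<.*.>]/, wherever it appears in any field. I have been trying to make it work with sub or substr, but I can't work out the right logic.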

Input text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation<br/><div style="margin-top:6px">< b>veniam:< /b>< /div> <br/><div style="margin-top:6px">< b>Confort:< /b></div>Comenzi volan; Cruise-control; Servodirectie; <br/>

Desired output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitationveniam: Confort:Comenzi volan; Cruise-control; Servodirectie;
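One reason the attempt above cannot work: in awk (as in POSIX EREs generally), /[<.*.>]/ is a bracket expression, so it matches a single character that is '<', '.', '*' or '>', never a whole <...> tag. Inside the brackets '.' and '*' are literal characters. A quick check on a made-up string:

```shell
# [<.*.>] matches ONE character from the set { '<', '.', '*', '>' },
# so gsub deletes those characters individually and leaves the tag name behind.
printf '%s\n' 'a<b>c.d' | awk '{ gsub(/[<.*.>]/, ""); print }'
```

This prints abcd: the angle brackets and the dot are gone, but the "b" that was inside the tag survives. Matching a whole tag needs something like <[^>]+> instead.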

Best answer

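If you're not really parsing HTML and just want to delete everything between each <...> pair in a text file, then with GNU awk (for its multi-character, regex-valued RS) it's simply: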

$ awk -v RS='<[^>]+>' -v ORS= '1' file
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitationveniam: Confort:Comenzi volan; Cruise-control; Servodirectie;
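If GNU awk isn't available, a minimal portable sketch uses gsub (the global counterpart of the sub mentioned in the question) with the same <[^>]+> regex, in any POSIX awk. This is a different technique from the RS trick: it works line by line, so it assumes no tag is split across a line break. The function name strip_tags is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: deletes every <...> tag on each input line via gsub.
# Portable to any POSIX awk, but assumes no tag spans a line break.
strip_tags() {
  awk '{ gsub(/<[^>]+>/, ""); print }' "$@"
}

# Demo on a fragment of the question's input (reads stdin when no file is given).
printf '%s\n' 'exercitation<br/><div style="margin-top:6px">< b>veniam:< /b>< /div> <br/>' | strip_tags
```

Usage on a file is just strip_tags file. Note that <[^>]+> also matches the malformed "< b>" and "< /div>" tags in the sample, because [^>]+ happily consumes the leading space.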

08-26 21:34