linux - Extracting information from an RTF file in a shell script

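We have many RTF files that need to be uploaded to their respective categories in Oracle EBS. To do that, we need to read some information stored in the document properties of each RTF file. The fields are title, subject, author, company and category.
When we open an RTF file in Notepad we can see this information, but we don't know how to extract it with Linux commands. Our attempts with grep were not very successful.
Here is the part of an RTF file that contains this information: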

\mwrapIndent1440\mintLim0\mnaryLim1}{\info{\title ^XXSLS_GBL_ORDACK^}{\subject XXSLS}{\author ^es_ES,es_FR,ES_IT,ES_de^}{\doccomm $Header: XXSLS_GBL_ORDACK_ES_ES.rtf $}
{\operator }{\creatim\yr2012\mo11\dy11\hr14\min3}{\revtim\yr2013\mo3\dy2\hr10\min43}{\version24}{\edmins361}{\nofpages4}{\nofwords725}{\nofchars14202}{\*\manager }{\*\company }{\*\category ^BD^}{\nofcharsws14898}
{\vern32773}}{\*\userprops {\propname _DocHome}\proptype3{\staticval -974575144}}{\*\xmlnstbl {\xmlns1 http://schemas.microsoft.com/office/word/2003/wordml}}\paperw11850\paperh18144\margl851\margr851\margt851\margb0\gutter0\ltrsect

Can someone suggest how we can extract this information in the following form:

Title=^XXSLS_GBL_ORDACK^
Subject=XXSLS
Author=^es_ES,es_FR,ES_IT,ES_de^
Category=^BD^

Best answer

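grep can do this with the -E flag (extended regular expressions) and the -o flag (print only the matched text). Each match runs from the control word up to, but not including, the closing brace, and sed then strips the control word itself.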

 # '\\' matches the literal backslash of the RTF control word, so the same
 # word appearing in the document body is not picked up by accident.
 title=$(grep -oE '\\title [^}]+' file.rtf | sed 's/^\\title //')
 echo "title=$title"
 subject=$(grep -oE '\\subject [^}]+' file.rtf | sed 's/^\\subject //')
 echo "subject=$subject"
 author=$(grep -oE '\\author [^}]+' file.rtf | sed 's/^\\author //')
 echo "author=$author"
 category=$(grep -oE '\\category [^}]+' file.rtf | sed 's/^\\category //')
 echo "category=$category"

I get:

title=^XXSLS_GBL_ORDACK^
subject=XXSLS
author=^es_ES,es_FR,ES_IT,ES_de^
category=^BD^
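
Since there are many files to process, the four lookups can probably be folded into a single pass over each file. The following is only a minimal sketch: the function name rtf_info is hypothetical, and it assumes GNU grep and sed (for -o and -E) and that each property value contains no closing brace.

```shell
#!/bin/sh
# rtf_info is a hypothetical helper name chosen for this sketch.
# It prints one "key=value" line per property found, in the order the
# properties appear in the file.
rtf_info() {
    # Match "\title ...", "\subject ...", etc. up to the closing brace,
    # then turn the leading "\key " into "key=".
    grep -oE '\\(title|subject|author|category) [^}]*' "$1" |
        sed -E 's/^\\([a-z]+) /\1=/'
}

# Process every RTF file passed on the command line.
for f in "$@"; do
    echo "== $f"
    rtf_info "$f"
done
```

Saved as, say, rtf_info.sh (also an assumed name), it could be run as `sh rtf_info.sh *.rtf`. A property with an empty value, such as `{\*\company }` in the excerpt, would be printed as `key=` if its control word were added to the alternation.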

Regarding "linux - Extracting information from an RTF file in a shell script", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/15198974/