The original XML is as follows:
<?xml version="1.0" encoding="utf-8"?>
<Message>
<Header>
<Version>2000000</Version>
<MessageClass>5</MessageClass>
<MessageType>7</MessageType>
<SenderId>9999999964020001</SenderId>
<ReceiverId>9999999964011001</ReceiverId>
<MessageId>3280260</MessageId>
</Header>
<Body ContentType="1">
<ClearTargetDate>2017-03-22</ClearTargetDate>
<ServiceProviderId>9999999934030001</ServiceProviderId>
<IssuerId>9999999964011001</IssuerId>
<MessageId>406843026</MessageId>
<Count>1</Count>
<Amount>110.00</Amount>
<Transaction>
<TransId>1</TransId>
<Time>2017-03-21T20:40:36</Time>
<Fee>110.00</Fee>
<Service>
<ServiceType>1</ServiceType>
<Description>曹庄|宿州</Description>
<Detail>1|04|3401|804|33|20170321 204036|03|3401|1105|1|20170321 182056</Detail>
</Service>
<ICCard>
<CardType>22</CardType>
<NetNo>6401</NetNo>
<CardId>1638220100098530</CardId>
<License>宁B63222</License>
<TransNo>104</TransNo>
<PreBalance>2157.60</PreBalance>
<PostBalance>2047.60</PostBalance>
</ICCard>
<Validation>
<TAC>9439DAD2</TAC>
<TransType>09</TransType>
<TerminalNo>0134000030BC</TerminalNo>
<TerminalTransNo>0018002D</TerminalTransNo>
</Validation>
<OBU>
<NetNo>C4FE</NetNo>
<OBUId>0000000200031918</OBUId>
<OBEState>0001</OBEState>
<License>宁B63222</License>
</OBU>
</Transaction>
</Body>
</Message>
The goal is to convert the values inside the Transaction element above into the following delimiter-separated format:
1|||2017-03-21T20:40:36|||110.00|||1|||曹庄|宿州|||1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056||||||22|||6401|||1638220100098530|||宁B63222|||104|||2157.60|||2047.60||||||9439DAD2|||09|||0134000030BC|||0018002D||||||C4FE|||0000000200031918|||0001|||宁B63222|||
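Here are the steps I followed.
1. Replace the newlines so that the whole XML file becomes a single line of text, and redirect the output to a file named 1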
cat ***.xml | tr "\n" " " > 1
Result:
<?xml version="1.0" encoding="utf-8"?> <Message> <Header> <Version>2000000</Version> <MessageClass>5</MessageClass> <MessageType>7</MessageType> <SenderId>9999999964020001</SenderId> <ReceiverId>9999999964011001</ReceiverId> <MessageId>3280260</MessageId> </Header> <Body ContentType="1"> <ClearTargetDate>2017-03-22</ClearTargetDate> <ServiceProviderId>9999999934030001</ServiceProviderId> <IssuerId>9999999964011001</IssuerId> <MessageId>406843026</MessageId> <Count>1</Count> <Amount>110.00</Amount> <Transaction> <TransId>1</TransId> <Time>2017-03-21T20:40:36</Time> <Fee>110.00</Fee> <Service> <ServiceType>1</ServiceType> <Description>曹庄|宿州</Description> <Detail>1|04|3401|804|33|20170321 204036|03|3401|1105|1|20170321 182056</Detail> </Service> <ICCard> <CardType>22</CardType> <NetNo>6401</NetNo> <CardId>1638220100098530</CardId> <License>宁B63222</License> <TransNo>104</TransNo> <PreBalance>2157.60</PreBalance> <PostBalance>2047.60</PostBalance> </ICCard> <Validation> <TAC>9439DAD2</TAC> <TransType>09</TransType> <TerminalNo>0134000030BC</TerminalNo> <TerminalTransNo>0018002D</TerminalTransNo> </Validation> <OBU> <NetNo>C4FE</NetNo> <OBUId>0000000200031918</OBUId> <OBEState>0001</OBEState> <License>宁B63222</License> </OBU> </Transaction> </Body> </Message>
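2. Remove all spaces (this also strips the spaces inside values, such as the timestamps in Detail, which is what the target format requires)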
sed 's/ //g' 1 > 2
Result:
<?xmlversion="1.0"encoding="utf-8"?><Message><Header><Version>2000000</Version><MessageClass>5</MessageClass><MessageType>7</MessageType><SenderId>9999999964020001</SenderId><ReceiverId>9999999964011001</ReceiverId><MessageId>3280260</MessageId></Header><BodyContentType="1"><ClearTargetDate>2017-03-22</ClearTargetDate><ServiceProviderId>9999999934030001</ServiceProviderId><IssuerId>9999999964011001</IssuerId><MessageId>406843026</MessageId><Count>1</Count><Amount>110.00</Amount><Transaction><TransId>1</TransId><Time>2017-03-21T20:40:36</Time><Fee>110.00</Fee><Service><ServiceType>1</ServiceType><Description>曹庄|宿州</Description><Detail>1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056</Detail></Service><ICCard><CardType>22</CardType><NetNo>6401</NetNo><CardId>1638220100098530</CardId><License>宁B63222</License><TransNo>104</TransNo><PreBalance>2157.60</PreBalance><PostBalance>2047.60</PostBalance></ICCard><Validation><TAC>9439DAD2</TAC><TransType>09</TransType><TerminalNo>0134000030BC</TerminalNo><TerminalTransNo>0018002D</TerminalTransNo></Validation><OBU><NetNo>C4FE</NetNo><OBUId>0000000200031918</OBUId><OBEState>0001</OBEState><License>宁B63222</License></OBU></Transaction></Body></Message>
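Removing every space works here because the target format also drops the spaces inside values. If a file ever carried values whose spaces had to be kept, a variant along these lines could strip only the whitespace that sits between tags. This is only a sketch and not one of the steps in this post; the function name strip_space_between_tags is made up for illustration.

```shell
# Hypothetical helper (not part of the original steps): reads single-line
# XML on stdin and removes whitespace only between a '>' and the next '<'.
strip_space_between_tags() {
    sed 's/>[[:space:]]*</></g'
}

# Example: the space inside the value "x y" is preserved.
printf '%s\n' '<a> <b>x y</b> </a>' | strip_space_between_tags
# prints <a><b>x y</b></a>
```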
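3. Strip the unneeded leading and trailing XML, keeping only the content of the Transaction element (the tail is cut from </OBU> onward, so the last field ends with a single |||)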
sed 's/.*<Transaction>//g;s/<\/OBU>.*<\/Message>//g' 2 > 3
Result:
<TransId>1</TransId><Time>2017-03-21T20:40:36</Time><Fee>110.00</Fee><Service><ServiceType>1</ServiceType><Description>曹庄|宿州</Description><Detail>1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056</Detail></Service><ICCard><CardType>22</CardType><NetNo>6401</NetNo><CardId>1638220100098530</CardId><License>宁B63222</License><TransNo>104</TransNo><PreBalance>2157.60</PreBalance><PostBalance>2047.60</PostBalance></ICCard><Validation><TAC>9439DAD2</TAC><TransType>09</TransType><TerminalNo>0134000030BC</TerminalNo><TerminalTransNo>0018002D</TerminalTransNo></Validation><OBU><NetNo>C4FE</NetNo><OBUId>0000000200031918</OBUId><OBEState>0001</OBEState><License>宁B63222</License>
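The greedy .*<Transaction> in step 3 keeps only the last Transaction in the file, which is fine here because Count is 1. For a file with several transactions, one possible approach is to put each Transaction on its own line before stripping the tags. The sketch below assumes that every Transaction ends with an OBU block, as in the sample; the function name transactions_to_delimited is hypothetical.

```shell
# Hypothetical helper: prints one delimited record per <Transaction>.
# Assumption: each Transaction ends with an <OBU> block, as in the sample.
# Steps: flatten the file (echo supplies the final newline), let awk put
# every <Transaction>...</Transaction> on its own line, then let sed drop
# the other lines and apply the same substitutions as in this post.
transactions_to_delimited() {
    { tr -d ' \t\r\n' < "$1"; echo; } |
    awk '{ gsub(/<Transaction>/, "\n&"); gsub(/<\/Transaction>/, "&\n"); print }' |
    sed -e '/^<Transaction>/!d' \
        -e 's/<\/OBU>.*//' \
        -e 's/<\/[^>]*>/|||/g' \
        -e 's/<[^>]*>//g'
}
```

Called with the path of an XML file, it writes one line per transaction to standard output.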
4. Replace every closing tag </***> with |||
sed 's/<\/[^>]*>/|||/g' 3 > 4
Result:
<TransId>1|||<Time>2017-03-21T20:40:36|||<Fee>110.00|||<Service><ServiceType>1|||<Description>曹庄|宿州|||<Detail>1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056||||||<ICCard><CardType>22|||<NetNo>6401|||<CardId>1638220100098530|||<License>宁B63222|||<TransNo>104|||<PreBalance>2157.60|||<PostBalance>2047.60||||||<Validation><TAC>9439DAD2|||<TransType>09|||<TerminalNo>0134000030BC|||<TerminalTransNo>0018002D||||||<OBU><NetNo>C4FE|||<OBUId>0000000200031918|||<OBEState>0001|||<License>宁B63222|||
5. Remove the opening tags <***>
sed 's/<[^>]*>//g' 4 > 5
Result:
1|||2017-03-21T20:40:36|||110.00|||1|||曹庄|宿州|||1|04|3401|804|33|20170321204036|03|3401|1105|1|20170321182056||||||22|||6401|||1638220100098530|||宁B63222|||104|||2157.60|||2047.60||||||9439DAD2|||09|||0134000030BC|||0018002D||||||C4FE|||0000000200031918|||0001|||宁B63222|||
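To check the result, it can help to split the record back into fields on the ||| delimiter. The following is a rough sketch using awk with a regular-expression field separator; the function name split_fields is made up for illustration. Note that the empty fields come from the places where a group's closing tag (for example </Service>) directly follows the closing tag of its last child, and the record also ends with one empty field.

```shell
# Hypothetical helper: reads one delimited record per line on stdin and
# prints "index<TAB>value" for every field separated by three pipes.
split_fields() {
    awk -F'[|][|][|]' '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
}

# Example with a shortened record; single pipes inside a value are kept.
printf '%s\n' '1|||1|04|20170321204036||||||C4FE|||' | split_fields
```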
And that's it, the conversion is done.
Putting all the commands together:
cat ***.xml | tr "\n" " " > 1
sed 's/ //g;s/.*<Transaction>//g;s/<\/OBU>.*<\/Message>//g;s/<\/[^>]*>/|||/g;s/<[^>]*>//g' 1 > 2
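The same steps can also run as one pipeline without the intermediate files 1 and 2. Below is a minimal sketch wrapped in a function; the name transaction_to_delimited is hypothetical. Unlike the commands above, it also deletes tabs and carriage returns, on the assumption that some files may be indented with tabs or use Windows line endings.

```shell
# Hypothetical wrapper around the commands from this post.
# $1: path to the XML file; the delimited record is written to stdout.
# Assumption: tabs and carriage returns should be dropped as well.
transaction_to_delimited() {
    # Flatten the file; echo supplies the final newline that sed expects.
    { tr -d ' \t\r\n' < "$1"; echo; } |
    sed -e 's/.*<Transaction>//' \
        -e 's/<\/OBU>.*<\/Message>//' \
        -e 's/<\/[^>]*>/|||/g' \
        -e 's/<[^>]*>//g'
}
```

Like the original commands, this keeps only one Transaction per file and relies on the OBU block being the last child of Transaction.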