我正在尝试在 hive 中创建一个正则表达式SERDE以读取一些日志文件,但是在使其工作时遇到问题...
日志文件看起来像这样...
14.196.202.16:9123 11329 2016-01-27 17:50:26.965 -5 Thread-14960 CCS 6104 1 Audit.rds.CCS reportDataService Failure <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message> <trace>ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89) at java.io.BufferedOutputStream.write(Unknown Source) at java.io.BufferedOutputStream.write(Unknown Source) at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source) at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) at sun.nio.cs.StreamEncoder.write(Unknown Source) at java.io.OutputStreamWriter.write(Unknown Source) at java.io.BufferedWriter.flushBuffer(Unknown Source) at java.io.BufferedWriter.write(Unknown Source) at java.io.Writer.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source) at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source) at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source) at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124) at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598) at org.apache.coyote.Response.doWrite(Response.java:533) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ... 35 more </trace>
我得到了这么远:
([^ ]*)\t(-|[0-9]*)\t
并返回:
Match 1
1. 14.196.202.16:9123
2. 11329
这可以让我正确地返回前两个...但是当我像这样添加日期时:
([^ ]*)\t(-|[0-9]*)\t([^ ]*)\t
我得到这个:
Match 1
1. 17:50:26.965 -5 Thread-14960 CCS 6104 1 Audit.rds.CCS reportDataService
2.
3. Failure
我是regex的新手,正在尝试解决这个问题,但是遇到了麻烦...我也在尝试使用此网站:
http://rubular.com/
本质上,我试图使它看起来像这样:
1. 14.196.202.16:9123
2. 11329
3. 2016-01-27 17:50:26.965 -5
4.
5.
6.
7.
8. Thread-14960
9. CCS
10. 6104
11. 1
12. Audit.rds.CCS
13.
14. reportDataService
15.
16. Failure
17. <messages><message><messageString>RDS-ERR-1047 Unable to process the XML output stream. The XML is invalid.</messageString></message>
19. <trace>ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89) at java.io.BufferedOutputStream.write(Unknown Source) at java.io.BufferedOutputStream.write(Unknown Source) at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source) at sun.nio.cs.StreamEncoder.implWrite(Unknown Source) at sun.nio.cs.StreamEncoder.write(Unknown Source) at java.io.OutputStreamWriter.write(Unknown Source) at java.io.BufferedWriter.flushBuffer(Unknown Source) at java.io.BufferedWriter.write(Unknown Source) at java.io.Writer.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.write(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.writeAttribute(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeInlineStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLElement.writeStyles(Unknown Source) at com.cognos.ccs.formats.html.AHTMLTableElement.closeStartTag(Unknown Source) at com.cognos.ccs.formats.html.HTMLLayoutTable.processEvent(Unknown Source) at com.cognos.ccs.fsm.LdxHandler.startElement(Unknown Source) at com.cognos.ccs.formats.CCSFormatter.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source) at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(Unknown Source) at com.cognos.ccs.service.CCSDataResult$ProcessingThread.run(Unknown Source) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(Unknown Source) at java.net.SocketOutputStream.write(Unknown Source) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:761) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:785) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:124) at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:598) at org.apache.coyote.Response.doWrite(Response.java:533) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364) ... 35 more </trace>
编辑:
所以我觉得我在正确的轨道上:
我现在有这个:
([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t(\w+|\S+|\s+)\t(\w+|.)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)(\w+|\S+|\s+|-)\t
但我仍然无法将
<message>
和<trace>
分组。 最佳答案
我让正则表达式起作用...这就是我最后要去的地方
([\d+]\S+[\d+])\t(\d+)\t([\d+]\S+[\d+] [\d+]\S+[\d+])\t(-[\d+])\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z0-9_\S]*)\t([a-zA-Z_\S]*)\t([0-9]*)\t([0-9]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)\t([a-zA-Z_\S ]*)
关于regex - 正则表达式serde读取配置单元中的日志文件,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/35167478/