How to advance past a deflate byte sequence contained in a byte stream?

Problem description

I have a byte stream that is a concatenation of sections, where each section is composed of a header plus a deflated byte stream.

I need to split this byte stream into sections, but the header only contains information about the data in its uncompressed form. It gives no hint about the compressed data length, which I would need in order to advance properly in the stream and parse the next section.

So far, the only way I have found to advance past the deflated byte sequence is to parse it according to the deflate specification. From what I understood by reading the specification, a deflate stream is composed of blocks, which can be compressed blocks or literal blocks.

Literal blocks contain a size header, which makes it easy to advance past them.

Compressed blocks are composed of 'prefix codes', which are bit sequences of variable length that have special meanings to the deflate algorithm. Since I'm only interested in finding out the length of the deflated stream, I guess the only code I need to look for is '0000000', which according to the specification signals the end of a block.

So I came up with this CoffeeScript function to parse the deflate stream (I'm working on Node.js):

# The job of this function is to return the position
# after the deflate stream contained in 'buffer'. The
# deflated stream begins at 'pos'.
advanceDeflateStream = (buffer, pos) ->
  byteOffset = 0
  finalBlock = false
  while 1
    if byteOffset == 6
      firstTypeBit = 0b00000001 & buffer[pos]
      pos++
      secondTypeBit = 0b10000000 & buffer[pos]
      type = firstTypeBit | (secondTypeBit << 1)
    else
      if byteOffset == 7
        pos++
      type = buffer[pos] & (0b01100000 >>> byteOffset)
    if type == 0
      # Literal block
      # ignore the remaining bits and advance position
      byteOffset = 0
      pos++
      len = buffer.readUInt16LE(pos)
      pos += 2
      lenComplement = buffer.readUInt16LE(pos)
      if (len ^ ~lenComplement)
        throw new Error('Literal block length check failed')
      pos += (2 + len) # Advance past literal block
    else if type in [1, 2]
      # huffman block
      # we are only interested in finding the 'block end' marker
      # which is signaled by the bit string 0000000 (256)
      eob = false
      matchedZeros = 0
      while !eob
        byte = buffer[pos]
        for i in [byteOffset..7]
          # loop the remaining bits looking for 7 consecutive zeros
          if (byte ^ (0b10000000 >>> byteOffset)) >>> (7 - byteOffset)
            matchedZeros++
          else
            # reset counter
            matchedZeros = 0
          if matchedZeros == 7
            eob = true
            break
          byteOffset++
        if !eob
          byteOffset = 0
          pos++
    else
      throw new Error('Invalid deflate block')
    finalBlock = buffer[pos] & (0b10000000 >>> byteOffset)
    if finalBlock
      break
  return pos

To check whether this works, I wrote a simple Mocha test case:

zlib = require 'zlib'

test 'sample deflate stream', (done) ->
  data = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' # length 30
  zlib.deflate data, (err, deflated) ->
    # deflated.length == 11
    advanceDeflateStream(deflated, 0).should.eql(11)
    done()

The problem is that this test fails and I do not know how to debug it. I will accept any answer that points out what I missed in the parsing algorithm or contains a correct version of the above function in any language.

Recommended answer

The only way to find the end of a deflate stream, or even of a deflate block, is to decode all of the Huffman codes contained within. There is no bit pattern you can search for that cannot also appear earlier in the stream.
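
To illustrate what "decode all of the Huffman codes" involves, here is a minimal sketch in JavaScript for Node.js. It is not part of the original answer. It follows the bit-at-a-time decoding approach of puff.c from zlib's contrib directory, discards the decoded data, and only tracks the read position. The names deflateEnd, BitReader, buildTable, decode, skipCodes, fixedTables and dynamicTables are hypothetical. The sketch assumes a raw deflate stream and does only minimal validation (for example, it does not reject over-subscribed Huffman codes).

```javascript
// Extra-bit counts for length symbols 257..285 and distance symbols 0..29, plus the
// storage order of the code-length code lengths (values taken from RFC 1951).
const LEXT = [0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0];
const DEXT = [0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13];
const ORDER = [16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15];

// Deflate packs bits starting at the least significant bit of each byte.
class BitReader {
  constructor(buf, pos) { this.buf = buf; this.pos = pos; this.bit = 0; }
  bits(n) {
    let v = 0;
    for (let i = 0; i < n; i++) {
      if (this.pos >= this.buf.length) throw new Error('unexpected end of input');
      v |= ((this.buf[this.pos] >>> this.bit) & 1) << i;
      if (++this.bit === 8) { this.bit = 0; this.pos++; }
    }
    return v;
  }
}

// Builds a canonical Huffman table from a list of code lengths.
function buildTable(lengths) {
  const count = new Array(16).fill(0);
  for (const len of lengths) count[len]++;
  const offs = new Array(16).fill(0);
  for (let len = 1; len < 15; len++) offs[len + 1] = offs[len] + count[len];
  const symbol = [];
  lengths.forEach((len, sym) => { if (len) symbol[offs[len]++] = sym; });
  return { count, symbol };
}

// Decodes one symbol, reading its code one bit at a time.
function decode(r, t) {
  let code = 0, first = 0, index = 0;
  for (let len = 1; len <= 15; len++) {
    code |= r.bits(1);
    const count = t.count[len];
    if (code - count < first) return t.symbol[index + (code - first)];
    index += count; first = (first + count) << 1; code <<= 1;
  }
  throw new Error('invalid Huffman code');
}

// Walks the symbols of one compressed block up to its end-of-block symbol (256).
function skipCodes(r, lit, dist) {
  for (;;) {
    const sym = decode(r, lit);
    if (sym === 256) return;
    if (sym < 256) continue; // literal byte: nothing else to read
    if (sym > 285) throw new Error('invalid length symbol');
    r.bits(LEXT[sym - 257]); // extra length bits
    const d = decode(r, dist);
    if (d > 29) throw new Error('invalid distance symbol');
    r.bits(DEXT[d]); // extra distance bits
  }
}

// Tables for fixed-Huffman blocks (block type 1).
function fixedTables() {
  const lit = Array.from({ length: 288 }, (_, i) => (i < 144 ? 8 : i < 256 ? 9 : i < 280 ? 7 : 8));
  return [buildTable(lit), buildTable(new Array(30).fill(5))];
}

// Reads the tables stored at the start of a dynamic-Huffman block (block type 2).
function dynamicTables(r) {
  const nlen = r.bits(5) + 257, ndist = r.bits(5) + 1, ncode = r.bits(4) + 4;
  const cl = new Array(19).fill(0);
  for (let i = 0; i < ncode; i++) cl[ORDER[i]] = r.bits(3);
  const clTable = buildTable(cl);
  const lengths = [];
  while (lengths.length < nlen + ndist) {
    const sym = decode(r, clTable);
    if (sym < 16) { lengths.push(sym); continue; }
    if (sym === 16 && lengths.length === 0) throw new Error('no length to repeat');
    const prev = sym === 16 ? lengths[lengths.length - 1] : 0;
    let rep = sym === 16 ? 3 + r.bits(2) : sym === 17 ? 3 + r.bits(3) : 11 + r.bits(7);
    while (rep--) lengths.push(prev);
  }
  return [buildTable(lengths.slice(0, nlen)), buildTable(lengths.slice(nlen))];
}

// Returns the offset of the first byte after the raw deflate stream starting at `pos`.
function deflateEnd(buf, pos = 0) {
  const r = new BitReader(buf, pos);
  let last;
  do {
    last = r.bits(1);       // BFINAL comes first, in the lowest bit
    const type = r.bits(2); // then BTYPE
    if (type === 0) {       // stored block: skip to a byte boundary, then LEN/NLEN
      if (r.bit) { r.bit = 0; r.pos++; }
      const len = buf.readUInt16LE(r.pos);
      if ((len ^ buf.readUInt16LE(r.pos + 2)) !== 0xffff) throw new Error('stored length check failed');
      r.pos += 4 + len;
      if (r.pos > buf.length) throw new Error('unexpected end of input');
    } else if (type === 1) skipCodes(r, ...fixedTables());
    else if (type === 2) skipCodes(r, ...dynamicTables(r));
    else throw new Error('invalid block type');
  } while (!last);
  return r.bit ? r.pos + 1 : r.pos; // a partially used final byte belongs to the stream
}
```

With a section layout like the one in the question, deflateEnd(buffer, pos) would return the offset at which the next header starts. Note that zlib.deflate, as used in the test above, produces zlib-wrapped output rather than raw deflate: assuming no preset dictionary, the deflate data starts 2 bytes in and is followed by a 4-byte Adler-32 checksum, so the offsets have to be adjusted accordingly (zlib.deflateRaw produces the unwrapped form).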
