I am parsing a binary protocol which has UTF-8 strings interspersed among raw bytes. This particular protocol prefaces each UTF-8 string with a short (two bytes) giving the length of the UTF-8 string that follows. That allows strings of up to 2^16 - 1 = 65,535 bytes (assuming the short is unsigned), which is more than adequate for the particular application.
My question is, is this a standard way of delimiting UTF-8 strings?
I wouldn't call that delimiting; it's more like "length prefixing". Some people call these Pascal strings, because Pascal was one of the popular early languages that stored strings in memory this way, with the length in front of the data.
I don't think there's a formal standard specifically for just that, as it's a rather obvious way of storing UTF-8 strings (or any strings of bytes for that matter). It's defined over and over as a part of many standards that deal with messages that contain strings, though.
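For what it's worth, here is a minimal sketch in Python of reading and writing such a field. The function names are made up for illustration, and I'm assuming the two-byte length is unsigned and big-endian (network order); your protocol may use a different byte order, so check its spec. Note that the prefix counts bytes of encoded UTF-8, not characters.

```python
import struct
from typing import Tuple


def write_prefixed_utf8(text: str) -> bytes:
    """Encode text as UTF-8 preceded by a two-byte length (hypothetical helper)."""
    payload = text.encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("string too long for a two-byte length prefix")
    # ">H" = big-endian unsigned 16-bit integer; the byte order is an assumption.
    return struct.pack(">H", len(payload)) + payload


def read_prefixed_utf8(buf: bytes, offset: int = 0) -> Tuple[str, int]:
    """Decode one length-prefixed string starting at offset (hypothetical helper).

    Returns the decoded text and the offset of the first byte after it.
    """
    if len(buf) - offset < 2:
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string payload")
    return buf[start:end].decode("utf-8"), end


# A string field followed by two unrelated raw bytes.
frame = write_prefixed_utf8("héllo") + b"\x00\x01"
text, next_offset = read_prefixed_utf8(frame)
# "héllo" is 5 characters but 6 bytes in UTF-8, so the prefix is b"\x00\x06".
```

Returning the next offset lets the caller carry on parsing whatever raw bytes follow the string, which is the main practical advantage of a length prefix over a terminator: no scanning, and the payload may contain any byte value.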