

I'm trying to do some parsing that will be easier using regular expressions.


The input is an array (or enumeration) of bytes.


I don't want to convert the bytes to chars for the following reasons:

  1. Computation efficiency
  2. Memory consumption efficiency
  3. Some non-printable bytes might be complex to convert to chars. Not all bytes are printable.

The only solution I know of is Boost.Regex (which works on bytes, i.e. C chars), but it is a C++ library, and wrapping it with C++/CLI would take considerable work.

How can I use regular expressions on bytes in .NET directly, without working with .NET strings and chars?



There is a bit of an impedance mismatch here. You want to use regular expressions in .NET, which operate on strings (sequences of two-byte chars), but you want to work with single-byte characters. You can't have both at the same time with .NET as it is normally used.


However, to work around this mismatch, you could treat a string in a byte-oriented fashion and mutate it. The mutated string can then act as a reusable buffer. This way you do not have to convert bytes to chars, or convert your input buffer to a new string (as per your question).


byte[] inputBuffer = { 66, 76, 73, 78, 71 }; // ASCII "BLING"

string stringBuffer = new string('\0', 1000);

Regex regex = new Regex("ING", RegexOptions.Compiled);

// Requires compiling with unsafe code enabled (/unsafe).
unsafe
{
    fixed (char* charArray = stringBuffer)
    {
        byte* buffer = (byte*)(charArray);

        //Hard-coded example of string mutation, in practice you would
        //loop over your input buffers and regex/match so that the string
        //buffer is re-used.
        //Each char is two bytes, so the even offsets are the low bytes
        //of consecutive chars (little-endian layout assumed).
        buffer[0] = inputBuffer[0];
        buffer[2] = inputBuffer[1];
        buffer[4] = inputBuffer[2];
        buffer[6] = inputBuffer[3];
        buffer[8] = inputBuffer[4];

        Console.WriteLine("Mutated string:'{0}'.",
             stringBuffer.Substring(0, inputBuffer.Length));

        Match match = regex.Match(stringBuffer, 0, inputBuffer.Length);

        Console.WriteLine("Position:{0} Length:{1}.", match.Index, match.Length);
    }
}


Using this technique you can allocate a string "buffer" that is reused as the input to Regex and mutated with your bytes each time. This avoids the overhead of converting/encoding your byte array into a new .NET string every time you want to do a match. The saving could prove very significant: I have seen many an algorithm in .NET try to go at a million miles an hour, only to be brought to its knees by string generation, the subsequent heap spamming, and time spent in GC.
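The hard-coded assignments above can be generalised into a loop. What follows is only a minimal sketch, not a tested implementation: the class ByteRegexBuffer and the method Fill are hypothetical names introduced here, the code must be compiled with unsafe code enabled, and it relies on mutating a string in place, which .NET does not support officially.

```csharp
using System;
using System.Text.RegularExpressions;

static class ByteRegexBuffer
{
    // Hypothetical helper: overwrites the first input.Length chars of
    // 'buffer' in place, storing each byte as one char (value 0..255).
    // Assumption: nothing else holds or interns this string, because
    // the runtime treats strings as immutable.
    public static unsafe void Fill(string buffer, byte[] input)
    {
        if (input.Length > buffer.Length)
            throw new ArgumentException("Input does not fit in the buffer.");

        fixed (char* chars = buffer)
        {
            for (int i = 0; i < input.Length; i++)
                chars[i] = (char)input[i];
        }
    }

    public static void Main()
    {
        // One buffer and one Regex, allocated once and reused per input.
        string stringBuffer = new string('\0', 1000);
        Regex regex = new Regex("ING", RegexOptions.Compiled);

        byte[][] inputs =
        {
            new byte[] { 66, 76, 73, 78, 71 }, // ASCII "BLING"
            new byte[] { 83, 73, 78, 71, 83 }, // ASCII "SINGS"
        };

        foreach (byte[] input in inputs)
        {
            Fill(stringBuffer, input);

            // Restrict the search to the part of the buffer just written.
            Match match = regex.Match(stringBuffer, 0, input.Length);

            Console.WriteLine("Success:{0} Position:{1} Length:{2}.",
                match.Success, match.Index, match.Length);
        }
    }
}
```

Passing the input length to Regex.Match keeps stale data from a previous, longer input out of the search window, so the buffer never needs to be cleared between inputs.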


Obviously this is unsafe code, and it breaks the assumption that .NET strings are immutable, but it is still plain .NET.


The results of the Regex will generate strings though, so you have an issue here. I'm not sure if there is a way of using Regex that will not generate new strings. You can certainly get at the match index and length information but the string generation violates your requirements for memory efficiency.



Actually, after disassembling Regex/Match/Group/Capture, it looks like the captured string is only generated when you access the Value property, so you may at least avoid generating strings if you only access the Index and Length properties. However, you will still be generating all the supporting Regex objects.
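If only Index and Length are read, the match can be mapped straight back onto the original byte array. Here is a minimal sketch of that idea, assuming one char in the searched string corresponds to one byte of input. MatchSlices and Slice are hypothetical names, and the demo builds its search string by an ordinary conversion purely to stay short.

```csharp
using System;
using System.Text.RegularExpressions;

static class MatchSlices
{
    // Hypothetical helper: describes the matched range of the original
    // bytes using only Match.Index and Match.Length. Match.Value is never
    // read, and ArraySegment<byte> is a struct, so no bytes are copied.
    public static ArraySegment<byte> Slice(byte[] source, Match match)
    {
        if (!match.Success)
            return new ArraySegment<byte>(source, 0, 0);

        return new ArraySegment<byte>(source, match.Index, match.Length);
    }

    public static void Main()
    {
        byte[] input = { 66, 76, 73, 78, 71 }; // ASCII "BLING"

        // Demo only: build a one-char-per-byte string the ordinary way.
        // The reusable mutated buffer from the answer would replace this.
        string text = new string(Array.ConvertAll(input, b => (char)b));

        Match match = new Regex("ING").Match(text);
        ArraySegment<byte> hit = Slice(input, match);

        Console.WriteLine("Offset:{0} Count:{1}.", hit.Offset, hit.Count);
    }
}
```

Note that this only avoids the captured strings; the Match, Group, and Capture objects are still allocated by the Regex engine.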

