Problem Description
What's the fastest way to parse strings in C#?
Currently I'm just using string indexing (string[index]) and the code runs reasonably, but I can't help but think that the continuous range checking that the index accessor does must be adding something.
So, I'm wondering what techniques I should consider to give it a boost. These are my initial thoughts/questions:
- Use methods like string.IndexOf() and IndexOfAny() to find characters of interest. Are these faster than manually scanning the string by string[index]? (A rough sketch of this and the pointer approach follows this list.)
- Use regexes. Personally, I don't like regexes as I find them difficult to maintain, but are they likely to be faster than manually scanning the string?
- Use unsafe code and pointers. This would eliminate the index range checking, but I've read that unsafe code won't run in untrusted environments. What exactly are the implications of this? Does this mean the whole assembly won't load/run, or will only the code marked unsafe refuse to run? The library could potentially be used in a number of environments, so being able to fall back to a slower but more compatible mode would be nice.
- What else might I consider?
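As a rough, hand-written sketch (not taken from the original question or answer) of what the IndexOfAny and pointer options might look like next to a plain indexer loop: the delimiter characters and method names below are made up for illustration, and the pointer version only compiles with unsafe code enabled.
static class ScanSketch
{
    // Manual scan through the string indexer.
    static int CountDelimitersIndexer(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == ',' || c == ';') count++;
        }
        return count;
    }

    // Let IndexOfAny jump between the characters of interest.
    static int CountDelimitersIndexOfAny(string s)
    {
        char[] delims = { ',', ';' };
        int count = 0;
        int pos = s.IndexOfAny(delims);
        while (pos >= 0)
        {
            count++;
            pos = s.IndexOfAny(delims, pos + 1);
        }
        return count;
    }

    // Fixed pointer scan, which skips the per-access range check
    // (requires the project to be built with /unsafe).
    static unsafe int CountDelimitersUnsafe(string s)
    {
        int count = 0;
        fixed (char* p = s)
        {
            int len = s.Length;
            for (int i = 0; i < len; i++)
            {
                char c = p[i];
                if (c == ',' || c == ';') count++;
            }
        }
        return count;
    }
}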
NB: I should say, the strings I'm parsing could be reasonably large (say 30k) and in a custom format for which there is no standard .NET parser. Also, performance of this code is not super critical, so this is partly just a theoretical question of curiosity.
Recommended Answer
30k is not what I would consider to be large. Before getting excited, I would profile. The indexer should be fine for the best balance of flexibility and safety.
For example, to create a 128k string (and a separate array of the same size), fill it with junk (including the time to handle Random) and sum all the character code-points via the indexer takes... 3ms:
using System;
using System.Diagnostics;

var watch = Stopwatch.StartNew();
char[] chars = new char[128 * 1024];
Random rand = new Random(); // fill with junk
for (int i = 0; i < chars.Length; i++) chars[i] =
    (char) ((int) 'a' + rand.Next(26));
int sum = 0;
string s = new string(chars);
int len = s.Length;
for (int i = 0; i < len; i++)
{
    sum += (int) s[i]; // read each character back through the string indexer
}
watch.Stop();
Console.WriteLine(sum);
Console.WriteLine(watch.ElapsedMilliseconds + "ms");
Console.ReadLine();
For files that are actually large, a reader approach should be used - StreamReader etc.
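As a minimal illustration of that idea (not from the original answer; the file name and buffer size are placeholder assumptions), the file can be read and scanned in fixed-size chunks instead of being loaded into one large string:
using System;
using System.IO;

// Read the file in chunks; "input.dat" and the 8 KB buffer are illustrative.
using (var reader = new StreamReader("input.dat"))
{
    char[] buffer = new char[8 * 1024];
    int newlines = 0;
    int read;
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            if (buffer[i] == '\n') newlines++; // stand-in for the real parsing work
        }
    }
    Console.WriteLine(newlines + " line breaks");
}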