问题描述
我们的应用程序使用大量的具有多层次的查找不经常改变的字典。我们正在调查,在把一些关键的code表示不使用字典使用备用的结构进行了大量的查找 - 更快的查找,光对内存/ GC。这让我们比较一下各种字典/可用库 - 词典(System.Collections.Generics.Dictionary - SCGD),ImmutableDictionary,C5.HashDictionary,FSharpMap。运行下面的程序与各个项目数 - 100,1000,10000,100000 - 表明,字典仍然是赢家最多范围。第一行表示收集的物品。 MS /蜱会抽出时间进行随机x查找(code是如下图)。
产品 - 100
SCGD - 0 MS - 50蜱
C5 - 1 MS - 1767蜱
入境事务处 - 4 MS - 5951蜱
FS - 0 MS - 240蜱
产品 - 1000
SCGD - 0 MS - 230蜱
C5 - 0 MS - 496蜱
入境事务处 - 0 MS - 1046蜱
FS - 1 MS - 1870年蜱
产品 - 10000
SCGD - 3 MS - 4722蜱
C5 - 4 MS - 6370蜱
入境事务处 - 9 MS - 13119蜱
FS - 15 MS - 22050蜱
产品 - 100000
SCGD - 62 MS - 89276蜱
C5 - 72 MS - 103658蜱
入境事务处 - 172 MS - 246247蜱
FS - 249 MS - 356176蜱
这是否意味着,我们已经用最快的,并且没有改变?我有presumed是不可变的结构应在表的顶部,但它不是这样的。难道我们做错了比较,还是我失去了一些东西?抱着对这个问题,只是觉得,它能够更好地问。任何链接或票据,或任何引用抛出一些光会很大。
完成code测试 -
命名空间CollectionsTest{ 使用系统; 使用System.Collections.Generic; 使用System.Collections.Immutable; 使用System.Diagnostics程序; 使用System.Linq的; 使用System.Text; 使用System.Runtime; 使用Microsoft.FSharp.Collections; ///<总结> /// ///< /总结> 类节目 { 静态程序() { ProfileOptimization.SetProfileRoot(@\日新。); ProfileOptimization.StartProfile(Startup.Profile); } ///<总结> ///主电源的使用ARGS。 ///< /总结> ///< PARAM NAME =的args>在ARGS< /参数> 静态无效的主要(字串[] args) { // INIT测试数据--------------------------------------------- -------------------------------------------------- - 的foreach(INT MAXITEMS在新的INT [] {100,1000,10000,100000}) { Console.WriteLine(\ N# - {0},MAXITEMS); 名单<字符串> accessIndex =新的名单,其中,串>(MAXITEMS); 名单< KeyValuePair<字符串,对象>> listofkvps =新的名单,其中,KeyValuePair<字符串,对象>>(); 名单<元组LT;字符串,对象>> listoftuples =新名单,其中元组LT;字符串,对象>>(); 的for(int i = 0; I< MAXITEMS;我++) { listoftuples.Add(新行<字符串,对象>(i.ToString(),I)); listofkvps.Add(新KeyValuePair<字符串,对象>(i.ToString(),I)); accessIndex.Add(i.ToString()); } //随机化进行查找 随机R =新的随机(Environment.TickCount); 名单<字符串> randomIndexesList =新的名单,其中,串>(MAXITEMS); 而(accessIndex.Count大于0) { INT指数= r.Next(accessIndex.Count); 字符串值= accessIndex [指数] accessIndex.RemoveAt(指数); randomIndexesList.Add(值); } //转换为阵列的最佳PERF 字符串[] randomIndexes = randomIndexesList.ToArray(); // 加载 - - - - - - - - - - - - - - - - - - - - - - - - ------------------------------------------------- // IMMU ImmutableDictionary<字符串,对象> IDictionary的= ImmutableDictionary.Create<字符串,对象>(listofkvps); //Console.WriteLine(idictionary.Count); // SCGD 字典<字符串,对象>词典=新词典<字符串,对象>(); 的for(int i = 0; I< MAXITEMS;我++) { dictionary.Add(i.ToString(),I); } //Console.WriteLine(dictionary.Count); // C5 C5.HashDictionary<字符串,对象> c5dictionary =新C5.HashDictionary<字符串,对象>(); 的for(int i = 0; I< MAXITEMS;我++) { c5dictionary.Add(i.ToString(),I); } //Console.WriteLine(c5dictionary.Count); //如何更改为只读? // F# FSharpMap<字符串,对象> fdictionary =新FSharpMap<字符串,对象>(listoftuples); //Console.WriteLine(fdictionary.Count); // 测试 - - - - - - - - - - - - - - - - - - - - - - - - ------------------------------------------------- 秒表SW = Stopwatch.StartNew(); 对于(INT指数= 0,indexMax = randomIndexes.Length;指数< indexMax;指数++) { 串I = randomIndexes [指数] 对象的值; dictionary.TryGetValue(我,超时值); } sw.Stop(); Console.WriteLine(SCGD - {0,10} MS - {1,10}蜱,sw.ElapsedMilliseconds,sw.ElapsedTicks); 秒表c5sw = Stopwatch.StartNew(); 对于(INT指数= 0,indexMax = randomIndexes.Length;指数< indexMax;指数++) { 字符串键= randomIndexes [指数] 对象的值; c5dictionary.Find(REF键,超时值); } c5sw.Stop(); Console.WriteLine(C5 - {0,10} MS - {1,10}蜱,c5sw.ElapsedMilliseconds,c5sw.ElapsedTicks); 秒表ISW = Stopwatch.StartNew(); 对于(INT指数= 0,indexMax = randomIndexes.Length;指数< indexMax;指数++) { 串I = randomIndexes [指数] 对象的值; idictionary.TryGetValue(我,超时值); } isw.Stop(); Console.WriteLine(入境处 - {0,10} MS - {1,10}蜱,isw.ElapsedMilliseconds,isw.ElapsedTicks); 秒表FSW = Stopwatch.StartNew(); 对于(INT指数= 0,indexMax = randomIndexes.Length;指数< indexMax;指数++) { 串I = randomIndexes [指数] fdictionary.TryFind(ⅰ); } fsw.Stop(); Console.WriteLine(FS - {0,10} MS - {1,10}蜱,fsw.ElapsedMilliseconds,fsw.ElapsedTicks); } } }}
标准字典已经是相当不错的优化。它确实做到了,当你做一个查找是计算所提供的密钥(其速度取决于键的类型以及它是如何实现的哈希 GetHash code
),对散列值的模操作以找到合适的桶,然后将其通过桶的内容迭代,直到找到合适的值(其速度取决于 GetHash $的质量C $ç
的功能,因此,如果桶是很好的平衡,并且不包含太多项目,而等于
法的速度类型)。
所有的一切,它不会做那么多的查找,所以我不认为你能够找到一个显著更快的通用数据结构。然而,这是可能的,这取决于你打算如何使用字典,你能够建立一个更加的专业的解决方案。例如,我需要一个真正的快速查找,其中的关键是一个类型。而不是使用字典和做词典[typeof运算(T)]
,我做了一个泛型类,如下所示:
类ValueStore< T>
{
公共静态的T值;
}
所以,我可能只是做 ValueStore< T> .value的
以pretty的很多零查找开销。
无论你是否可以做同样的事情(以及它是否是值得的)真的取决于你的用例;多少项目的结构将保持多久它被读取和写入,是否需要是线程,多么重要写入速度等。例如,如果需要的话写入速度没有在所有问题,但线程安全,你需要做一个副本上写的,其中数据结构不会被写入,而是复制,保证线程安全和无锁(是这样的:快)读取,在写入速度为代价。专业它可以尽可能重新排序上写的结构去优化它,因此散列桶不包含N多的项目。
PS:如果你真的绝望了速度,但不能建立一个更专业的数据结构,那么你也许可以得到复制小的收益词典< TKEY的,TValue>
和消除各种完整性检查(null检查和等)和虚拟/接口方法的调用。不过,我怀疑这会给你任何超过20%的涨幅,如果这一点。
Our application uses plenty of dictionaries which have multi level lookup that are not frequently changing. We are investigating at converting some of the critical code that does a lot of lookup using dictionaries to use alternate structures for - faster lookup, light on the memory/gc. This made us compare various dictionaries/libraries available - Dictionary(System.Collections.Generics.Dictionary-SCGD), ImmutableDictionary, C5.HashDictionary, FSharpMap. Running the following program with various items count - 100, 1000, 10000, 100000 - shows that Dictionary is still the winner at most ranges. First row indicates items in collection. MS/Ticks would the time taken to perform x lookups randomized (code is below).
Items - 100
SCGD - 0 MS - 50 Ticks
C5 - 1 MS - 1767 Ticks
Imm - 4 MS - 5951 Ticks
FS - 0 MS - 240 Ticks
Items - 1000
SCGD - 0 MS - 230 Ticks
C5 - 0 MS - 496 Ticks
Imm - 0 MS - 1046 Ticks
FS - 1 MS - 1870 Ticks
Items - 10000
SCGD - 3 MS - 4722 Ticks
C5 - 4 MS - 6370 Ticks
Imm - 9 MS - 13119 Ticks
FS - 15 MS - 22050 Ticks
Items - 100000
SCGD - 62 MS - 89276 Ticks
C5 - 72 MS - 103658 Ticks
Imm - 172 MS - 246247 Ticks
FS - 249 MS - 356176 Ticks
Does this mean, we are already using the fastest and don't have to change? I had presumed that immutable structures should be at the top of table, but it wasn't that way. Are we doing wrong comparison or am I missing something? Was holding on to this question, but felt, its better to ask. Any link or notes or any references that throws some light will be great.
Complete code for test -
namespace CollectionsTest
{
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Runtime;
using Microsoft.FSharp.Collections;
/// <summary>
///
/// </summary>
class Program
{
static Program()
{
ProfileOptimization.SetProfileRoot(@".\Jit");
ProfileOptimization.StartProfile("Startup.Profile");
}
/// <summary>
/// Mains the specified args.
/// </summary>
/// <param name="args">The args.</param>
static void Main(string[] args)
{
// INIT TEST DATA ------------------------------------------------------------------------------------------------
foreach (int MAXITEMS in new int[] { 100, 1000, 10000, 100000 })
{
Console.WriteLine("\n# - {0}", MAXITEMS);
List<string> accessIndex = new List<string>(MAXITEMS);
List<KeyValuePair<string, object>> listofkvps = new List<KeyValuePair<string, object>>();
List<Tuple<string, object>> listoftuples = new List<Tuple<string, object>>();
for (int i = 0; i < MAXITEMS; i++)
{
listoftuples.Add(new Tuple<string, object>(i.ToString(), i));
listofkvps.Add(new KeyValuePair<string, object>(i.ToString(), i));
accessIndex.Add(i.ToString());
}
// Randomize for lookups
Random r = new Random(Environment.TickCount);
List<string> randomIndexesList = new List<string>(MAXITEMS);
while (accessIndex.Count > 0)
{
int index = r.Next(accessIndex.Count);
string value = accessIndex[index];
accessIndex.RemoveAt(index);
randomIndexesList.Add(value);
}
// Convert to array for best perf
string[] randomIndexes = randomIndexesList.ToArray();
// LOAD ------------------------------------------------------------------------------------------------
// IMMU
ImmutableDictionary<string, object> idictionary = ImmutableDictionary.Create<string, object>(listofkvps);
//Console.WriteLine(idictionary.Count);
// SCGD
Dictionary<string, object> dictionary = new Dictionary<string, object>();
for (int i = 0; i < MAXITEMS; i++)
{
dictionary.Add(i.ToString(), i);
}
//Console.WriteLine(dictionary.Count);
// C5
C5.HashDictionary<string, object> c5dictionary = new C5.HashDictionary<string, object>();
for (int i = 0; i < MAXITEMS; i++)
{
c5dictionary.Add(i.ToString(), i);
}
//Console.WriteLine(c5dictionary.Count);
// how to change to readonly?
// F#
FSharpMap<string, object> fdictionary = new FSharpMap<string, object>(listoftuples);
//Console.WriteLine(fdictionary.Count);
// TEST ------------------------------------------------------------------------------------------------
Stopwatch sw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string i = randomIndexes[index];
object value;
dictionary.TryGetValue(i, out value);
}
sw.Stop();
Console.WriteLine("SCGD - {0,10} MS - {1,10} Ticks", sw.ElapsedMilliseconds, sw.ElapsedTicks);
Stopwatch c5sw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string key = randomIndexes[index];
object value;
c5dictionary.Find(ref key, out value);
}
c5sw.Stop();
Console.WriteLine("C5 - {0,10} MS - {1,10} Ticks", c5sw.ElapsedMilliseconds, c5sw.ElapsedTicks);
Stopwatch isw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string i = randomIndexes[index];
object value;
idictionary.TryGetValue(i, out value);
}
isw.Stop();
Console.WriteLine("Imm - {0,10} MS - {1,10} Ticks", isw.ElapsedMilliseconds, isw.ElapsedTicks);
Stopwatch fsw = Stopwatch.StartNew();
for (int index = 0, indexMax = randomIndexes.Length; index < indexMax; index++)
{
string i = randomIndexes[index];
fdictionary.TryFind(i);
}
fsw.Stop();
Console.WriteLine("FS - {0,10} MS - {1,10} Ticks", fsw.ElapsedMilliseconds, fsw.ElapsedTicks);
}
}
}
}
The standard dictionary is already quite well optimized. All it really does when you do a lookup is calculate the hash of the provided key (the speed of which depends on the type of the key and how it implements GetHashCode
), a modulo operation on the hash value to find the right bucket and then it iterates through the contents of the bucket until it finds the right value (the speed of which depends on the quality of the GetHashCode
function, so if the buckets are well balanced and don't contain too many items, and the speed of the Equals
method for the type).
All in all, it doesn't do that much for lookups, so I don't think you'll be able to find a significantly faster generic data structure. However, it's possible that depending on how you plan to use the dictionaries, you're able to build a more specialized solution. For example, I needed a really fast lookup where the key was a type. Instead of using a dictionary and doing dictionary[typeof(T)]
, I made a generic class like so:
class ValueStore<T>
{
public static T Value;
}
So I could just do ValueStore<T>.Value
with pretty much zero lookup overhead.
Whether or not you could do something similar (and whether it's worth it) really depends on your usecase; how many items the structure would hold, how often it's being read and written to, whether it needs to be threadsafe, how important write speed is, etc. For example, if write speed didn't matter at all but if thread safety was required, you would need to do a copy-on-write, where the data structure is never written to but instead copied, ensuring thread safety and lockless (thus: fast) reads, at the cost of write speeds. Specializing it could go as far as reordering the structure on writes to optimize it so the hash buckets don't contain more than N items.
PS: if you were really desperate for speed but couldn't build a more specialized data structure then you could perhaps get small gains from copying Dictionary<TKey,TValue>
and removing the various sanity checks (null checks and such) and virtual/interface method calls. However, I doubt this would give you any more than 20% gain, if that.
这篇关于不可变字典VS字典VS C5 VS F# - 性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!