Problem description
I wrote a simple WorkerRole that adds test data to a table. The insert code looks like this:
var TableClient = this.StorageAccount.CreateCloudTableClient();
TableClient.CreateTableIfNotExist(TableName);
var Context = TableClient.GetDataServiceContext();
this.Context.AddObject(TableName, obj);
this.Context.SaveChanges();
This code runs for each client request. I tested with 1-30 client threads and made many attempts with different numbers of instances of various sizes. I don't know what I'm doing wrong, but I can't get past 10 inserts per second. If anyone knows how to increase the speed, please advise. Thanks.
Update
- Removing CreateTableIfNotExist made no difference to my insert test.
- Switching to expect100Continue="false" useNagleAlgorithm="false" helped only briefly: the insert rate jumped to 30-40 inserts per second (ips), but after 30 seconds it dropped to 6 ips with 50% of requests timing out.
Recommended answer
To speed things up you should use batch transactions (Entity Group Transactions), which let you commit up to 100 items in a single request:
foreach (var item in myItemsToAdd)
{
    this.Context.AddObject(TableName, item);
}
this.Context.SaveChanges(SaveChangesOptions.Batch);
You can combine this with Partitioner.Create (+ AsParallel) to send several batches of 100 items in parallel on different threads/cores, which makes things really fast.
But before doing all of this, read through the limitations of using batch transactions (100 items, 1 partition per transaction, ...).
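Because a batch is limited to 100 items from a single partition, the items usually have to be grouped before they are sent. The following is a minimal sketch of that grouping step only, assuming a hypothetical Item class with a PartitionKey and a RowKey; BatchPlanner and ToBatches are illustrative names, not part of the storage client library, and no storage calls are made.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table entity; only the keys matter here.
public class Item
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
}

public static class BatchPlanner
{
    // Limit quoted in the answer: at most 100 entities per batch.
    private const int MaxBatchSize = 100;

    // Groups items by partition key, then cuts each group into
    // chunks of at most MaxBatchSize items.
    public static IEnumerable<List<Item>> ToBatches(IEnumerable<Item> items)
    {
        foreach (var partition in items.GroupBy(i => i.PartitionKey))
        {
            var batch = new List<Item>(MaxBatchSize);
            foreach (var item in partition)
            {
                batch.Add(item);
                if (batch.Count == MaxBatchSize)
                {
                    yield return batch;
                    batch = new List<Item>(MaxBatchSize);
                }
            }
            if (batch.Count > 0)
            {
                yield return batch;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // 250 items in partition "A" followed by 30 items in partition "B".
        var items = Enumerable.Range(0, 250)
            .Select(n => new Item { PartitionKey = "A", RowKey = Guid.NewGuid().ToString() })
            .Concat(Enumerable.Range(0, 30)
                .Select(n => new Item { PartitionKey = "B", RowKey = Guid.NewGuid().ToString() }));

        foreach (var batch in BatchPlanner.ToBatches(items))
        {
            // In real code each batch would get its own context, one AddObject
            // call per item, and a single SaveChanges(SaveChangesOptions.Batch).
            Console.WriteLine("{0}: {1} items", batch[0].PartitionKey, batch.Count);
        }
    }
}
```

With this input the sketch yields three batches for partition "A" (100, 100 and 50 items) and one batch of 30 items for partition "B".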
Update:
Since you can't use transactions, here are some other tips. Take a look at this MSDN thread about improving performance when using table storage. I wrote some code to show you the difference:
private static void SequentialInserts(CloudTableClient client)
{
    var context = client.GetDataServiceContext();
    Trace.WriteLine("Starting sequential inserts.");
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < 1000; i++)
    {
        Trace.WriteLine(String.Format("Adding item {0}. Thread ID: {1}", i, Thread.CurrentThread.ManagedThreadId));
        context.AddObject(TABLENAME, new MyEntity()
        {
            Date = DateTime.UtcNow,
            PartitionKey = "Test",
            RowKey = Guid.NewGuid().ToString(),
            Text = String.Format("Item {0} - {1}", i, Guid.NewGuid().ToString())
        });
        // One request per entity: each insert waits for the previous one to finish.
        context.SaveChanges();
    }
    stopwatch.Stop();
    Trace.WriteLine("Done in: " + stopwatch.Elapsed.ToString());
}
The first time I ran this, I got the following output:
Starting sequential inserts.
Adding item 0. Thread ID: 10
Adding item 1. Thread ID: 10
..
Adding item 999. Thread ID: 10
Done in: 00:03:39.9675521
It takes more than 3 minutes to add 1000 items. Next, I changed the app.config based on the tips from the MSDN forum (maxconnection should be 12 * the number of CPU cores):
<system.net>
  <settings>
    <servicePointManager expect100Continue="false" useNagleAlgorithm="false" />
  </settings>
  <connectionManagement>
    <add address="*" maxconnection="48" />
  </connectionManagement>
</system.net>
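The same three settings can also be applied in code through System.Net.ServicePointManager, which may be convenient when the role's configuration file is not easy to change. This is a sketch under the assumption that it runs once at startup, before the first request to storage is made, because the values only apply to connections created afterwards. Using Environment.ProcessorCount for the "12 * cores" rule is my own substitution for the hard-coded 48 in the config above.

```csharp
using System;
using System.Net;

public static class Program
{
    public static void Main()
    {
        // Skip the "Expect: 100-continue" handshake before each request body is sent.
        ServicePointManager.Expect100Continue = false;

        // Disable Nagle's algorithm so small requests are not delayed for coalescing.
        ServicePointManager.UseNagleAlgorithm = false;

        // Rule of thumb from the answer: 12 connections per CPU core.
        ServicePointManager.DefaultConnectionLimit = 12 * Environment.ProcessorCount;

        Console.WriteLine("Connection limit: " + ServicePointManager.DefaultConnectionLimit);
    }
}
```

In a worker role, the natural place for these three assignments would be the role's startup code rather than Main.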
After running the application again, I got this output:
Starting sequential inserts.
Adding item 0. Thread ID: 10
Adding item 1. Thread ID: 10
..
Adding item 999. Thread ID: 10
Done in: 00:00:18.9342480
From over 3 minutes to 18 seconds. What a difference! But we can do even better. Here is some code that inserts all items using a Partitioner (the inserts happen in parallel):
private static void ParallelInserts(CloudTableClient client)
{
    Trace.WriteLine("Starting parallel inserts.");
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    // Split the range 0..999 into chunks of 10 items each.
    var partitioner = Partitioner.Create(0, 1000, 10);
    var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };
    Parallel.ForEach(partitioner, options, range =>
    {
        // Each chunk gets its own context, so no context is shared between threads.
        var context = client.GetDataServiceContext();
        for (int i = range.Item1; i < range.Item2; i++)
        {
            Trace.WriteLine(String.Format("Adding item {0}. Thread ID: {1}", i, Thread.CurrentThread.ManagedThreadId));
            context.AddObject(TABLENAME, new MyEntity()
            {
                Date = DateTime.UtcNow,
                PartitionKey = "Test",
                RowKey = Guid.NewGuid().ToString(),
                Text = String.Format("Item {0} - {1}", i, Guid.NewGuid().ToString())
            });
            context.SaveChanges();
        }
    });
    stopwatch.Stop();
    Trace.WriteLine("Done in: " + stopwatch.Elapsed.ToString());
}
Result:
Starting parallel inserts.
Adding item 0. Thread ID: 10
Adding item 10. Thread ID: 18
Adding item 999. Thread ID: 16
..
Done in: 00:00:04.6041978
Voila: from 3m39s we dropped to 18s, and now we are down to about 4s.
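To relate these timings to the "10 inserts per second" in the question, the elapsed times can be converted into insert rates. The snippet below is only arithmetic on the three durations reported above (1000 items each); the run labels are mine.

```csharp
using System;
using System.Globalization;

public static class Program
{
    public static void Main()
    {
        const int itemCount = 1000;

        // Elapsed times copied from the three outputs above.
        var runs = new[]
        {
            new { Name = "Sequential, before app.config change", Elapsed = "00:03:39.9675521" },
            new { Name = "Sequential, after app.config change", Elapsed = "00:00:18.9342480" },
            new { Name = "Parallel inserts", Elapsed = "00:00:04.6041978" }
        };

        foreach (var run in runs)
        {
            var elapsed = TimeSpan.Parse(run.Elapsed, CultureInfo.InvariantCulture);
            var rate = itemCount / elapsed.TotalSeconds;
            Console.WriteLine(string.Format(
                CultureInfo.InvariantCulture, "{0}: {1:F1} inserts/s", run.Name, rate));
        }
    }
}
```

That works out to roughly 4.5, 53 and 217 inserts per second respectively.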