Entity Framework and Parallelism

Question

Background

I have an application that receives periodic data dumps (XML files) and imports them into an existing database using Entity Framework 5 (Code First). The import goes through EF5 rather than, say, BULK INSERT or BCP, because business rules that already exist in the entities must be applied.

Processing seems to be CPU bound in the application itself (the extremely fast, write-cache enabled disk IO subsystem shows almost zero disk wait time throughout the process, and SQL Server shows no more than 8%-10% CPU time).

To improve efficiency, I built a pipeline using TPL Dataflow with components to:

Read & Parse XML file
        |
        V
Create entities from XML Node
        |
        V
Batch entities (BatchBlock, currently n=200)
        |
        V
Create new DbContext / insert batched entities / ctx.SaveChanges()

I see a substantial increase in performance by doing this, but cannot get CPU utilization above about 60%.
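
For readers who want something concrete, the pipeline shape described above can be sketched roughly as follows. This is a minimal illustration, not the actual import code: the `Record` entity, the `ImportPipeline.Run` method, the `id` attribute, and the `saveBatch` callback are all hypothetical names, and the file-reading stage is left to the caller. It assumes the System.Threading.Tasks.Dataflow package is referenced. The batch size of 200 and the parallelism of 12 on the save stage mirror the values mentioned in this question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks.Dataflow;
using System.Xml.Linq;

// Hypothetical stand-in for a Code First entity (not from the original application).
public sealed class Record
{
    public int Id { get; set; }
}

public static class ImportPipeline
{
    // Pushes parsed XML nodes through: create entity -> batch -> save.
    // saveBatch is the place where the design above would create a new
    // DbContext, add the batched entities, and call SaveChanges().
    // Returns the number of entities handed to saveBatch.
    public static int Run(IEnumerable<XElement> nodes, Action<Record[]> saveBatch,
                          int batchSize = 200, int saveParallelism = 12)
    {
        int saved = 0;

        // Stage: create an entity from an XML node (assumes an "id" attribute).
        var createEntity = new TransformBlock<XElement, Record>(
            node => new Record { Id = (int)node.Attribute("id") });

        // Stage: group entities into arrays of batchSize.
        var batch = new BatchBlock<Record>(batchSize);

        // Stage: save each batch; several batches may be saved concurrently.
        var save = new ActionBlock<Record[]>(
            records =>
            {
                saveBatch(records);
                Interlocked.Add(ref saved, records.Length);
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = saveParallelism });

        // Propagate completion so a final partial batch is flushed downstream.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        createEntity.LinkTo(batch, link);
        batch.LinkTo(save, link);

        foreach (var node in nodes)
            createEntity.Post(node);

        createEntity.Complete();
        save.Completion.Wait();
        return saved;
    }
}
```

In this sketch each invocation of `saveBatch` would use its own context instance, so no `DbContext` is shared between threads; the contention discussed below shows up even under that arrangement.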

Analysis

Suspecting some sort of resource contention, I ran the process using the VS2012 Profiler's Resource contention data (concurrency) mode.

The profiler shows me 52% contention for a resource labeled Handle 2. Drilling in, I see that the method creating the most contention for Handle 2 is

System.Data.Entity.Internal.InternalContext.SaveChanges()

Second place, at about 40% as many contentions as SaveChanges(), is

System.Data.Entity.DbSet`1.Add(!0)

Questions

  • How can I figure out what Handle 2 really is (e.g. part of TPL, part of EF)?
  • Does EF throttle calls to separate DbContext instances from separate threads? It seems there is a shared resource they are contending for.
  • Is there anything that I can do to improve parallelism in this case?

Update

For the run in question, the maximum degree of parallelism for the task that calls SaveChanges is set to 12 (I tried various values including Unbounded in previous runs).

Update 2

Microsoft's EF team has provided feedback. See my answer for a summary.

Recommended answer

The following summarizes my interaction with the Entity Framework team on this issue. I'll update the answer if more information becomes available.

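  • Microsoft was able to reproduce the issue.
  • The handle contention is related to network I/O (even when SQL Server is on localhost). Specifically, there is contention on the read buffer for network I/O in System.Data.dll.
  • The EF team is now working with the SQL Connectivity team to better understand the issue.
  • Microsoft has not yet provided guidance on how to minimize the impact of this contention.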

Update

This issue is now being tracked on CodePlex:

http://entityframework.codeplex.com/workitem/636?PendingVoteId=636
