本文介绍了帮助故障排除SqlException:在非负载情况下,超时在连接上过期的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个托管我的网站,几乎没有流量的服务器。

有几个人(

几乎每天晚上,RSS阅读器将在半夜打我们,并得到一个例外,网站无法连接到SQL Server因为连接中有超时。
细节非常奇怪,所以我正在寻找一些帮助,可能是这个问题,因为我不知道从哪里开始寻找。



我们使用的是ASP.Net MVC,Entity框架和Windows Server 2008上的SQL Server 2008.这台机器是一个专门的盒子,我们从一个不是顶层的提供程序,所以事情可能配置非最佳,或谁知道什么。

这个框也很小,只有1Gb的RAM,但它应该采取我们现在的负载...



我复制下面的完整调用栈,但首先,我们知道的一些事情:




  • 错误总是发生当iTunes查询我们的网站。我相信这应该与任何事情无关,但事实是,我们只能从iTunes得到它。我最好的猜测是,这是因为只有iTunes在当时没有其他人击中我们的时候查询我们。

  • 我们的一个理论是SQL Server和IIS正在战斗对于内存,其中一个被分页到磁盘中,不使用,当有人唤醒时,从磁盘读回内存到内存需要太长时间。这是可能发生吗? (我有点丢弃这,因为它听起来像SQL Server中的一个设计问题,如果可能的话)。

  • 我也想到了我们泄漏连接的可能性,因为我们可能不适当处理EF实体()。这是我可以通过Googling的问题找到的唯一的东西。因为我们的负载非常低,所以我就放弃了。

  • 这总是发生在一个晚上,所以很可能是一段时间没有发生任何事情。例如,我很确定当这些请求命中时,Web服务器进程被回收,它启动/重新JIT化一切。重新JITting不解释SQL超时。






UPDATE:我们附加了一个建议的分析器,它花了一段时间,我们有一个新的异常。这是我们所知道的新东西:




  • 将分析器连接到极大地减少了我们得到的错误数量。事实上,在通常每天得到几次之后,我们不得不等待3或4天才会发生。一旦我们停止分析器,它就回到正常的错误频率(或甚至更糟)。

  • 查看IIS请求日志旁边的profiler跟踪,有一个 预期的请求和查询之间的1-1对应。然而,每一次,我看到大量的查询执行,没有任何纠正与IIS日志。事实上,在记录实际错误之前,我在3分钟内获得了 750次查询,所有这些都与IIS日志完全无关。查询文本看起来像EF生成的那种不可读的垃圾,它们并不完全相同,它们都看起来像来自网站的查询:Same ApplicationName,User等。为了给出一个想法是多么可笑在 2天

  • 过程中,该网站获得了大约370个击中数据库的IIS请求。这些无法解释的查询并不是来自相同的ClientProcessID以前的网站,虽然他们可能仍然来自网站,如果过程中回收在此期间。在最后一个解释的查询和第一个不明原因的查询之间,几乎没有活动。

  • 我不知道他们来自哪里的查询的一个长串来自右在错误发生之前我已经记录,所以我相信这是我们应该跟随的线索。

  • 正如我原来预期的,当引发错误的查询执行,它来自不同ClientProcessID比上一个,(比以前不明原因一个8分钟,几乎比以前的IIS一个小时)。这意味着,对我来说,工作进程确实已经被回收。

  • 这是我绝对不明白的。 IIS日志显示在错误请求之前一分钟,4被完美地服务,虽然那些的查询不会显示在跟踪中。事实上,在那4个进行顺利,我有4个异常抛出快速连续,那些4 ALSO不显示在跟踪(这是有道理的,因为如果有一个超时在连接查询应该从来没有得到执行,但我没有看到跟踪中的连接尝试)



简而言之,我完全不知道这一点。我找不到一个原因,那些数百个查询运行的快速连续,但我相信那些必须与问题有关。

我也不知道如何诊断连接问题...

或者Profiler跟踪可能丢失一些查询,根据IIS经历了...



任何想法? / p>




这是例外信息:

  System.Data.SqlClient.SqlException:超时已过期。操作完成之前超时时间或服务器未响应。 

System.Data.EntityException:打开时基础提供程序失败。 ---> System.Data.SqlClient.SqlException:超时已过期。操作完成之前超时时间或服务器未响应。
在System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
在System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
在System.Data.ProviderBase.DbConnectionClosed.OpenConnection (DbConnection outerConnection,DbConnectionFactory connectionFactory)
在System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection,DbConnectionFactory connectionFactory)
在System.Data.SqlClient.SqlConnection.Open()
在系统
在System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition,DbConnection storeConnectionToOpen,DbConnection())中的

($ boolean openCondition,DbConnection, originalConnection,String exceptionCode,String attemptedOperation,Boolean& closeStoreConnectionOnFailure)
---内部异常堆栈跟踪的结束 -
在System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(布尔openCondition,DbConnection storeConnectionToOpen,DbConnection originalConnection,String exceptionCode,String attemptsOperation,Boolean& closeStoreConnectionOnFailure)
在System.Data.EntityClient.EntityConnection.Open()
在System.Data.Objects.ObjectContext.EnsureConnection()
在System.Data.Objects.ObjectQuery`1.GetResults (Nullable`1 forMergeOption)
在System.Data.Objects.ObjectQuery`1.System.Collections.Generic.IEnumerable< T> .GetEnumerator()
在System.Linq.Enumerable.FirstOrDefault [TSource] (IEnumerable`1源)
在System.Data.Objects.ELinq.ObjectQueryProvider。< GetElementFunction> b__1 [TResult](IEnumerable`1 sequence)
在System.Data.Objects.ELinq.ObjectQueryProvider。 ExecuteSingle [TResult](IEnumerable`1查询,表达式queryRoot)
在System.Data.Objects.ELinq.ObjectQueryProvider.System.Linq.IQueryProvider.Execute [S](表达式表达式)
在System.Linq .Queryable.FirstOrDefault [TSource](IQueryable`1 source)
在MyProject.Controllers.SitesController.Feed(Int32 id)在C:\ ... \Controller.cs:line 38
at lambda_method(ExecutionScope,ControllerBase,Object [])
在System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext,IDictionary`2参数)
在System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext ,ActionDescriptor actionDescriptor,IDictionary`2 parameters)
在System.Web.Mvc.ControllerActionInvoker。<> c__DisplayClassa。< InvokeActionMethodWithFilters> b__7()
在System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter IActionFilter过滤器,ActionExecutingContext preContext,Func`1 continuation)
在System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext,IList`1过滤器,ActionDescriptor actionDescriptor,IDictionary`2参数)
在System.Web .Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext,String actionName)
在System.Web.Mvc.Controller.ExecuteCore()
在System.Web.Mvc.MvcHandler.ProcessRequest(HttpContextBase httpContext)
在System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
在System.Web.HttpApplication.ExecuteStep(IExecutionStep步骤,布尔& completedSynchronously)

任何想法都将非常感激。

解决方案

内存不足



这很可能是一个内存问题,也许加重或由其他事情触发,但仍然固有的内存问题。还有另外两种(不太可能)的可能性,你应该首先检查和消除(因为很容易这样做):



容易检查可能性:




  1. 您可能已启用自动关闭:自动关闭可能具有此行为,但很少打开。要检查这一点,在SSMS中右键单击您的应用程序数据库,选择属性,然后选择选项窗格。查看自动关闭条目,并确保它设置为False。也可以检查tempdb。


  2. SQL代理作业可能会导致:检查代理的历史记录,查看事件是否持续运行。记住还要检查维护作业,因为重建索引在运行时经常被称为性能问题。




为什么它看起来像一个内存问题:



如果没有显示任何内容,则应检查内存问题。我怀疑内存是你的情况的原因,因为:




  • 你有1 GB的内存:虽然这在技术上高于最小对于SQL Server,它是低于推荐的SQL Server,而低于我的经验是可以接受的生产,即使是轻负载服务器。


  • 您在同一个框中运行IIS和SQL Server:这不是自己推荐的,在很大程度上是因为内存的争用结果,但只有1 GB的内存,导致IIS,应用程序,SQL Server,操作系统和任何其他任务和/或维护全部为非常少的记忆战斗。 Windows管理这种方式的方式是通过积极地将它从非活动进程中提取内存给活动进程。


  • 对于一个大型进程(如SQL Server),它可能需要很多秒,甚至几分钟才能恢复足够的内存,

    分析器使90%的问题消失了:这是一个很大的线索,内存可能是问题,因为通常,类似分析器的东西对这个特定的问题有正确的影响:分析器任务保持SQL Server只是一个位所有的时间。通常,这只是足够的活动,以保持它从操作系统的清道夫列表,或至少减少它的影响一些。




如何检查内存为Culprit:




  1. 关闭分析器:


  2. 从其他计算机运行系统监视器(perfmon.exe),您必须将其关闭,否则您将无法可靠地看到问题。框,远程连接到运行SQL Server和IIS的框上的perfomrance收集服务。你可以最容易地做到这一点,首先删除三个默认统计信息(它们仅在本地),然后添加所需的统计信息(下面),但确保更改计算机名称在第一个下拉列表中连接到您的SQL


  3. 通过在perfmon上创建计数器日志,将收集的数据发送到文件。如果你不熟悉这一点,那么最简单的事情就是收集数据到一个标签或逗号分隔的文件,您可以用Excel打开分析。


  4. 设置您的perfmon以收集到文件并向其中添加以下计数器:



    - Processor \%Processor Time [Total]



    - PhysicalDisk \%闲置时间[每个磁盘]



    - PhysicalDisk\\ \\ Avg。磁盘队列长度[每个磁盘的长度]



    - Memory\Pages / sec



    - Memory\Page Reads /秒



    - Memory\Available MBytes



    - 网络接口\字节总数/秒[正在使用的每个接口]



    - Process \%Processor Time [见下文]



    - Process \Page Faults / sec [见下文]



    - Process \ Workinging Set [ ]


  5. 您想要包括sqlserver.exe进程,任何IIS进程和任何稳定的应用程序进程。注意,这将只适用于稳定的进程。


  6. 运行此集合时,可以使用此集合在问题最常发生的时间内的文件。将收集间隔设置为接近10-15秒。


  7. 遇到一个或多个事件后,请停止收集,然后使用Excel打开您的收藏数据文件。您可能需要重新格式化时间戳列有用地可见,并显示小时分钟和秒。使用IIS日志查找事件的确切时间,然后查看perfmon数据以了解事件发生前后发生了什么。特别是你想看看它的工作集是否小之前和大后,与很多页之间的错误。这是这个问题的最清晰的迹象。




解决方案:



将IIS和SQL Server分成两个不同的框(首选),或者向框中添加更多内存。



这个奇怪的EF文件有什么?



这里的问题是,它很可能是外围的或只有贡献你的主要问题。请记住,分析器会使90%的事件消失,因此剩下的事情可能会是另一个问题,或者它可能只是问题中最极端的问题。 因为它的行为我会猜测它是循环其缓存或有一些其他应用程序服务器进程的后台维护。


I have a server hosting a website of mine that has almost zero-traffic.
A few people (< 20) enter the site every day, and a few RSS readers are subscribed to some feeds we put out.

Almost every night, an RSS reader will hit us in the middle of the night and get an exception that the website can't connect to the SQL Server because of a Timeout in the connection.The details are extremely weird, so I'm looking for some help on what could be the issue, since I don't know where to start looking anymore.

We are using ASP.Net MVC, Entity Framework, and SQL Server 2008 over Windows Server 2008. The machine is a dedicated box we got from a not exactly top-tier provider, so things might be configured non-optimally, or who knows what else.
The box is also pretty small, and has only 1Gb RAM, but it should take the kind of load we have for now...

I'm copying the full call stack below, but first, some of the things we know:

  • The error always happens when iTunes is querying our site. I believe this should have nothing to do with anything, but the truth is that we only get it from iTunes. My best guess is that this happens because only iTunes queries us at that time of the night when no one else is hitting us.
  • One of our theories is that the SQL Server and IIS are fighting for memory, and one of them is getting paged to disk out of not being used, and when someone "wakes it up", it takes too long to read everything from disk back into memory. Is this something that could potentially happen? (I'm kind of discarding this since it sounds like a design issue in SQL Server if it were possible)
  • I also thought about the possibility that we're leaking connections, since we may not be disposing of EF entities appropriately (see my question here). This is the only thing I could find by Googling the problem. I'm discarding this given the extremely low load we have.
  • This always happens over the night, so it's very likely something related to the fact that nothing happened for a while. For example, I'm pretty sure that when these requests hit, the web server process got recycled and it's starting up / re-JITting everything. The re-JITting doesn't explain the SQL timeout, though.

UPDATE: We attached a profiler as suggested, and it took quite a while before we had a new exception. This is the new stuff we know:

  • Having the profiler attached enormously reduced the number of errors we got. In fact, after normally getting several per day, we had to wait for 3 or 4 days for this to happen ONCE. Once we stopped the profiler, it went back to the normal error frequency (or even worse). So the profiler has some effect that hides this problem to some extent, but not completely.
  • Looking at the profiler trace next to the IIS requests log, there is an expected 1-1 correspondence between requests and queries. However, every now and then, I see A LOT of queries being executed that have no correllation at all with the IIS log. In fact, right before the actual bug was logged, I got 750 queries in a period of 3 minutes, all of which were completely unrelated to the IIS logs. The query text look like the kind of unreadable crap that EF generates, and they're not all the same, and they all look just like the queries coming from the website: Same ApplicationName, User, etc. To give an idea how ridiculous this is, the site got about 370 IIS requests that hit the DB, in the course of 2 days
  • These unexplained queries did not come from the same ClientProcessID as the previous website ones, although they may still have come from the website, if the process got recycled in the meantime. There was almost an hour of no activity between the last explained query, and the first unexplained one.
  • One of these long streaks of queries that I don't know where they came from came right before the error I got logged, so I believe this is the clue we should be following.
  • As I expected originally, when the query that threw the error was executed, it came from a different ClientProcessID than the previous one, (8 minutes later than the previous unexplained one, and almost exactly one hour later than the previous IIS one). This means, to me, that the worker process had indeed gotten recycled.
  • This is something I absolutely don't understand. The IIS log shows that one minute before the error requests, 4 were perfectly served, although the queries for those don't show up in the trace at all. In fact, after those 4 that went well, I had 4 exceptions thrown in quick succession, those 4 ALSO don't show up in the trace (which makes sense since if there was a Timeout in connection the query should have never gotten executed, but I don't see the connections attempts in the trace either)

So, in short, I'm completely clueless about this. I can't find a reason for those hundreds of queries that get run in quick succession, but I believe those must have something to do with the problem.
I also don't know how to diagnose the connection problems...
Or how the Profiler trace may be missing some queries that according to IIS went through fine...

Any ideas?


This is the exception information:

System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

System.Data.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
   at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.SqlClient.SqlConnection.Open()
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   --- End of inner exception stack trace ---
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.Open()
   at System.Data.Objects.ObjectContext.EnsureConnection()
   at System.Data.Objects.ObjectQuery`1.GetResults(Nullable`1 forMergeOption)
   at System.Data.Objects.ObjectQuery`1.System.Collections.Generic.IEnumerable<T>.GetEnumerator()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at System.Data.Objects.ELinq.ObjectQueryProvider.<GetElementFunction>b__1[TResult](IEnumerable`1 sequence)
   at System.Data.Objects.ELinq.ObjectQueryProvider.ExecuteSingle[TResult](IEnumerable`1 query, Expression queryRoot)
   at System.Data.Objects.ELinq.ObjectQueryProvider.System.Linq.IQueryProvider.Execute[S](Expression expression)
   at System.Linq.Queryable.FirstOrDefault[TSource](IQueryable`1 source)
   at MyProject.Controllers.SitesController.Feed(Int32 id) in C:\...\Controller.cs:line 38
   at lambda_method(ExecutionScope , ControllerBase , Object[] )
   at System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.<>c__DisplayClassa.<InvokeActionMethodWithFilters>b__7()
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation)
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName)
   at System.Web.Mvc.Controller.ExecuteCore()
   at System.Web.Mvc.MvcHandler.ProcessRequest(HttpContextBase httpContext)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Any ideas will be enormously appreciated.

解决方案

Not Enough Memory

This is very likely a Memory problem, perhaps aggravated or triggered by other things, but still inherently a memory problem. there are two other (less likely) possibilities, that you should check and eliminate first (because it is easy to do so):

Easy To Check Possibilities:

  1. You may have "Auto Close" enabled: Auto Close can have exactly this behavior, however it is rare for it to be turned on. To check this, in SSMS right-click on your application database, select "Properties", and then select the "Options" pane. Look at the "Auto Close" entry and make sure that it is set to False. Check tempdb also.

  2. SQL Agent Jobs may be causing it: Check the Agent's History Log to see if there were any jobs consistently running during the events. Remember to check maintenance jobs too, as things like Rebuilding Indexes are frequently cited as performance problems while they are running. These are unlikely candidates now, only because they would not normally be affected by the Profiler.

Why It Looks Like a Memory Problem:

If those do not show anything, then you should check for memory problems. I suspect Memory as the cause in your case because:

  • You have 1 GB of Memory: Although this is technically above the Minimum for SQL Server, it is way below the recommended for SQL Server, and way below what in my experience is acceptable for production, even for a lightly loaded server.

  • You are running IIS and SQL Server on the same box: This is not recommended by itself, in large part because of the contention for memory that results, but with only 1 GB of memory it results in IIS, the app, SQL Server, the OS and any other tasks and/or maintenance all fighting for very little memory. The way the Windows manages this is to give memory to the active processes by aggressively taking it away from the non-active processes. It can take many seconds, or even minutes for a large process like SQL Server to get back enough of its memory to be able to completely service a request in this situation.

  • Profiler made 90% of the problem go away: This is a big clue that memory is likely the problem, because typically, things like Profiler have exactly this effect on this particular problem: the Profiler task keeps the SQL Server just a little bit active all of the time. Frequently, this is just enough activity to either keep it off the OS's "scavenger" list, or at least reduces it's impact somewhat.

How to Check For Memory as the Culprit:

  1. Turn Off the Profiler: Its having a Heisenberg effect on the problem, so you have to turn it off or you will not be able to see the problem reliably.

  2. Run a System Monitor (perfmon.exe) from another box, that remotely connects to the perfomrance collection service on the box that your SQL Server and IIS are running on. you can most easily do this by first removing the three default stats (they are local only), and then add in the needed stats (below), but make sure to change the Computer name in the first drop-down to connect to your SQL box.

  3. Send the collected data to a file by creating a "Counter Log" on perfmon. If you are unfamiliar with this, then the easiest thing to do is probably to collect the data to a tab or comma separated file that you can open with Excel to analyze.

  4. Set up your perfmon to collect to a file and add the following counters to it:

    -- Processor\%Processor Time[Total]

    -- PhysicalDisk\% Idle Time[for each disk]

    -- PhysicalDisk\Avg. Disk Queue Length[for each disk]

    -- Memory\Pages/sec

    -- Memory\Page Reads/sec

    -- Memory\Available MBytes

    -- Network Interface\Bytes Total/sec[for each interface in use]

    -- Process\% Processor Time[see below]

    -- Process\Page Faults/sec[see below]

    -- Process\Working Set [see below]

  5. For the Process counters (above) you want to include the sqlserver.exe process, any IIS processes, and any stable application processes. Note that this will ONLY work for "stable" processes. Processes that are continually being re-created as needed, cannot be captured this way because there is no way to specify them before they exist.

  6. Run this collection to a file during the time that the problem most frequently happens. Set the collection interval to something close to 10-15 secs. (this collects a lot of data, but you will need this resolution to pick out the separate events).

  7. After you have one or more incidents, stop the collection and then open your colleced data file with Excel. You will probably have to reformat the timestamp column to be usefully visible and show hours minutes and seconds. Use your IIS log to find the exact time of the incidents, then look at the perfmon data to see what was going on before and after the incident. In particular you want to see if its working set was small before and was large after, with a lot of page faulting in between. That's the clearest sign of this problem.

SOLUTIONS:

Either separate IIS and SQL Server onto two different boxes (preferred) or else add more memory to the box. I would think that 3-4 GB should be a minimum.

What About That Weird EF Stuff?

The problem here is that it is most likely either peripheral or only contributory to your main problem. Remember that Profiler made 90% of your incidents go away, so what remains, may be a different problem, or it may be only the most extreme aggravator of the problem. Because of its behavior I would guess that it is either cycling its cache or there is some other background maintenance of the application server processes.

这篇关于帮助故障排除SqlException:在非负载情况下,超时在连接上过期的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-17 11:30
查看更多