Problem description
Background: the cloud
We have a Java-based web application that we normally host on our own servers. Recently we used the Amazon Web Services (AWS EC2) cloud to host an instance.
This "cloud setup" matches our typical "on site" setup: one server for the app server, another server for the database server. (Several app servers point to the same database server.)
The problem
In this cloud setup, we receive intermittent "connection reset by peer" errors between the database and the JDBC driver: at (seemingly) random intervals and at random points in the codebase, the database connection fails.
Here are a few error excerpts from the log:
Stack trace example 1:
at com.participate.pe.genericdisplay.client.taglib.GenDisplayViewTag.doStartTag(GenDisplayViewTag.java:77)
... 75 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:170)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:304)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getMetaData(SQLServerConnection.java:1734)
at org.jboss.resource.adapter.jdbc.WrappedConnection.getMetaData(WrappedConnection.java:354)
Stack trace example 2:
at java.lang.Thread.run(Thread.java:619)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1355)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1532)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3274)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4437)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4389)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1457)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:1462)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.setAutoCommit(SQLServerConnection.java:1610)
at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.checkTransaction(BaseWrapperManagedConnection.java:429)
The technical environment
- JBoss 4.2.2.GA (JBoss Web 2.0 / Tomcat 6)
- MSSQL 2005, JDBC driver 2.0
A few points
- We have never seen this problem in our own environment (i.e. our own data centers) in several years of running the application.
- This led me to conclude that "something funny is going on with the Amazon network environment". I may be wrong, missing something, etc.
- This problem only occurs with our application. We have other Java and PHP applications which have not had this problem. The other Java application uses a different JDBC driver (jTDS, afaik).
- It doesn't seem like a simple connection timeout.
The questions
- Has anyone seen this before?
- If it's an EC2 "known issue", can we configure our way around the problem (e.g. make sure everything is on its own subnet or in a virtual private cloud (VPC))?
- Are there any JDBC driver settings to get past this problem?
**Update** I've extended and increased the bounty on this question.
One extra bit of information: the two virtual servers (database and application server) were on different subnets, i.e. there was one hop between the two servers.
In a non-cloud environment we have "zero hops" between the two servers.
Our hosting admins said we had no control over the subnets of our EC2 instances. This made me wonder whether a virtual private cloud would help.
Thanks in advance,
Will
Recommended answer
Just a word of caution on using DBCP/connection pool features to mitigate the issue: the more you enable 'testOnBorrow' and similar features, the more latency or other performance-changing effects you can introduce into the system. I don't know whether DBCP still does this, but a few years ago it would generate actual test queries to test the connection (full stack, with database responses), not just a check at the network layer. The above link from Brian brings back horrific memories from the early 2000s of retry logic wrapped around JDBC connection management.
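To make the trade-off concrete, here is a minimal sketch of what "test on borrow" amounts to. It is not DBCP's or JBoss's actual implementation; the class `ValidatingBorrow`, the method `borrow`, and the `factory` parameter are hypothetical names used only for illustration. It assumes a JDBC 4.0 driver, since it relies on the standard `Connection.isValid(int)` call, which may itself involve a round trip to the database depending on the driver.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Supplier;

/** Hypothetical helper, not part of DBCP or JBoss: validates a pooled connection before handing it out. */
public class ValidatingBorrow {

    /**
     * Returns the pooled connection if it passes a validity check,
     * otherwise discards it and asks the factory for a replacement.
     */
    public static Connection borrow(Connection pooled,
                                    Supplier<Connection> factory,
                                    int timeoutSeconds) {
        if (pooled != null) {
            try {
                // Connection.isValid is standard JDBC 4.0. The check runs on every
                // borrow, which is where the extra latency comes from.
                if (!pooled.isClosed() && pooled.isValid(timeoutSeconds)) {
                    return pooled;
                }
            } catch (SQLException e) {
                // Treat any failure during validation as a dead connection.
            }
            closeQuietly(pooled);
        }
        return factory.get();
    }

    private static void closeQuietly(Connection connection) {
        try {
            connection.close();
        } catch (SQLException ignored) {
            // The connection is being discarded anyway.
        }
    }
}
```

Every call to `borrow` pays for one validity check, so a page that borrows a connection many times pays that cost many times. That is the overhead to weigh against the benefit of never handing a reset connection to application code.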
Anyway, it's tough to really root-cause this, other than to gather evidence and narrow the 'seemingly random' failures down to a specific set of conditions:
- You could try to capture a Wireshark/PCAP trace, find the point where the failure happens, and send the results to both Amazon and Microsoft to see if they can root-cause it.
- You could try the above with certain test harnesses to isolate the problem (JMeter tests to get concurrency up), bounce the network connection, watch for recovery, etc.
- You could try alternative versions of SQL Server to rule out a SQL Server/JDBC driver bug that has since been fixed.
- If DNS is used in the connection strings, you could use IP addresses instead to rule out name-resolution (nslookup) issues.
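As one way to gather evidence for the DNS-versus-IP check mentioned above, the sketch below times name resolution and the TCP connect separately. The class `DbReachabilityProbe` and its `probe` method are hypothetical names, and the host and port are whatever your connection string uses. It only exercises DNS and the TCP handshake; it does not speak the SQL Server protocol, so it cannot reproduce a reset that happens on an already-established connection.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Hypothetical diagnostic helper: times DNS lookup and TCP connect to a database host. */
public class DbReachabilityProbe {

    /** Outcome of one probe; error is null when both steps succeeded. */
    public static final class Result {
        public final long dnsMillis;
        public final long connectMillis; // -1 if the connect was never attempted
        public final String error;

        Result(long dnsMillis, long connectMillis, String error) {
            this.dnsMillis = dnsMillis;
            this.connectMillis = connectMillis;
            this.error = error;
        }

        public boolean ok() {
            return error == null;
        }
    }

    public static Result probe(String host, int port, int timeoutMillis) {
        long dnsStart = System.nanoTime();
        InetAddress address;
        try {
            // For a literal IP address this skips the lookup, which is the comparison we want.
            address = InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return new Result(elapsedMillis(dnsStart), -1, "DNS: " + e);
        }
        long dnsMillis = elapsedMillis(dnsStart);

        long connectStart = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return new Result(dnsMillis, elapsedMillis(connectStart), null);
        } catch (IOException e) {
            return new Result(dnsMillis, elapsedMillis(connectStart), "TCP: " + e);
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

Running `probe` periodically from the app server, once with the hostname and once with the IP address, and logging the results with timestamps would show whether failures line up with slow or failed lookups, and gives timestamps to correlate with a packet capture.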
I'm not a SQL Server expert, but another route for research could be the related products domain, e.g. see whether anyone has experienced similar issues with TFS/SharePoint (such as http://nickhoggard.wordpress.com/2009/12/07/further-experiences-with-tfs-2010-beta-2-on-amazon-ec2/ ).