We have the following setup:
Redis 2.6 on Ubuntu Linux 12.04 LTS, running on a Rackspace Cloud 8 GB instance, with the following settings:
daemonize yes
pidfile /var/run/redis_6379.pid
port 6379
timeout 300
loglevel notice
logfile /var/log/redis_6379.log
databases 16
save 900 1
save 300 10
save 60 10000
rdbcompression yes
dbfilename dump.rdb
dir /var/redis/6379
requirepass PASSWORD
maxclients 10000
maxmemory 7gb
maxmemory-policy allkeys-lru
maxmemory-samples 3
appendonly no
slowlog-log-slower-than 10000
slowlog-max-len 128
activerehashing yes
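Our app servers are hosted in Rackspace Managed and connect to Redis via its public IP (to avoid having to set up Rackspace Connect, which is a royal PITA), and we provide some security by requiring a password for the Redis connection. I manually raised the Unix file descriptor limit to 10240, which should leave enough headroom above the maximum of 10k connections. As you can see from the settings file above, I limit memory usage to 7 GB to leave some RAM headroom as well.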
We use the ServiceStack C# Redis driver, with the following web.config settings:
<RedisConfig suffix="">
    <Primary password="PASSWORD" host="HOST" port="6379" maxReadPoolSize="50" maxWritePoolSize="50"/>
</RedisConfig>
We have a PooledRedisClientManager singleton, created once per AppPool as follows:
private static PooledRedisClientManager _clientManager;
public static PooledRedisClientManager ClientManager
{
    get
    {
        if (_clientManager == null)
        {
            try
            {
                var poolConfig = new RedisClientManagerConfig
                {
                    MaxReadPoolSize = RedisConfig.Config.Primary.MaxReadPoolSize,
                    MaxWritePoolSize = RedisConfig.Config.Primary.MaxWritePoolSize,
                };
                _clientManager = new PooledRedisClientManager(new List<string>() { RedisConfig.Config.Primary.ToHost() }, null, poolConfig);
            }
            catch (Exception e)
            {
                log.Fatal("Could not spin up Redis", e);
                CacheFailed = DateTime.Now;
            }
        }
        return _clientManager;
    }
}
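A side note on the getter above: it is not synchronized, so two threads that hit it at the same moment during startup can each create their own pool, which would open more connections than the per-AppPool estimate assumes. This is probably unrelated to the resets described below, but a double-checked lock avoids it. The following is a minimal sketch only; FakeClientManager is a hypothetical stand-in for PooledRedisClientManager so that the example compiles without the driver, and the try/catch and logging from the original are left out for brevity.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for PooledRedisClientManager (not a ServiceStack type).
public sealed class FakeClientManager
{
    // Counts how many instances were ever constructed.
    public static int Created;

    public FakeClientManager()
    {
        Interlocked.Increment(ref Created);
    }
}

public static class RedisCache
{
    private static readonly object Sync = new object();
    private static volatile FakeClientManager _clientManager;

    public static FakeClientManager ClientManager
    {
        get
        {
            // Fast path: no locking once the instance exists.
            if (_clientManager == null)
            {
                lock (Sync)
                {
                    // Re-check inside the lock so only one thread constructs the pool.
                    if (_clientManager == null)
                    {
                        _clientManager = new FakeClientManager();
                    }
                }
            }
            return _clientManager;
        }
    }
}
```

If construction throws, the field stays null and the next caller tries again, which matches the retry-on-next-access behaviour of the original getter.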
Then we get a connection and perform put/get operations like this:
using (var client = ClientManager.GetClient())
{
    client.Set<T>(region + key, value);
}
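The code seems to work. Given that we have ~20 AppPools, each with 50-100 read clients and 50-100 write clients, we expect at most 2000-4000 connections to the Redis server. However, we keep seeing the following exception in our error logs, usually a couple of hundred of them bunched together, then nothing for an hour, and then the same again, ad nauseam.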
System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.ReadByte()
   at ServiceStack.Redis.RedisNativeClient.ReadLine() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 85
   at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355
   at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404
   at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185
   at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32
   at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
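We have tried setting the Redis server timeout to 0 (i.e. no connection timeout), to 24 hours, and to values in between, all without luck. Googling and searching Stack Overflow has turned up no real answers; everything seems to suggest that we are at least doing the right thing with the code.
Our feeling is that we regularly get sustained network latency issues between Rackspace Hosted and Rackspace Cloud, which cause a block of TCP connections to go stale. We could possibly solve that by implementing client-side connection timeouts, and the question would then be whether we need server-side timeouts as well. But that's just a feeling, and we're not 100% sure we're on the right track.
Ideas?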
Edit: I occasionally see the following error as well:
ServiceStack.Redis.RedisException: Unable to Connect: sPort: 65025 ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Send(IList`1 buffers, SocketFlags socketFlags)
   at ServiceStack.Redis.RedisNativeClient.FlushSendBuffer() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 273
   at ServiceStack.Redis.RedisNativeClient.SendCommand(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 203
   --- End of inner exception stack trace ---
   at ServiceStack.Redis.RedisNativeClient.CreateConnectionError() in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 165
   at ServiceStack.Redis.RedisNativeClient.SendExpectData(Byte[][] cmdWithBinaryArgs) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 355
   at ServiceStack.Redis.RedisNativeClient.GetBytes(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisNativeClient.cs:line 404
   at ServiceStack.Redis.RedisClient.GetValue(String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.cs:line 185
   at ServiceStack.Redis.RedisClient.Get[T](String key) in C:\src\ServiceStack.Redis\src\ServiceStack.Redis\RedisClient.ICacheClient.cs:line 32
   at DataPeaks.NoSQL.RedisCacheClient.Get[T](String key) in c:\dev\base\branches\currentversion\DataPeaks\DataPeaks.NoSQL\RedisCacheClient.cs:line 96
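I imagine this is a direct result of server-side connection timeouts not being handled on the client. It looks like we really do need to handle client-side connection timeouts.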
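One way to handle dropped connections on the client, independent of what causes them, is to retry the cache call a bounded number of times when the failure looks like a connection reset. The following is a rough sketch under stated assumptions, not something taken from ServiceStack.Redis: RetryHelper, WithRetry and IsConnectionReset are hypothetical names, and the classification simply walks the inner-exception chain for the IOException and SocketException types seen in the traces above.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading;

// Hypothetical helper; not part of ServiceStack.Redis.
public static class RetryHelper
{
    // Runs operation, retrying up to maxAttempts times in total while the
    // thrown exception is accepted by isTransient. Other exceptions, and the
    // last failed attempt, are rethrown unchanged.
    public static T WithRetry<T>(Func<T> operation, int maxAttempts, TimeSpan delay, Func<Exception, bool> isTransient)
    {
        if (operation == null) throw new ArgumentNullException("operation");
        if (isTransient == null) throw new ArgumentNullException("isTransient");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception e)
            {
                if (attempt >= maxAttempts || !isTransient(e))
                {
                    throw; // preserves the original stack trace
                }
                if (delay > TimeSpan.Zero)
                {
                    Thread.Sleep(delay);
                }
            }
        }
    }

    // True if the exception, or any inner exception, is an IOException or a
    // SocketException (the second trace wraps one inside a RedisException).
    public static bool IsConnectionReset(Exception e)
    {
        for (Exception cur = e; cur != null; cur = cur.InnerException)
        {
            if (cur is IOException || cur is SocketException)
            {
                return true;
            }
        }
        return false;
    }
}
```

In the setup above, the operation passed in would be the whole `using (var client = ClientManager.GetClient()) { ... }` block, so that each attempt asks the pool for a client again. Whether the pool then hands back a healthy connection depends on the driver version, so this should be verified before relying on it.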
Best answer
We think we found the root cause after carefully reading through the Redis documentation and finding this gem (http://redis.io/topics/persistence):
RDB needs to fork() often in order to persist on disk using a child process.
Fork() can be time consuming if the dataset is big, and may result in Redis
to stop serving clients for some millisecond or even for one second if the
dataset is very big and the CPU performance not great. AOF also needs to fork()
but you can tune how often you want to rewrite your logs without any trade-off
on durability.
We turned off RDB persistence and haven't seen those connection drops since.
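For reference, here is a minimal sketch of what turning off RDB snapshots looks like against the configuration in the question, assuming the stock Redis 2.6 redis.conf syntax. Note that together with appendonly no this leaves Redis with no persistence at all, so the dataset is lost on restart; that is only acceptable when Redis is used purely as a cache.

```
# Comment out (or delete) the three save points from the original config:
# save 900 1
# save 300 10
# save 60 10000

# An empty save directive clears any previously configured save points,
# so Redis no longer forks for background RDB snapshots.
save ""
```

The same setting can be applied to a running instance with CONFIG SET save "", but that change only lasts until the next restart unless redis.conf is edited as well.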