SO_KEEPALIVE选项 

  对于面向连接的TCP socket,在实际应用中通常都要检测对端是否处于连接中,连接端口分两种情况:
  1、连接正常关闭,调用close() shutdown()连接优雅关闭,send与recv立马返回错误,select返回SOCK_ERR;
  2、连接的对端异常关闭,比如网络断掉,突然断电.

对于第二种情况,判断连接是否断开的方法有一下几种:
1、自己编写心跳包程序,简单的说就是自己的程序加入一条线程,定时向对端发送数据包,查看是否有ACK,根据ACK的返回情况来管理连接。此方法比较通用,一般使用业务层心跳处理,灵活可控,但改变了现有的协议;

2、使用TCP的keepalive机制,UNIX网络编程不推荐使用SO_KEEPALIVE来做心跳检测。

keepalive原理:TCP内嵌有心跳包,以服务端为例,当server检测到超过一定时间(/proc/sys/net/ipv4/tcp_keepalive_time 7200 即2小时)没有数据传输,那么会向client端发送一个keepalive packet,此时client端有三种反应:
1、client端连接正常,返回一个ACK.server端收到ACK后重置计时器,在2小时后在发送探测.如果2小时内连接上有数据传输,那么在该时间的基础上向后推延2小时发送探测包;
2、客户端异常关闭,或网络断开。client无响应,server收不到ACK,在一定时间(/proc/sys/net/ipv4/tcp_keepalive_intvl 75 即75秒)后重发keepalive packet, 并且重发一定次数(/proc/sys/net/ipv4/tcp_keepalive_probes 9 即9次);
3、客户端曾经崩溃,但已经重启.server收到的探测响应是一个复位,server端终止连接。

如果我们不能接受如此之长的等待时间,从TCP-Keepalive-HOWTO上可以知道一共有两种方式可以设置,一种是修改内核关于网络方面的 配置参数,另外一种就是SOL_TCP字段的TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT三个选项。

1) The tcp_keepidle parameter specifies the interval of inactivity that causes TCP to generate a KEEPALIVE transmission for an application that requests them. tcp_keepidle defaults to 14400 (two hours).

/*开始首次KeepAlive探测前的TCP空闭时间 */

2) The tcp_keepintvl parameter specifies the interval between the nine retries that are attempted if a KEEPALIVE transmission is not acknowledged. tcp_keepintvl defaults to 150 (75 seconds).
    /* 两次KeepAlive探测间的时间间隔  */

3) The tcp_keepcnt option specifies the maximum number of keepalive probes to be sent. The value of TCP_KEEPCNT is an integer value between 1 and n, where n is the value of the systemwide tcp_keepcnt parameter.

/* 判定断开前的KeepAlive探测次数

int                 keepIdle = 1000;

int                 keepInterval = 10;

int                 keepCount = 10;

Setsockopt(listenfd, SOL_TCP, TCP_KEEPIDLE, (void *)&keepIdle, sizeof(keepIdle));

Setsockopt(listenfd, SOL_TCP,TCP_KEEPINTVL, (void *)&keepInterval, sizeof(keepInterval));

Setsockopt(listenfd,SOL_TCP, TCP_KEEPCNT, (void *)&keepCount, sizeof(keepCount));

Remember that keepalive is not program?related, but socket?related, so if you have multiple sockets, you can handle keepalive for each of them separately.

参考:

1、http://blog.csdn.net/callinglove/article/details/38380673

2、http://blog.chinaunix.net/uid-26575352-id-3483808.html

05-08 08:30