[Original articles take effort; please credit the source link when reposting.]

Let's continue with how path and assoc failure and recovery are managed.

        

II.    Managing transport and association

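Multi-homing management for an association mainly operates on the transport, but when multiple transports/paths go down the association inevitably goes down as well. Tracking path updates, failures, and recovery is therefore inseparable from managing assoc failure and recovery.

Every transmission failure on a path (that is, no SACK is received) increments not only that path's error counter but also the assoc's error counter. Apart from the periods when the primary transport (main path) is carrying DATA, link state is normally checked by sending HEARTBEATs, both on the primary transport while it is idle and on the alternate transports (backup paths).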

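The error counters for the path and the assoc are, respectively:

transport->error_count and asoc->overall_error_count

The states of the path and the assoc are as follows:

path state: 0 - Inactive, 1 - Active, 2 - Unconfirmed.

assoc state: 0 to max, where 4 means established. The "ST" column in /proc/net/sctp/assocs shows the association state.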

1. When a path is updated

Function: sctp_assoc_control_transport        # net/sctp/associola.c

Objects updated: asoc->peer.primary_path

          asoc->peer.active_path

          asoc->peer.retran_path

Operation type: up / down

(1).       SCTP_TRANSPORT_UP call sites: sctp_check_transmitted, sctp_cmd_transport_on

(2).       SCTP_TRANSPORT_DOWN call site: sctp_do_8_2_transport_strike  # may update active_path
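
To illustrate why a DOWN event may move active_path, here is a minimal sketch of one possible selection rule: keep the primary path while it is active, otherwise fall back to another active path. The names `sel_state` and `select_active_path` are hypothetical, and the rule is a simplification assumed for this example; the real sctp_assoc_control_transport also maintains retran_path and other bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical path states; numeric values follow the list in this article. */
enum sel_state { SEL_INACTIVE = 0, SEL_ACTIVE = 1, SEL_UNCONFIRMED = 2 };

/*
 * Return the index of the path to use as the active path.
 * Assumed rule for this sketch: prefer the primary path while it is
 * ACTIVE, otherwise take the first ACTIVE path in the array.
 * Returns -1 when no path is ACTIVE; the caller decides what to do then.
 */
static int select_active_path(const enum sel_state *states, size_t n,
                              size_t primary)
{
    size_t i;

    assert(states != NULL && primary < n);

    if (states[primary] == SEL_ACTIVE)
        return (int)primary;

    for (i = 0; i < n; i++) {
        if (states[i] == SEL_ACTIVE)
            return (int)i;
    }
    return -1;
}
```

With three paths where index 0 is the primary, marking path 0 as inactive makes the helper return the next active index, which is the effect the "may update active_path" remark refers to.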

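2. DOWN: when a path is taken down

When the number of retransmissions on a path exceeds the maximum (configurable through /proc/sys/net/sctp/path_max_retrans), the path is taken down.

The function is sctp_do_8_2_transport_strike; its source is shown below: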

        /* The check for association's overall error counter exceeding the
         * threshold is done in the state function.
         */
        /* We are here due to a timer expiration. If the timer was
         * not a HEARTBEAT, then normal error tracking is done.
         * If the timer was a heartbeat, we only increment error counts
         * when we already have an outstanding HEARTBEAT that has not
         * been acknowledged.
         * Additionally, some transport states inhibit error increments.
         */
        if (!is_hb) {
                asoc->overall_error_count++;
                if (transport->state != SCTP_INACTIVE)
                        transport->error_count++; // count the failed transmission; same below
        } else if (transport->hb_sent) {
                if (transport->state != SCTP_UNCONFIRMED)
                        asoc->overall_error_count++;
                if (transport->state != SCTP_INACTIVE)
                        transport->error_count++;
        }
        // ... (omitted: SCTP_PF state handling)
        if (transport->state != SCTP_INACTIVE &&
            (transport->error_count > transport->pathmaxrxt)) { // compare against the path failure limit
                SCTP_DEBUG_PRINTK_IPADDR("transport_strike:association %p",
                                         " transport IP: port:%d failed.\n",
                                         asoc,
                                         (&transport->ipaddr),
                                         ntohs(transport->ipaddr.v4.sin_port));
                sctp_assoc_control_transport(asoc, transport,
                                             SCTP_TRANSPORT_DOWN, // take the path down
                                             SCTP_FAILED_THRESHOLD);
        }
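
The counting rules in the listing above can be tried out in user space with a small stand-alone model. This is only a sketch: the `strike_*` types and the `path_strike` function are hypothetical stand-ins for the kernel structures, the SCTP_PF handling is left out just as in the excerpt, and setting the state to inactive stands in for the call to sctp_assoc_control_transport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
enum strike_state { STRIKE_INACTIVE = 0, STRIKE_ACTIVE = 1, STRIKE_UNCONFIRMED = 2 };

struct strike_path {
    enum strike_state state;
    bool hb_sent;             /* a HEARTBEAT is outstanding on this path */
    unsigned int error_count; /* models transport->error_count */
    unsigned int pathmaxrxt;  /* models transport->pathmaxrxt */
};

struct strike_assoc {
    unsigned int overall_error_count; /* models asoc->overall_error_count */
};

/*
 * Apply one timer expiration to the counters.
 * Returns true when this strike pushes the path over its limit
 * and the path is marked inactive (the DOWN transition).
 */
static bool path_strike(struct strike_assoc *asoc, struct strike_path *path,
                        bool is_hb)
{
    assert(asoc != NULL && path != NULL);

    if (!is_hb) {
        /* Retransmission timeout: always counts against the association. */
        asoc->overall_error_count++;
        if (path->state != STRIKE_INACTIVE)
            path->error_count++;
    } else if (path->hb_sent) {
        /* Heartbeat timer: counts only if a HEARTBEAT went unanswered. */
        if (path->state != STRIKE_UNCONFIRMED)
            asoc->overall_error_count++;
        if (path->state != STRIKE_INACTIVE)
            path->error_count++;
    }

    if (path->state != STRIKE_INACTIVE &&
        path->error_count > path->pathmaxrxt) {
        path->state = STRIKE_INACTIVE;
        return true;
    }
    return false;
}
```

Because the comparison is a strict `>`, a path with pathmaxrxt set to 2 survives two strikes and goes down on the third.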


When sctp_do_8_2_transport_strike gets called (always from sctp_cmd_interpreter):

(1).       SCTP_CMD_STRIKE  -> sctp_do_8_2_transport_strike

Triggered from: sctp_sf_do_6_3_3_rtx, sctp_sf_t2_timer_expire, sctp_sf_t4_timer_expire

(2).       SCTP_CMD_TRANSPORT_RESET  -> sctp_cmd_transport_reset  -> sctp_do_8_2_transport_strike

Triggered from: sctp_sf_sendbeat_8_3, sctp_sf_do_prm_requestheartbeat

3. UP: when transport->error_count is cleared, indicating the path has recovered

(1).       sctp_cmd_interpreter(SCTP_CMD_UPDATE_ASSOC)  -> sctp_assoc_update  ->  sctp_transport_reset

(2).       sctp_cmd_interpreter(SCTP_CMD_PROCESS_SACK)  -> sctp_cmd_process_sack  ->  sctp_outq_sack  -> sctp_check_transmitted        // SACK received

(3).       sctp_cmd_interpreter(SCTP_CMD_TRANSPORT_ON)  -> sctp_cmd_transport_on
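
The recovery side can be modelled the same way. The sketch below shows the general idea only: an acknowledgement from the peer (a SACK or a HEARTBEAT ACK) clears the path error counter and brings a down path back up. The `rec_*` names and `path_acked` are hypothetical, and treating an Unconfirmed path the same way as an Inactive one is an assumption of this example, not something the call chains above establish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical path states; numeric values follow the list in this article. */
enum rec_state { REC_INACTIVE = 0, REC_ACTIVE = 1, REC_UNCONFIRMED = 2 };

struct rec_path {
    enum rec_state state;
    unsigned int error_count; /* models transport->error_count */
};

/*
 * Called when the peer acknowledges something sent on this path.
 * Clears the error counter and returns true when the path
 * transitions to ACTIVE (the UP event).
 */
static bool path_acked(struct rec_path *path)
{
    assert(path != NULL);

    path->error_count = 0;

    /* Assumption: both non-active states recover on an acknowledgement. */
    if (path->state == REC_INACTIVE || path->state == REC_UNCONFIRMED) {
        path->state = REC_ACTIVE;
        return true;
    }
    return false;
}
```

An acknowledgement on a path that is already active only resets the counter, so the helper returns false in that case.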

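4. When the association is torn down

When the assoc's retransmission count reaches the maximum (configurable through /proc/sys/net/sctp/association_max_retrans), the association is torn down.

The functions involved include sctp_sf_do_6_3_3_rtx, sctp_sf_sendbeat_8_3, sctp_sf_t4_timer_expire, and others.

Taking sctp_sf_do_6_3_3_rtx as an example: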

        if (asoc->overall_error_count >= asoc->max_retrans) { // check the association failure count
                if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING) {
                        /*
                         * We are here likely because the receiver had its rwnd
                         * closed for a while and we have not been able to
                         * transmit the locally queued data within the maximum
                         * retransmission attempts limit. Start the T5
                         * shutdown guard timer to give the receiver one last
                         * chance and some additional time to recover before
                         * aborting.
                         */
                        sctp_add_cmd_sf(commands, SCTP_CMD_TIMER_START_ONCE,
                                SCTP_TO(SCTP_EVENT_TIMEOUT_T5_SHUTDOWN_GUARD));
                } else {
                        sctp_add_cmd_sf(commands, SCTP_CMD_SET_SK_ERR,
                                        SCTP_ERROR(ETIMEDOUT));
                        /* CMD_ASSOC_FAILED calls CMD_DELETE_TCB. */
                        sctp_add_cmd_sf(commands, SCTP_CMD_ASSOC_FAILED, // tear down the association
                                        SCTP_PERR(SCTP_ERROR_NO_ERROR));
                        SCTP_INC_STATS(net, SCTP_MIB_ABORTEDS);
                        SCTP_DEC_STATS(net, SCTP_MIB_CURRESTAB);
                        return SCTP_DISPOSITION_DELETE_TCB;
                }
        }
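
The decision made in the listing above boils down to a three-way choice, which the following stand-alone sketch makes explicit. The `chk_*` names and `check_assoc_limit` are hypothetical; the real state function queues side-effect commands instead of returning a value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced set of association states. */
enum chk_assoc_state { CHK_ESTABLISHED, CHK_SHUTDOWN_PENDING };

enum chk_action {
    CHK_CONTINUE,       /* below the limit: keep retransmitting */
    CHK_START_T5_GUARD, /* limit reached while SHUTDOWN-PENDING */
    CHK_ASSOC_FAILED    /* limit reached: report ETIMEDOUT, delete the TCB */
};

struct chk_assoc {
    enum chk_assoc_state state;
    unsigned int overall_error_count; /* models asoc->overall_error_count */
    unsigned int max_retrans;         /* models asoc->max_retrans */
};

/* Decide what to do after a retransmission timeout. */
static enum chk_action check_assoc_limit(const struct chk_assoc *asoc)
{
    assert(asoc != NULL);

    if (asoc->overall_error_count < asoc->max_retrans)
        return CHK_CONTINUE;
    if (asoc->state == CHK_SHUTDOWN_PENDING)
        return CHK_START_T5_GUARD;
    return CHK_ASSOC_FAILED;
}
```

Note that the association check uses `>=` while the path check in sctp_do_8_2_transport_strike uses a strict `>`, so the association limit triggers as soon as the counter equals max_retrans.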


5. When asoc->overall_error_count is cleared, indicating the association has recovered

(1).       sctp_cmd_interpreter(SCTP_CMD_UPDATE_ASSOC)  -> sctp_assoc_update

(2).       sctp_cmd_interpreter(SCTP_CMD_PROCESS_SACK)  -> sctp_cmd_process_sack  ->  sctp_outq_sack  -> sctp_check_transmitted        // SACK received

(3).       sctp_cmd_interpreter(SCTP_CMD_TRANSPORT_ON)  -> sctp_cmd_transport_on

(4).       sctp_cmd_interpreter(SCTP_CMD_GEN_SHUTDOWN)

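PS: Because the SCTP code is relatively stable, the source analyzed here is from kernel version 2.6.21 unless stated otherwise.
      As a side gripe, it would be nice if the CU blog editor supported pasting directly from Word, so I would not have to redo the formatting every time; surely screenshots are not the only option...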




10-27 13:30