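We have recently seen a lot of 502 alerts in production, and some of the root causes turned out to be quite interesting. Reading the nginx source code can give some insight into where nginx status codes come from. This article is based on nginx 1.6.2 (mainly to match our production environment).
First, the common error codes are defined in ngx_http_request.h. Some are caused by the client and some by the upstream. Under what circumstances does each one occur, and where should you start when investigating?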
#define NGX_HTTP_CLIENT_CLOSED_REQUEST 499
#define NGX_HTTP_INTERNAL_SERVER_ERROR 500
#define NGX_HTTP_NOT_IMPLEMENTED 501
#define NGX_HTTP_BAD_GATEWAY 502
#define NGX_HTTP_SERVICE_UNAVAILABLE 503
#define NGX_HTTP_GATEWAY_TIME_OUT 504
#define NGX_HTTP_INSUFFICIENT_STORAGE 507
access.log records the status of each request, so we need to find where status is assigned.
grep -r status= src|grep 502
The logic for backend 5xx status codes mostly lives in ngx_http_upstream_next in ngx_http_upstream.c. Here is the status code mapping:
switch(ft_type) {
case NGX_HTTP_UPSTREAM_FT_TIMEOUT:
    status = NGX_HTTP_GATEWAY_TIME_OUT;
    break;
case NGX_HTTP_UPSTREAM_FT_HTTP_500:
    status = NGX_HTTP_INTERNAL_SERVER_ERROR;
    break;
case NGX_HTTP_UPSTREAM_FT_HTTP_403:
    status = NGX_HTTP_FORBIDDEN;
    break;
case NGX_HTTP_UPSTREAM_FT_HTTP_404:
    status = NGX_HTTP_NOT_FOUND;
    break;
default:
    status = NGX_HTTP_BAD_GATEWAY;
}
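ft_type maps to status as follows: NGX_HTTP_UPSTREAM_FT_TIMEOUT maps to 504, NGX_HTTP_UPSTREAM_FT_HTTP_500 to 500, NGX_HTTP_UPSTREAM_FT_HTTP_403 to 403, and NGX_HTTP_UPSTREAM_FT_HTTP_404 to 404, one to one. Every other ft_type falls back to 502. So the next step is to find out where each ft_type value is passed in.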
#define NGX_HTTP_UPSTREAM_FT_ERROR 0x00000002
#define NGX_HTTP_UPSTREAM_FT_TIMEOUT 0x00000004
#define NGX_HTTP_UPSTREAM_FT_INVALID_HEADER 0x00000008
#define NGX_HTTP_UPSTREAM_FT_HTTP_500 0x00000010
#define NGX_HTTP_UPSTREAM_FT_HTTP_502 0x00000020
#define NGX_HTTP_UPSTREAM_FT_HTTP_503 0x00000040
#define NGX_HTTP_UPSTREAM_FT_HTTP_504 0x00000080
#define NGX_HTTP_UPSTREAM_FT_HTTP_403 0x00000100
#define NGX_HTTP_UPSTREAM_FT_HTTP_404 0x00000200
#define NGX_HTTP_UPSTREAM_FT_UPDATING 0x00000400
#define NGX_HTTP_UPSTREAM_FT_BUSY_LOCK 0x00000800
#define NGX_HTTP_UPSTREAM_FT_MAX_WAITING 0x00001000
#define NGX_HTTP_UPSTREAM_FT_NOLIVE 0x40000000
#define NGX_HTTP_UPSTREAM_FT_OFF 0x80000000
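The mapping described above, combined with the flags just listed, can be sketched as a small standalone function. This is only an illustration: ft_type_to_status is a hypothetical helper name, not an nginx function, and the constants are copied from the definitions quoted in this article (403 and 404 are the standard HTTP values).

```c
/* Values copied from the nginx 1.6.2 definitions quoted in this article. */
#define NGX_HTTP_FORBIDDEN                   403
#define NGX_HTTP_NOT_FOUND                   404
#define NGX_HTTP_INTERNAL_SERVER_ERROR       500
#define NGX_HTTP_BAD_GATEWAY                 502
#define NGX_HTTP_GATEWAY_TIME_OUT            504

#define NGX_HTTP_UPSTREAM_FT_ERROR           0x00000002
#define NGX_HTTP_UPSTREAM_FT_TIMEOUT         0x00000004
#define NGX_HTTP_UPSTREAM_FT_INVALID_HEADER  0x00000008
#define NGX_HTTP_UPSTREAM_FT_HTTP_500        0x00000010
#define NGX_HTTP_UPSTREAM_FT_HTTP_403        0x00000100
#define NGX_HTTP_UPSTREAM_FT_HTTP_404        0x00000200
#define NGX_HTTP_UPSTREAM_FT_NOLIVE          0x40000000

/*
 * Hypothetical helper (not part of nginx): mirrors the switch in
 * ngx_http_upstream_next. Four failure types have a dedicated status;
 * everything else falls through to 502.
 */
int
ft_type_to_status(unsigned int ft_type)
{
    switch (ft_type) {
    case NGX_HTTP_UPSTREAM_FT_TIMEOUT:
        return NGX_HTTP_GATEWAY_TIME_OUT;
    case NGX_HTTP_UPSTREAM_FT_HTTP_500:
        return NGX_HTTP_INTERNAL_SERVER_ERROR;
    case NGX_HTTP_UPSTREAM_FT_HTTP_403:
        return NGX_HTTP_FORBIDDEN;
    case NGX_HTTP_UPSTREAM_FT_HTTP_404:
        return NGX_HTTP_NOT_FOUND;
    default:
        return NGX_HTTP_BAD_GATEWAY;
    }
}
```

Passing NGX_HTTP_UPSTREAM_FT_ERROR, NGX_HTTP_UPSTREAM_FT_INVALID_HEADER, or NGX_HTTP_UPSTREAM_FT_NOLIVE all yields 502, which is why 502 has so many possible causes.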
Besides the NGX_HTTP_UPSTREAM_FT_TIMEOUT mapping above, 504 (NGX_HTTP_GATEWAY_TIME_OUT) is set directly in several places in ngx_http_upstream.c.
- The first is ngx_http_upstream_process_upgraded:
if (downstream->write->timedout) {
    c->timedout = 1;
    ngx_connection_error(c, NGX_ETIMEDOUT, "client timed out");
    ngx_http_upstream_finalize_request(r, u, NGX_HTTP_REQUEST_TIME_OUT);
    return;
}
if (upstream->read->timedout || upstream->write->timedout) {
    ngx_connection_error(c, NGX_ETIMEDOUT, "upstream timed out");
    ngx_http_upstream_finalize_request(r, u, NGX_HTTP_GATEWAY_TIME_OUT);
    return;
}
- The second is ngx_http_upstream_process_non_buffered_upstream:
ngx_connection_t *c;
c = u->peer.connection;
ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0,
               "http upstream process non buffered upstream");
c->log->action = "reading upstream";
if (c->read->timedout) {
    ngx_connection_error(c, NGX_ETIMEDOUT, "upstream timed out");
    ngx_http_upstream_finalize_request(r, u, NGX_HTTP_GATEWAY_TIME_OUT);
    return;
}
ngx_http_upstream_process_non_buffered_request(r, 0);
- The third is ngx_http_upstream_process_body_in_memory:
c = u->peer.connection;
rev = c->read;
ngx_log_debug0(NGX_LOG_DEBUG_HTTP, c->log, 0,
               "http upstream process body on memory");
if (rev->timedout) {
    ngx_connection_error(c, NGX_ETIMEDOUT, "upstream timed out");
    ngx_http_upstream_finalize_request(r, u, NGX_HTTP_GATEWAY_TIME_OUT);
    return;
}
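In all three places nginx takes the connection from the upstream structure and finds that a read or write on it has timed out. So the main cause of 504 is a timeout while reading from or writing to the backend (the upstream server).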
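The decision made in the first snippet (ngx_http_upstream_process_upgraded) can be reduced to a small sketch. The struct and function names here are hypothetical stand-ins for the timedout flags on nginx's event structures; only the two status constants come from nginx.

```c
#define NGX_HTTP_REQUEST_TIME_OUT   408
#define NGX_HTTP_GATEWAY_TIME_OUT   504

/* Hypothetical, simplified stand-in for the timedout flags on nginx events. */
struct timeout_flags {
    unsigned client_write_timedout:1;    /* downstream->write->timedout */
    unsigned upstream_read_timedout:1;   /* upstream->read->timedout */
    unsigned upstream_write_timedout:1;  /* upstream->write->timedout */
};

/*
 * Hypothetical helper: returns the status the request would be finalized
 * with, or 0 when nothing has timed out. The client side is checked first,
 * in the same order as the quoted nginx code.
 */
int
status_for_timeout(const struct timeout_flags *f)
{
    if (f->client_write_timedout) {
        return NGX_HTTP_REQUEST_TIME_OUT;
    }

    if (f->upstream_read_timedout || f->upstream_write_timedout) {
        return NGX_HTTP_GATEWAY_TIME_OUT;
    }

    return 0;
}
```

The point of the sketch is the asymmetry: a timeout on the client side yields 408, while a timeout on the backend side yields 504, so a 504 in access.log points at the backend rather than the client.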
503 (NGX_HTTP_SERVICE_UNAVAILABLE): a quick grep shows that it mainly comes from the rate-limiting modules, limit_req and limit_conn:
grep NGX_HTTP_SERVICE_UNAVAILABLE -r src
src/http/modules/ngx_http_limit_req_module.c: NGX_HTTP_SERVICE_UNAVAILABLE);
src/http/modules/ngx_http_limit_conn_module.c: NGX_HTTP_SERVICE_UNAVAILABLE);
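The source shows that ngx_http_limit_req_merge_conf and ngx_http_limit_conn_merge_conf use NGX_HTTP_SERVICE_UNAVAILABLE as the default value of the configured rejection status (status_code) when the configuration is merged. At request time the handlers, such as ngx_http_limit_conn_handler below, return that status code when a limit is hit, so by default a rejected request gets 503.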
static ngx_int_t
ngx_http_limit_conn_handler(ngx_http_request_t *r)
{
    ...
    if (r->main->limit_conn_set) {
        return NGX_DECLINED;
    }
    lccf = ngx_http_get_module_loc_conf(r, ngx_http_limit_conn_module);
    limits = lccf->limits.elts;
    for (i = 0; i < lccf->limits.nelts; i++) {
        // process each limit_conn rule
    }
    return NGX_DECLINED;
}
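The two-step pattern described above (a default applied when the configuration is merged, then returned by the handler when a limit is hit) can be sketched as follows. This is a toy model under stated assumptions, not the nginx implementation: limit_conf, limit_merge_conf, limit_handler, and STATUS_UNSET are all hypothetical names, and SKETCH_DECLINED simply stands in for nginx's NGX_DECLINED.

```c
#define NGX_HTTP_SERVICE_UNAVAILABLE  503
#define STATUS_UNSET                  0     /* hypothetical "not configured" marker */
#define SKETCH_DECLINED               (-5)  /* stands in for NGX_DECLINED */

/* Hypothetical per-location configuration. */
struct limit_conf {
    unsigned status_code;  /* status returned on rejection */
    unsigned max_conn;     /* allowed concurrent connections per key */
};

/* Merge step: inherit from the parent level, else fall back to 503. */
void
limit_merge_conf(struct limit_conf *conf, const struct limit_conf *parent)
{
    if (conf->status_code == STATUS_UNSET) {
        conf->status_code = (parent->status_code != STATUS_UNSET)
                            ? parent->status_code
                            : NGX_HTTP_SERVICE_UNAVAILABLE;
    }
}

/*
 * Request step: "active" is the number of connections already counted for
 * this key. Reject with the configured status once the limit is reached,
 * otherwise decline so that later handlers run.
 */
int
limit_handler(const struct limit_conf *conf, unsigned active)
{
    if (active >= conf->max_conn) {
        return (int) conf->status_code;
    }

    return SKETCH_DECLINED;
}
```

With nothing configured at either level the rejection status ends up as 503, which matches what shows up in access.log when a limit is hit.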
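502 is more complicated because it arises in more situations. Grepping for 502 and NGX_HTTP_BAD_GATEWAY turns up the following:
1. In the resolve phase (ngx_resolve_start), a failed name resolution results in NGX_HTTP_BAD_GATEWAY.
2. When a read or write on the upstream connection hits EOF, a zero return, or an error, the result is NGX_HTTP_BAD_GATEWAY. The recv system call returns n: a value greater than 0 is the number of bytes read, 0 means a FIN was received, and -1 means an error. A common case is that the backend behind nginx has died: nginx receives a FIN on the connection and returns 502 to the client.
3. In the connect phase, when ngx_http_upstream_connect fails to connect to the backend (rc == NGX_DECLINED), it passes NGX_HTTP_UPSTREAM_FT_ERROR to ngx_http_upstream_next. When there are no live upstreams (rc == NGX_BUSY), it passes NGX_HTTP_UPSTREAM_FT_NOLIVE, which also maps to 502. Note that rc == NGX_ERROR finalizes the request with 500 instead.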
rc = ngx_event_connect_peer(&u->peer);
ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
               "http upstream connect: %i", rc);
if (rc == NGX_ERROR) {
    ngx_http_upstream_finalize_request(r, u,
                                       NGX_HTTP_INTERNAL_SERVER_ERROR);
    return;
}
u->state->peer = u->peer.name;
if (rc == NGX_BUSY) {
    ngx_log_error(NGX_LOG_ERR, r->connection->log, 0, "no live upstreams");
    ngx_http_upstream_next(r, u, NGX_HTTP_UPSTREAM_FT_NOLIVE);
    return;
}
if (rc == NGX_DECLINED) {
    ngx_http_upstream_next(r, u, NGX_HTTP_UPSTREAM_FT_ERROR);
    return;
}
4. When the upstream sends an invalid header, for example one too large for the buffer, NGX_HTTP_UPSTREAM_FT_INVALID_HEADER is passed to ngx_http_upstream_next:
if (u->buffer.last == u->buffer.end) {
    ngx_log_error(NGX_LOG_ERR, c->log, 0,
                  "upstream sent too big header");
    ngx_http_upstream_next(r, u,
                           NGX_HTTP_UPSTREAM_FT_INVALID_HEADER);
    return;
}
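Returning to the second case in the list above, the recv return-value convention is easy to check with a small POSIX sketch. It uses an AF_UNIX socketpair as a stand-in for the TCP connection between nginx and the backend (an assumption made to keep the example self-contained); the end-of-stream behavior seen by recv is the same. recv_after_peer_close is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: creates a connected socket pair, closes one end
 * (the "backend" going away), then calls recv() on the other end and returns
 * its result. Returns -2 if the socket pair cannot be created.
 */
long
recv_after_peer_close(void)
{
    int      sv[2];
    char     buf[16];
    ssize_t  n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -2;
    }

    close(sv[1]);                           /* peer closes its end */

    n = recv(sv[0], buf, sizeof(buf), 0);   /* sees end-of-stream */

    close(sv[0]);

    return (long) n;
}
```

A return value of 0 here is the end-of-stream condition that nginx treats as an upstream failure when it was still expecting a response.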
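499 is comparatively simple. When a client closes the connection itself while its request to nginx is still in progress, nginx records NGX_HTTP_CLIENT_CLOSED_REQUEST (499). This status code is never sent to the client; it is only logged locally.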
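One common way for a server to notice that a client has gone away, without consuming any request data, is a one-byte peek on the socket. The sketch below illustrates that general technique only; client_has_closed is a hypothetical helper, MSG_DONTWAIT is assumed to be available (Linux and the BSDs), and nginx's own detection logic is more involved and is not reproduced here.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the peer has closed the connection and
 * no unread data remains, 0 if the connection still looks alive, and -1 on
 * any other socket error. MSG_PEEK leaves pending data in the socket buffer;
 * MSG_DONTWAIT keeps the call from blocking when nothing is pending.
 */
int
client_has_closed(int fd)
{
    char     buf[1];
    ssize_t  n;

    n = recv(fd, buf, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0) {
        return 1;   /* end-of-stream: the peer closed */
    }

    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK) {
        return 0;   /* data pending, or simply nothing to read yet */
    }

    return -1;
}
```

A server that gets 1 from such a check while still waiting on its backend has nobody left to answer, which is the situation that 499 records in the log.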