背景
- 问题描述
DBA同学收到qps大量下降的告警,qps从2w下降到1w,然后又自动恢复了。
基于Analysis Report
信息,发现有很多 STATE:Waiting for table flush
的状态,因此我们分析了问题期间的snapshot,将有问题的状态罗列整理如下:
0. Server version: 5.7.23-ctrip-log MySQL Community Server - (GPL)
1. qps 下降的比较多
2. thread_running 比较多
3. STATE:Waiting for table flush 比较多
4. Waiting for table flush 的表: 同一个表
5. 此时间段有定时任务: Analyze table 操作
基于以上,我们很快便怀疑和Analyze table 有关
- 为什么要Analyze table
当优化器决定选择使用哪个索引来查询时,它将使用该表存储引擎提供的统计信息。如果统计信息不是最新的,那么优化器在创建执行计划时可能会选择错误的索引,这会导致性能下降。
为了防止这种情况发生,存储引擎支持自动和手动的统计信息更新。虽然自动统计信息更新通常工作得很好,但在某些情况下它们不能正确地完成工作。
MySQL对于索引信息统计是通过采样统计的方法,相比整表一条条记录采集来说花费的代价小很多。采样统计的时候,InnoDB默认使用N个page(默认16k大小),统计其中的不同值,然后取平均数,最后乘以这个索引对应的页面数,最后得到索引的基数(cardinality)。
1) 自动更新:
当innodb_stats_persistent=ON
,表示统计信息存储到磁盘,算法如下:采样统计20个16k page(innodb_stats_persistent_sample_pages),当更新的记录数超过1/10的时候,自动更新统计信息;
当innodb_stats_persistent=OFF
,表示统计信息存储到内存,算法如下:采样统计8个16k page(innodb_stats_transient_sample_pages),当更新的记录数超过1/16的时候,自动更新统计信息;
2) 手动更新:
自己写脚本,定期调用Analyze table
3)综合考虑:
我们选择的方法,就是定期更新统计信息。 原因如下:
a)可控制低峰期执行
b)自动更新统计信息,如果没有达到触发条件,则一直不会更新
问题重现与分析
问题重现
上述模拟表明:
- Analyze table mychecksums之前,如果mychecksums table没有被打开使用(m_open_in_progress=0),那么后续query执行成功。
- Analyze table mychecksums之前,如果mychecksums table已经被打开使用了(m_open_in_progress=1),那么后续query会被阻塞。
- 如果将之前引用该表的conn kill掉,那么被阻塞的query立刻执行成功
- Analyze table mychecksums之后,如果mychecksums table被打开使用了(m_open_in_progress=1),那么后续query执行成功。
阻塞堆栈
conn3 select * from mychecksums limit 1
Thread 22 (Thread 0x7f4cbc246700 (LWP 2981)): conn3 select * from mychecksums limit 1
#0 0x00007f4d870c4cf2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000c1bf59 in native_cond_timedwait (abstime=0x7f4cbc242700, mutex=0x7f1c27005098, cond=0x7f1c270050c8) at xx/mysql-5.7.23/mysql-5.7.23/include/thr_cond.h:129
#2 my_cond_timedwait (abstime=0x7f4cbc242700, mp=0x7f1c27005098, cond=0x7f1c270050c8) at xx/mysql-5.7.23/mysql-5.7.23/include/thr_cond.h:182
#3 inline_mysql_cond_timedwait (src_file=0x142f2a8 "xx/mysql-5.7.23/mysql-5.7.23/sql/mdl.cc", src_line=1861, abstime=<optimized out>, mutex=0x7f1c27005098, that=<optimized out>) at xx/mysql-5.7.23/mysql-5.7.23/include/mysql/psi/mysql_thread.h:1229
#4 MDL_wait::timed_wait (this=this@entry=0x7f1c27005098, owner=owner@entry=0x7f1c27005000, abs_timeout=abs_timeout@entry=0x7f4cbc242700, set_status_on_timeout=set_status_on_timeout@entry=true, wait_state_name=<optimized out>) at xx/mysql-5.7.23/mysql-5.7.23/sql/mdl.cc:1861
#5 0x0000000000d6d5b4 in TABLE_SHARE::wait_for_old_version (this=this@entry=0x7f1c270b5130, thd=thd@entry=0x7f1c27005000, abstime=abstime@entry=0x7f4cbc242700, deadlock_weight=deadlock_weight@entry=0) at xx/mysql-5.7.23/mysql-5.7.23/sql/table.cc:4611
#6 0x0000000000c71e71 in tdc_wait_for_old_version (deadlock_weight=0, wait_timeout=31536000, table_name=<optimized out>, db=<optimized out>, thd=0x7f1c27005000) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:2927
#7 open_table (thd=thd@entry=0x7f1c27005000, table_list=table_list@entry=0x7f1c2713b578, ot_ctx=ot_ctx@entry=0x7f4cbc242bf0) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:3517
#8 0x0000000000c79402 in open_and_process_table (flags=0, ot_ctx=0x7f4cbc242bf0, has_prelocking_list=false, prelocking_strategy=0x7f4cbc242c80, counter=0x7f1c27007240, tables=0x7f1c2713b578, lex=<optimized out>, thd=0x7f1c27005000) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:5144
#9 open_tables (thd=thd@entry=0x7f1c27005000, start=start@entry=0x7f4cbc242c78, counter=<optimized out>, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f4cbc242c80) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:5755
#10 0x0000000000c79d5a in open_tables_for_query (thd=thd@entry=0x7f1c27005000, tables=tables@entry=0x7f1c2713b578, flags=flags@entry=0) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_base.cc:6530
#11 0x00000000007769fb in execute_sqlcom_select (thd=thd@entry=0x7f1c27005000, all_tables=0x7f1c2713b578) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:5099
#12 0x0000000000ccc3c4 in mysql_execute_command (thd=thd@entry=0x7f1c27005000, first_level=first_level@entry=true) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:2814
#13 0x0000000000cce9dd in mysql_parse (thd=thd@entry=0x7f1c27005000, parser_state=parser_state@entry=0x7f4cbc243d00) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:5554
#14 0x0000000000ccf55d in dispatch_command (thd=thd@entry=0x7f1c27005000, com_data=com_data@entry=0x7f4cbc2445a0, command=COM_QUERY) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:1484
#15 0x0000000000cd0f7f in do_command (thd=thd@entry=0x7f1c27005000) at xx/mysql-5.7.23/mysql-5.7.23/sql/sql_parse.cc:1025
#16 0x0000000000d93030 in handle_connection (arg=arg@entry=0x7f24d193d820) at xx/mysql-5.7.23/mysql-5.7.23/sql/conn_handler/connection_handler_per_thread.cc:308
#17 0x000000000127a504 in pfs_spawn_thread (arg=0x7f24d195dd20) at xx/mysql-5.7.23/mysql-5.7.23/storage/perfschema/pfs.cc:2444
#18 0x00007f4d870c0e25 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f4d85b7d34d in clone () from /lib64/libc.so.6
根因分析
- 复现这个问题,最主要的几点因素是:
1. 长事务
2. 并发事务
3. 在长事务结束之前,跑analyze table
无论长事务还是并发事务,都是为了让 m_open_in_progress=1 的概率提高,产生锁的概率就提高了。
我们这次遇到的问题,就是并发事务多造成的
- 过程
1. 打开一张表,让其m_open_in_progress=1
2. Analyze table mychecksums
2.1 打开并修改表的统计信息
2.2 tdc_remove_table: 让该表从table definition cache中移除,让其失效。移除之前,必须等待之前的引用标记结束。
2.2.1 此时,如果后续有新的query来访问该表,其内存对象版本不一致,导致等待,阻塞发生。(mysql_admin_table-->tdc_remove_table::execute
)(REPAIR TABLE,CHECK TABLE同理)
2.2.2 移除成功
2.2.3 之后访问不受影响
2.3 query_cache.invalidate(thd, table, FALSE)
相关代码
- sql/sql_admin.cc
if (table->table)
{
if (table->table->s->tmp_table)
{
/*
If the table was not opened successfully, do not try to get
status information. (Bug#47633)
*/
if (open_for_modify && !open_error)
table->table->file->info(HA_STATUS_CONST);
}
else if (open_for_modify || fatal_error)
{
tdc_remove_table(thd, TDC_RT_REMOVE_UNUSED,
table->db, table->table_name, FALSE);
/*
May be something modified. Consequently, we have to
invalidate the query cache.
*/
table->table= 0; // For query cache
query_cache.invalidate(thd, table, FALSE);
}
- sql/sql_base.cc
/**
Remove all or some (depending on parameter) instances of TABLE and
TABLE_SHARE from the table definition cache.
@param thd Thread context
@param remove_type Type of removal:
TDC_RT_REMOVE_ALL - remove all TABLE instances and
TABLE_SHARE instance. There
should be no used TABLE objects
and caller should have exclusive
metadata lock on the table.
TDC_RT_REMOVE_NOT_OWN - remove all TABLE instances
except those that belong to
this thread. There should be
no TABLE objects used by other
threads and caller should have
exclusive metadata lock on the
table.
TDC_RT_REMOVE_UNUSED - remove all unused TABLE
instances (if there are no
used instances will also
remove TABLE_SHARE).
TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE -
remove all TABLE instances
except those that belong to
this thread, but don't mark
TABLE_SHARE as old. There
should be no TABLE objects
used by other threads and
caller should have exclusive
metadata lock on the table.
@param db Name of database
@param table_name Name of table
@param has_lock If TRUE, LOCK_open is already acquired
@note It assumes that table instances are already not used by any
(other) thread (this should be achieved by using meta-data locks).
*/
void tdc_remove_table(THD *thd, enum_tdc_remove_table_type remove_type,
const char *db, const char *table_name,
bool has_lock)
{
char key[MAX_DBKEY_LENGTH];
size_t key_length;
TABLE_SHARE *share;
if (! has_lock)
table_cache_manager.lock_all_and_tdc();
else
table_cache_manager.assert_owner_all_and_tdc();
DBUG_ASSERT(remove_type == TDC_RT_REMOVE_UNUSED ||
thd->mdl_context.owns_equal_or_stronger_lock(MDL_key::TABLE,
db, table_name, MDL_EXCLUSIVE));
key_length= create_table_def_key(thd, key, db, table_name, false);
if ((share= (TABLE_SHARE*) my_hash_search(&table_def_cache,(uchar*) key,
key_length)))
{
/*
Since share->ref_count is incremented when a table share is opened
in get_table_share(), before LOCK_open is temporarily released, it
is sufficient to check this condition alone and ignore the
share->m_open_in_progress flag.
Note that it is safe to call table_cache_manager.free_table() for
shares with m_open_in_progress == true, since such shares don't
have any TABLE objects associated.
*/
if (share->ref_count)
{
/*
Set share's version to zero in order to ensure that it gets
automatically deleted once it is no longer referenced.
Note that code in TABLE_SHARE::wait_for_old_version() assumes
that marking share as old and removal of its unused tables
and of the share itself from TDC happens atomically under
protection of LOCK_open, or, putting it another way, that
TDC does not contain old shares which don't have any tables
used.
*/
if (remove_type != TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE)
share->version= 0;
table_cache_manager.free_table(thd, remove_type, share);
}
else
{
DBUG_ASSERT(remove_type != TDC_RT_REMOVE_NOT_OWN_KEEP_SHARE);
(void) my_hash_delete(&table_def_cache, (uchar*) share);
}
}
if (! has_lock)
table_cache_manager.unlock_all_and_tdc();
}
- sql/sql_base.cc
/**
Check if table's share is being removed from the table definition
cache and, if yes, wait until the flush is complete.
@param thd Thread context.
@param db Database name.
@param table_name Table name.
@param wait_timeout Timeout for waiting.
@param deadlock_weight Weight of this wait for deadlock detector.
@retval false Success. Share is up to date or has been flushed.
@retval true Error (OOM, our was killed, the wait resulted
in a deadlock or timeout). Reported.
*/
static bool tdc_wait_for_old_version(THD *thd, const char *db,
const char *table_name, ulong wait_timeout,
uint deadlock_weight) {
TABLE_SHARE *share;
bool res = false;
mysql_mutex_lock(&LOCK_open);
if ((share = get_cached_table_share(db, table_name)) &&
share->has_old_version()) {
struct timespec abstime;
set_timespec(&abstime, wait_timeout);
res = share->wait_for_old_version(thd, &abstime, deadlock_weight);
}
mysql_mutex_unlock(&LOCK_open);
return res;
}
- sql/sql_base.h
/**
Get an existing table definition from the table definition cache.
Search the table definition cache for a share with the given key.
If the share exists, check the m_open_in_progress flag. If true,
the share is in the process of being opened by another thread,
so we must wait for the opening to finish. This may make the share
be destroyed, if open_table_def() fails, so we must repeat the search
in the hash table. Return the share.
@note While waiting for the condition variable signaling that a
table share is completely opened, the thread will temporarily
release LOCK_open. Thus, the caller cannot rely on LOCK_open
being held for the duration of the call.
@param thd thread descriptor
@param db database name
@param table_name table name
@retval NULL a share for the table does not exist in the cache
@retval != NULL pointer to existing share in the cache
*/
TABLE_SHARE *get_cached_table_share(THD *thd, const char *db,
const char *table_name)
{
char key[MAX_DBKEY_LENGTH];
size_t key_length;
TABLE_SHARE *share= NULL;
mysql_mutex_assert_owner(&LOCK_open);
key_length= create_table_def_key((THD*) 0, key, db, table_name, 0);
while ((share= reinterpret_cast<TABLE_SHARE*>(
my_hash_search(&table_def_cache,
reinterpret_cast<uchar*>(const_cast<char*>(key)),
key_length))))
{
if (!share->m_open_in_progress)
break;
DEBUG_SYNC(thd, "get_cached_share_cond_wait");
mysql_cond_wait(&COND_open, &LOCK_open);
}
return share;
}
- sql/table.cc
/**
Wait until the subject share is removed from the table
definition cache and make sure it's destroyed.
@note This method may access the share concurrently with another
thread if the share is in the process of being opened, i.e., that
m_open_in_progress is true. In this case, close_cached_tables() may
iterate over elements in the table definition cache, and call this
method regardless of the share being opened or not. This works anyway
since a new flush ticket is added below, and LOCK_open ensures
that the share may not be destroyed by another thread in the time
between finding this share (having an old version) and adding the flush
ticket. Thus, after this thread has added the flush ticket, the thread
opening the table will eventually call free_table_share (as a result of
releasing the share after using it, or as a result of a failing
open_table_def()), which will notify the owners of the flush tickets,
and the last one being notified will actually destroy the share.
@param mdl_context MDL context for thread which is going to wait.
@param abstime Timeout for waiting as absolute time value.
@param deadlock_weight Weight of this wait for deadlock detector.
@pre LOCK_open is write locked, the share is used (has
non-zero reference count), is marked for flush and
this connection does not reference the share.
LOCK_open will be unlocked temporarily during execution.
@retval FALSE - Success.
@retval TRUE - Error (OOM, deadlock, timeout, etc...).
*/
bool TABLE_SHARE::wait_for_old_version(THD *thd, struct timespec *abstime,
uint deadlock_weight)
{
MDL_context *mdl_context= &thd->mdl_context;
Wait_for_flush ticket(mdl_context, this, deadlock_weight);
MDL_wait::enum_wait_status wait_status;
mysql_mutex_assert_owner(&LOCK_open);
/*
We should enter this method only when share's version is not
up to date and the share is referenced. Otherwise our
thread will never be woken up from wait.
*/
DBUG_ASSERT(version != refresh_version && ref_count != 0);
m_flush_tickets.push_front(&ticket);
mdl_context->m_wait.reset_status();
mysql_mutex_unlock(&LOCK_open);
mdl_context->will_wait_for(&ticket);
mdl_context->find_deadlock();
DEBUG_SYNC(thd, "flush_complete");
wait_status= mdl_context->m_wait.timed_wait(thd, abstime, TRUE,
&stage_waiting_for_table_flush);
mdl_context->done_waiting_for();
mysql_mutex_lock(&LOCK_open);
m_flush_tickets.remove(&ticket);
if (m_flush_tickets.is_empty() && ref_count == 0)
{
/*
If our thread was the last one using the share,
we must destroy it here.
*/
destroy();
}
DEBUG_SYNC(thd, "share_destroyed");
/*
In cases when our wait was aborted by KILL statement,
a deadlock or a timeout, the share might still be referenced,
so we don't delete it. Note, that we can't determine this
condition by checking wait_status alone, since, for example,
a timeout can happen after all references to the table share
were released, but before the share is removed from the
cache and we receive the notification. This is why
we first destroy the share, and then look at
wait_status.
*/
switch (wait_status)
{
case MDL_wait::GRANTED:
return FALSE;
case MDL_wait::VICTIM:
my_error(ER_LOCK_DEADLOCK, MYF(0));
return TRUE;
case MDL_wait::TIMEOUT:
my_error(ER_LOCK_WAIT_TIMEOUT, MYF(0));
return TRUE;
case MDL_wait::KILLED:
return TRUE;
default:
DBUG_ASSERT(0);
return TRUE;
}
}
问题抽象
生产环境中,会遇到这种场景: 平常SQL语句跑的很快,怎么突然就慢了呢?
资源不足
大家都知道,一台机器的资源是有限的,这里的资源包括但不限于 cpu资源、io资源、thread资源、cache资源 等。 当query突然猛增之后,这些资源就有可能出现不足,从而导致原来很快的SQL性能变差。
这类问题的通用解决方案一般是: 降级、限流
优化器bug
大家都知道优化器会基于最小代价选择一个合适的索引去引擎层获取数据。
这里的代价包含但不限于:Rows-exam、排序、临时表 等等
bug之所以为bug,就是没道理可讲,具体内容就不啰嗦了,简单列几个平常优化过程中的解决方案:
1)删除重复索引
2)添加合适的索引
3)force index : 这种方法不通用,因为对业务侵入性比较大,且上线需要时间
这类问题的通用解决方案一般是: 定期check 重复索引 和 冗余索引
统计信息不准确
- 为什么统计信息会不准确呢
- 因为MySQL是通过采样物理数据页的方式来计算的,当数据分布不均时,就会有统计信息不准的情况
- 另外,MySQL的delete操作,只是将物理记录标记为删除,真正的删除是purge线程干的活。当某些数据因为某种原因(比如:RR事务隔离级别,保证数据一致性),不能被purge的时候,新的insert不是Inplace,而是占用新的空间,这样也会导致统计信息不准确
如果只是基于Rows-exam为成本代价来判断,那么是可以通过Analyze table 更新统计信息来让优化器做出正确的选择。
但是真实场景往往很复杂,目前也没有太好的办法来判断出哪些table的统计信息需要重新更新,只能通过查看该执行计划的rows和真实查询的记录数的差别来简单的判断
解决方案
- 运维层
* 主要原因
- long query
- 监控long query , 发现long query 记录到日志,并后期优化
- high concurrent query
- 白名单机制: 先加入白名单,后优化降低并发度
* 按需check
- 时间维度(全量)
- 每天一次,改为每周一次
- 数量维度(增量)
- 由于没有太好的办法判断哪些表需要Analyze,所以变通的方法是:从slow中找出慢的表,对它进行check
- 源码层
- 因为analyze table,既没有修改table definition,也没有修改table data,那么我们在此操作中直接跳过tdc_remove_table即可
- 已在我们5.7.29分支上合并了这部分代码,主要逻辑如下
* sql/sql_admin.cc
if (table->table)
{
const bool skip_flush=
(operator_func == &handler::ha_analyze)
&& (table->table->file->ha_table_flags() & HA_ONLINE_ANALYZE);
if (table->table->s->tmp_table)
{
/*
If the table was not opened successfully, do not try to get
status information. (Bug#47633)
*/
if (open_for_modify && !open_error)
table->table->file->info(HA_STATUS_CONST);
}
else if ((!skip_flush && open_for_modify) || fatal_error)
{
tdc_remove_table(thd, TDC_RT_REMOVE_UNUSED,
table->db, table->table_name, FALSE);
/*
May be something modified. Consequently, we have to
invalidate the query cache.
*/
table->table= 0; // For query cache
query_cache.invalidate(thd, table, FALSE);
}
感谢
本文主要参与者如下:余聪、王栋、余建锋、圣勇进、徐云磊、兰春