问题描述
我正在尝试测量位于同一 Postgres 服务器上的各种数据库所产生的负载,以确定如何最好地将它们拆分到多个服务器上.我设计了这个查询:
I am trying to measure the load that various databases living on the same Postgres server are incurring, to determine how to best split them up across multiple servers. I devised this query:
select
now() as now,
datname as database,
usename as user,
count(*) as processes
from pg_stat_activity
where state = 'active'
and waiting = 'f'
and query not like '%from pg_stat_activity%'
group by
datname,
usename;
但令人惊讶的是,活动进程很少!
But there were surprisingly few active processes!
深入挖掘我运行了一个简单的查询,它返回 20k 行并用了 5 秒完成,根据我运行它的客户端.那段时间我查询pg_stat_activity
时,进程idle!我重复了几次这个实验.
Digging deeper I ran a simple query that returns 20k rows and took 5 seconds to complete, according to the client I ran it from. When I queried pg_stat_activity
during that time, the process was idle! I repeated this experiment several times.
Postgres 文档说active 意味着
The Postgres documentation says active means
后端正在执行查询.
和空闲意味着
后端正在等待新的客户端命令.
它真的比这更微妙吗?为什么在我签入时运行我的查询的进程没有活动?
Is it really more nuanced than that? Why was the process running my query was not active when I checked in?
如果这种方法有缺陷,除了定期对活动进程的数量进行采样之外,还有什么替代方法可以以数据库粒度测量负载?
If this approach is flawed, what alternatives are there for measuring load at a database granularity than periodically sampling the number of active processes?
推荐答案
您对 active
、idle
和 idle in transaction
的期望非常高正确的.我能想到的唯一解释是显示数据客户端的巨大延迟.所以查询确实在服务器上完成,会话是 idle
,但你没有看到客户端的结果.
your expectations regarding active
, idle
and idle in transaction
are very right. The only explanation I can think of is a huge delay in displaying data client side. So the query indeed finished on server and session is idle
and yet you don't see the result with client.
关于负载测量 - 我不会太依赖活动会话的数量.在活动状态下命中快速查询纯属运气.例如,假设您可以每秒检查 pg_stat_activity
并查看一个活动会话,但是在测量之间,一个 db 被查询了 10 次,另一个是一次 - 但这些数字都不会被看到.因为他们在两次处决之间很活跃.而这个 10+1 的活动状态(虽然意味着一个 db 被查询的频率提高了 10 倍)并不意味着您根本就应该考虑负载 - 因为集群没有加载太多,您甚至无法捕获执行.但这不可避免地意味着您可以捕获许多活动会话,并不意味着服务器确实已加载.
regarding the load measurement - I would not rely on number of active sessions much. Pure luck to hit the fast query in active state. Eg hypothetically you can check pg_stat_activity
each second and see one active session, but between measurement one db was queried 10 times and another once - yet none of those numbers will be seen. Because they were active between executions. And this 10+1 active states (although mean that one db is queried 10times more often) do not mean you should consider load at all - because cluster is so much not loaded, that you can't even catch executions. But this unavoidably mean that you can catch many active sessions and it would not mean that server is loaded indeed.
所以至少将 now()-query_start
带入您的查询以捕获更长的查询.或者甚至可以更好地节省一些经常查询的执行时间并衡量它是否会随着时间的推移而降低.或者更好地选择 pid
并检查该 pid 占用的资源.
so at least take now()-query_start
to your query to catch longer queries. Or even better save execution time for some often queries and measure if it degrades over time. Or better select pid
and check resources eaten by that pid.
顺便说一句,对于较长的查询,请查看 pg_stat_statements - 查看它们如何随时间变化可以让您对负载如何变化有一些期望
Btw for longer queries look into pg_stat_statements - looking how they change over time can give you some expectations on how the load changes
这篇关于使用 pg_stat_activity 中的“活动"进程测量 Postgres 中每个数据库的负载?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!