Question
I'm developing a game (a dream project) and my backend stack is Node.js with Knex and PostgreSQL (9.6). I hold all players' data there and need to request it frequently. One of the requests makes 10 simple selects to pull the data, and this is where the problem starts: these queries are quite fast (~1 ms) if the server serves only one request at a time. But if the server serves many requests in parallel (100-400), query execution time degrades dramatically (up to several seconds per query).
To be more objective, I will describe the server's request goal, the select queries, and the results I received.
I'm running the Node code on a DigitalOcean 4 CPU / 8 GB droplet and Postgres on the same configuration (2 different droplets, same specs).
The server needs to perform some gameplay actions, for which it selects data for 2 players from the DB.
The players' data is represented by 5 tables:
CREATE TABLE public.player_profile(
id integer NOT NULL DEFAULT nextval('player_profile_id_seq'::regclass),
public_data integer NOT NULL,
private_data integer NOT NULL,
current_active_deck_num smallint NOT NULL DEFAULT '0'::smallint,
created_at bigint NOT NULL DEFAULT '0'::bigint,
CONSTRAINT player_profile_pkey PRIMARY KEY (id),
CONSTRAINT player_profile_private_data_foreign FOREIGN KEY (private_data)
REFERENCES public.profile_private_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT player_profile_public_data_foreign FOREIGN KEY (public_data)
REFERENCES public.profile_public_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_character_data(
id integer NOT NULL DEFAULT nextval('player_character_data_id_seq'::regclass),
owner_player integer NOT NULL,
character_id integer NOT NULL,
experience_counter integer NOT NULL,
level_counter integer NOT NULL,
character_name character varying(255) COLLATE pg_catalog."default" NOT NULL,
created_at bigint NOT NULL DEFAULT '0'::bigint,
CONSTRAINT player_character_data_pkey PRIMARY KEY (id),
CONSTRAINT player_character_data_owner_player_foreign FOREIGN KEY (owner_player)
REFERENCES public.player_profile (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_cards(
id integer NOT NULL DEFAULT nextval('player_cards_id_seq'::regclass),
card_id integer NOT NULL,
owner_player integer NOT NULL,
card_level integer NOT NULL,
first_deck boolean NOT NULL,
consumables integer NOT NULL,
second_deck boolean NOT NULL DEFAULT false,
third_deck boolean NOT NULL DEFAULT false,
quality character varying(10) COLLATE pg_catalog."default" NOT NULL DEFAULT 'none'::character varying,
CONSTRAINT player_cards_pkey PRIMARY KEY (id),
CONSTRAINT player_cards_owner_player_foreign FOREIGN KEY (owner_player)
REFERENCES public.player_profile (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_character_equipment(
id integer NOT NULL DEFAULT nextval('player_character_equipment_id_seq'::regclass),
owner_character integer NOT NULL,
item_id integer NOT NULL,
item_level integer NOT NULL,
item_type character varying(20) COLLATE pg_catalog."default" NOT NULL,
is_equipped boolean NOT NULL,
slot_num integer,
CONSTRAINT player_character_equipment_pkey PRIMARY KEY (id),
CONSTRAINT player_character_equipment_owner_character_foreign FOREIGN KEY (owner_character)
REFERENCES public.player_character_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
CREATE TABLE public.player_character_runes(
id integer NOT NULL DEFAULT nextval('player_character_runes_id_seq'::regclass),
owner_character integer NOT NULL,
item_id integer NOT NULL,
slot_num integer,
decay_start_timestamp bigint,
CONSTRAINT player_character_runes_pkey PRIMARY KEY (id),
CONSTRAINT player_character_runes_owner_character_foreign FOREIGN KEY (owner_character)
REFERENCES public.player_character_data (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
);
With the following partial indexes:
knex.raw('create index "player_cards_owner_player_first_deck_index" on "player_cards"("owner_player") WHERE first_deck = TRUE');
knex.raw('create index "player_cards_owner_player_second_deck_index" on "player_cards"("owner_player") WHERE second_deck = TRUE');
knex.raw('create index "player_cards_owner_player_third_deck_index" on "player_cards"("owner_player") WHERE third_deck = TRUE');
knex.raw('create index "player_character_equipment_owner_character_is_equipped_index" on "player_character_equipment" ("owner_character") WHERE is_equipped = TRUE');
knex.raw('create index "player_character_runes_owner_character_slot_num_not_null_index" on "player_character_runes" ("owner_character") WHERE slot_num IS NOT NULL');
The code
First query
async.parallel([
cb => tx('player_character_data')
.select('character_id', 'id')
.where('owner_player', playerId)
.limit(1)
.asCallback(cb),
cb => tx('player_character_data')
.select('character_id', 'id')
.where('owner_player', enemyId)
.limit(1)
.asCallback(cb)
], callbackFn);
Second query
async.parallel([
cb => tx('player_profile')
.select('current_active_deck_num')
.where('id', playerId)
.asCallback(cb),
cb => tx('player_profile')
.select('current_active_deck_num')
.where('id', enemyId)
.asCallback(cb)
], callbackFn);
Third query
const playerQ = { first_deck: true };
const enemyQ = { first_deck: true };
const MAX_CARDS_IN_DECK = 5;
async.parallel([
cb => tx('player_cards')
.select('card_id', 'card_level')
.where('owner_player', playerId)
.andWhere(playerQ)
.limit(MAX_CARDS_IN_DECK)
.asCallback(cb),
cb => tx('player_cards')
.select('card_id', 'card_level')
.where('owner_player', enemyId)
.andWhere(enemyQ)
.limit(MAX_CARDS_IN_DECK)
.asCallback(cb)
], callbackFn);
Fourth query
const MAX_EQUIPPED_ITEMS = 3;
async.parallel([
cb => tx('player_character_equipment')
.select('item_id', 'item_level')
.where('owner_character', playerCharacterUniqueId)
.andWhere('is_equipped', true)
.limit(MAX_EQUIPPED_ITEMS)
.asCallback(cb),
cb => tx('player_character_equipment')
.select('item_id', 'item_level')
.where('owner_character', enemyCharacterUniqueId)
.andWhere('is_equipped', true)
.limit(MAX_EQUIPPED_ITEMS)
.asCallback(cb)
], callbackFn);
Fifth query
const runeSlotsMax = 3;
async.parallel([
cb => tx('player_character_runes')
.select('item_id', 'decay_start_timestamp')
.where('owner_character', playerCharacterUniqueId)
.whereNotNull('slot_num')
.limit(runeSlotsMax)
.asCallback(cb),
cb => tx('player_character_runes')
.select('item_id', 'decay_start_timestamp')
.where('owner_character', enemyCharacterUniqueId)
.whereNotNull('slot_num')
.limit(runeSlotsMax)
.asCallback(cb)
], callbackFn);
EXPLAIN (ANALYZE)
Only index scans, and < 1 ms for both planning and execution times. I can post them if needed (omitted to save space).
(total is the number of requests; min/max/avg/median are response times in ms)
- 4 concurrent requests:
  { "total": 300, "avg": 1.81, "median": 2, "min": 1, "max": 6 }
- 400 concurrent requests:
  - first select: { "total": 300, "avg": 209.57666666666665, "median": 176, "min": 9, "max": 1683 }
  - last select: { "total": 300, "avg": 2105.9, "median": 2005, "min": 1563, "max": 4074 }
I tried logging slow queries that take more than 100 ms — nothing showed up. I also tried increasing the connection pool size up to the number of parallel requests — that didn't help either.
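For reference, this is roughly how the pool size is configured in Knex (a config sketch only — the connection values and pool bounds here are placeholders, not the actual setup):

```javascript
// Hypothetical Knex initialization; host/database/credentials are placeholders.
const knex = require('knex')({
  client: 'pg',
  connection: {
    host: 'db-droplet-host', // placeholder
    database: 'game',        // placeholder
    user: 'game',            // placeholder
    password: 'secret'       // placeholder
  },
  // Pool bounds are illustrative: with 4 CPUs on the DB droplet, a pool far
  // larger than ~10-20 mostly adds queueing in Postgres, not throughput.
  pool: { min: 2, max: 10 }
});
```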
Answer
A solution was found quickly, but I forgot to respond here (was busy, sorry).
There was no magic behind the slow queries — just the nature of Node's event loop:
- All similar requests were made in parallel;
- I had a code block with a very slow execution time (~150-200 ms);
- With ~800 parallel requests, a 150 ms code block turns into ~10,000 ms of event-loop lag;
- So what you see looks like slow requests, but it's just the callbacks lagging, not the database.
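The effect can be reproduced with a few lines of plain Node. The 150 ms figure is taken from the answer above; `blockFor` is an illustrative helper, not code from the original project:

```javascript
// Demonstrates how one synchronous block delays every already-queued callback.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // busy-wait: nothing else on the event loop runs
}

const queuedAt = Date.now();
setTimeout(() => {
  // This fires only after the busy-wait finishes, so the measured "latency"
  // is ~150 ms even though the timer was scheduled for 0 ms.
  console.log(`callback lag: ${Date.now() - queuedAt}ms`);
}, 0);

blockFor(150); // stands in for the slow code block described in the answer
```

A fast DB query whose callback is queued behind such a block is reported just as slowly, which matches the first-select vs. last-select numbers in the measurements above.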
Conclusion: use pgBadger to detect slow queries and the isBusy module to detect event-loop lag.
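If pulling in a dedicated module isn't an option, a minimal lag monitor can be sketched with nothing but timers. This is my own sketch (names `lagSince` and `INTERVAL_MS` are made up for illustration), not code from the answer:

```javascript
// Minimal event-loop lag monitor: compares when an interval tick actually
// fired against when it was scheduled to fire.
const INTERVAL_MS = 100;

// Pure helper: lag is how far past the scheduled interval the tick landed.
function lagSince(lastTickMs, nowMs, intervalMs) {
  return Math.max(0, nowMs - lastTickMs - intervalMs);
}

let last = Date.now();
const monitor = setInterval(() => {
  const now = Date.now();
  const lag = lagSince(last, now, INTERVAL_MS);
  if (lag > 50) console.warn(`event loop lag: ${lag}ms`);
  last = now;
}, INTERVAL_MS);

monitor.unref(); // don't keep the process alive just for the monitor
```

Under the load described in the question, a monitor like this would report spikes whenever the slow synchronous block runs, while pgBadger would show the database itself staying fast.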