This post covers how to troubleshoot slow inserts and long write-queue times in Apache Kudu, based on the question and answer below.

Problem Description

I have been using the Spark data source to write to Kudu from Parquet, and the write performance is terrible: about 12,000 rows per second, with each row roughly 160 bytes.
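For reference, a minimal sketch of this write path using the kudu-spark connector is shown below; the master addresses, table name, and Parquet path are placeholders, not the setup from the question.

import org.apache.spark.sql.SparkSession

object ParquetToKudu {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parquet-to-kudu").getOrCreate()

    // Source data (placeholder path).
    val df = spark.read.parquet("hdfs:///data/events.parquet")

    // Write into an existing Kudu table through the data source API.
    // The kudu-spark connector only supports the "append" save mode.
    df.write
      .format("org.apache.kudu.spark.kudu")
      .option("kudu.master", "kudu-master-1:7051,kudu-master-2:7051")
      .option("kudu.table", "impala::default.events")
      .mode("append")
      .save()

    spark.stop()
  }
}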

We have 7 Kudu nodes, each with 24 cores, 64 GB of RAM, and 12 SATA disks. None of the resources appears to be the bottleneck: tablet server CPU usage is around 3-4 cores, RAM usage is around 10 GB, and there is no disk congestion.

Still, I see that most of the time write requests are stuck in the queue. Any ideas are appreciated.

W0811 12:34:03.526340  7753 rpcz_store.cc:251] Call kudu.tserver.TabletServerService.Write from 10.60.170.18:10000 (ReqId={client: 81ae6f3c6e1b4d9493ea95f87ccd1dfa, seq_no=9365, attempt_no=1}) took 13255ms (client timeout 10000).
W0811 12:34:03.526489  7753 rpcz_store.cc:255] Trace:
0811 12:33:50.270477 (+     0us) service_pool.cc:163] Inserting onto call queue
0811 12:33:50.270497 (+    20us) service_pool.cc:222] Handling call
0811 12:34:03.526316 (+13255819us) inbound_call.cc:157] Queueing success response
Related trace 'txn':
0811 12:34:03.328337 (+     0us) write_transaction.cc:101] PREPARE: Starting
0811 12:34:03.328563 (+   226us) write_transaction.cc:268] Acquiring schema lock in shared mode
0811 12:34:03.328564 (+     1us) write_transaction.cc:271] Acquired schema lock
0811 12:34:03.328564 (+     0us) tablet.cc:400] PREPARE: Decoding operations
0811 12:34:03.328742 (+   178us) tablet.cc:422] PREPARE: Acquiring locks for 24 operations
0811 12:34:03.447163 (+118421us) lock_manager.cc:377] Waited 118408us for lock on <redacted>
0811 12:34:03.447203 (+    40us) tablet.cc:426] PREPARE: locks acquired
0811 12:34:03.447203 (+     0us) write_transaction.cc:126] PREPARE: finished.
0811 12:34:03.447361 (+   158us) write_transaction.cc:136] Start()
0811 12:34:03.447366 (+     5us) write_transaction.cc:141] Timestamp: P: 1533965643563964 usec, L: 6
0811 12:34:03.447674 (+   308us) log.cc:582] Serialized 64909 byte log entry
0811 12:34:03.449561 (+  1887us) write_transaction.cc:149] APPLY: Starting
0811 12:34:03.526238 (+ 76677us) tablet_metrics.cc:365] ProbeStats: bloom_lookups=48,key_file_lookups=48,delta_file_lookups=24,mrs_lookups=0
0811 12:34:03.526260 (+    22us) log.cc:582] Serialized 237 byte log entry
0811 12:34:03.526268 (+     8us) write_transaction.cc:309] Releasing row and schema locks
0811 12:34:03.526280 (+    12us) write_transaction.cc:277] Released schema lock
0811 12:34:03.526300 (+    20us) write_transaction.cc:196] FINISH: updating metrics
Metrics: {"child_traces":[["txn",{"apply.queue_time_us":11,"cfile_cache_hit":205,"cfile_cache_hit_bytes":21900627,"num_ops":24,"prepare.queue_time_us":13057291,"prepare.run_cpu_time_us":1017,"prepare.run_wall_time_us":119378,"raft.queue_time_us":71,"raft.run_cpu_time_us":303,"raft.run_wall_time_us":304,"replication_time_us":2170,"row_lock_wait_count":1,"row_lock_wait_us":118408,"spinlock_wait_cycles":45824}]]}

Recommended Answer

It turned out to be caused by duplication in our data. The field we use as the primary key in Kudu contains about 1.2 million rows with the same value (an empty string). As a result, Kudu updated that same key 1.2 million times, and each update had to acquire a row lock, hence the drop in ingest speed over time. This matches the trace above, where prepare.queue_time_us (about 13 seconds) and row_lock_wait_us dominate the request time.

After removing the rows with duplicated keys, the ingest speed improved roughly tenfold.
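As a rough illustration of the fix, assuming the primary key maps to a single DataFrame column (called pk here, a placeholder for the actual schema), the empty and duplicate keys can be dropped in Spark before the write:

import org.apache.spark.sql.functions.{col, length, trim}

// Drop rows whose key is an empty string and keep one row per key,
// so each primary key is written (and its row lock taken) at most once.
val cleaned = df
  .filter(length(trim(col("pk"))) > 0)
  .dropDuplicates("pk")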

