Problem Description
What is the difference between the Scylla read path and the Cassandra read path? When I stress-test Cassandra and Scylla, Scylla's read performance is 5 times worse than Cassandra's on 16 cores with ordinary HDDs.
I expected better read performance on Scylla than on Cassandra with ordinary HDDs, because my company doesn't provide SSDs.
Can someone please confirm whether it is possible to achieve better read performance with ordinary HDDs?
If yes, what changes are required in the Scylla configuration? Please guide me!
Recommended Answer
Some other responses focused on write performance, but that isn't what you asked about - you asked about reads.
Uncached read performance on HDDs is bound to be poor in both Cassandra and Scylla, because each read from disk requires several seeks on the HDD, and even the best HDD cannot do more than, say, 200 seeks per second. Even with a RAID of several such disks, you will rarely be able to serve more than, say, 1,000 requests per second. Since a modern multi-core machine can do orders of magnitude more CPU work than 1,000 requests per second requires, in both the Scylla and Cassandra cases you'll likely see idle CPU. So Scylla's main benefit - using much less CPU per request - won't even matter while the disk is the performance bottleneck. In such cases I would expect Scylla's and Cassandra's performance (I assume you're measuring throughput when you talk about performance?) to be roughly the same.
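The back-of-envelope estimate above can be written out explicitly. A minimal sketch, where the RAID width and seeks-per-read are assumed illustrative numbers, not measurements from your cluster:

```python
# Hypothetical figures for the estimate in the answer above: a single
# spinning disk manages roughly 200 seeks/second, and an uncached read
# costs at least one seek (usually several, which only makes it worse).
SEEKS_PER_SECOND = 200   # optimistic for one HDD
DISKS_IN_RAID = 5        # assumed RAID width (hypothetical)
SEEKS_PER_READ = 1       # best case; several seeks per read is typical

max_reads_per_second = SEEKS_PER_SECOND * DISKS_IN_RAID // SEEKS_PER_READ
print(max_reads_per_second)  # -> 1000, the ~1,000 requests/s ceiling
```

Whatever the exact constants, the ceiling is thousands of requests per second at most - far below what 16 modern cores can process, which is why the CPU sits idle in both systems.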
If you're still seeing better throughput from Cassandra than from Scylla, several details may explain why, beyond the general client misconfiguration issues raised in other responses:
If you have a small amount of data that fits in memory, Cassandra's caching policy is better suited to your workload. Cassandra uses the OS page cache, which reads whole disk pages and may cache multiple items (as well as multiple index entries) in one read. Scylla works differently: it has a row cache that caches only the specific data read. Scylla's cache is better for large volumes of data that don't fit in memory, but much worse when the data does fit in memory - at least until the entire data set has been cached (once everything is cached, it becomes very efficient again).
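The warm-up difference can be illustrated with a toy model - this is not Scylla or Cassandra code, just a sketch of page-granularity versus row-granularity caching over a sequential scan of a small dataset, with made-up sizes:

```python
# Toy model: how many cold disk reads each caching policy needs before
# a small dataset is fully cached. Page cache pulls in whole pages
# (many rows per miss); a row cache pulls in only the requested row.
ROWS = 100          # total rows in the (small) dataset
ROWS_PER_PAGE = 10  # rows sharing one disk page (assumed)

def disk_reads(page_granularity: bool) -> int:
    cached = set()
    reads = 0
    for row in range(ROWS):
        if row not in cached:
            reads += 1  # one seek + read on a cache miss
            if page_granularity:
                page = row // ROWS_PER_PAGE
                cached.update(range(page * ROWS_PER_PAGE,
                                    (page + 1) * ROWS_PER_PAGE))
            else:
                cached.add(row)  # row cache: only the requested row
    return reads

print(disk_reads(page_granularity=True))   # -> 10 (page cache)
print(disk_reads(page_granularity=False))  # -> 100 (row cache)
```

In this toy setup the page-granularity cache warms up with 10x fewer disk reads - which is the advantage Cassandra's OS-page-cache approach has while a small dataset is still being pulled into memory.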
On HDDs, the details of compaction matter a great deal for read performance - if one setup has more sstables to read from, that increases the number of disk reads and lowers performance. This can vary with your compaction configuration, or even randomly (depending on when compaction last ran). You can check whether this explains your performance gap by running a major compaction ("nodetool compact") on both systems and measuring read performance afterwards. You can also switch the compaction strategy to LCS to ensure better random-access read performance, at the cost of more write work (on HDDs this can be a worthwhile trade-off).
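Concretely, the two checks above might look like this - the keyspace and table names (`my_ks.my_table`) are placeholders for your own:

```shell
# 1. Trigger a major compaction on both clusters before re-running the
#    read benchmark, so both start from a comparable sstable count:
nodetool compact my_ks my_table

# 2. Optionally switch the table to Leveled Compaction Strategy, which
#    bounds how many sstables a single-row read has to touch:
cqlsh -e "ALTER TABLE my_ks.my_table
          WITH compaction = {'class': 'LeveledCompactionStrategy'};"
```

Both `nodetool compact` and the `compaction = {'class': ...}` table option exist in Cassandra and Scylla alike, so the same commands let you compare the two systems on equal footing.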
If you are measuring scan performance (reading an entire table) rather than individual-row reads, other issues become relevant. As you may have heard, Scylla subdivides each node into shards (each shard is a single CPU core). This is fantastic for CPU-bound work, but can be worse for scanning tables that aren't huge, because each sstable is now smaller and the amount of contiguous data you can read before needing to seek again is lower.
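The sharding effect on scans is just division - a sketch with entirely hypothetical sizes:

```python
# Hypothetical sizes: the same table split across per-core shards
# yields smaller sstables, hence shorter contiguous runs between seeks.
TABLE_SIZE_MB = 16_384   # assumed total table size
SSTABLES = 4             # assumed sstables per table
CORES = 16               # Scylla: one shard per core

unsharded_run_mb = TABLE_SIZE_MB // SSTABLES           # one sstable set
sharded_run_mb = TABLE_SIZE_MB // (SSTABLES * CORES)   # per-shard sstables

print(unsharded_run_mb)  # -> 4096 MB contiguous before the next seek
print(sharded_run_mb)    # -> 256 MB, 16x less per contiguous run
```

On an SSD this subdivision costs nothing, but on an HDD every extra seek between those shorter runs eats into the ~200 seeks/second budget.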
I don't know which of these differences - or something else - is causing your use case to perform worse on Scylla, but please keep in mind that whatever you fix, performance is always going to be poor with HDDs. With SSDs, we've measured more than a million random-access read requests per second on a single node. HDDs can't come anywhere close. If you really need the best performance, or the best performance per dollar, SSDs are the way to go.