Problem description
I have a webapp that I am in the middle of doing some load/performance testing on, particularly on a feature where we expect a few hundred users to be accessing the same page and hitting refresh about every 10 seconds on this page. One area of improvement that we found we could make with this function was to cache the responses from the web service for some period of time, since the data is not changing.
After implementing this basic caching, in some further testing I found out that I didn't consider how concurrent threads could access the Cache at the same time. I found that within a span of about 100 ms, about 50 threads were trying to fetch the object from the Cache, finding that it had expired, hitting the web service to fetch the data, and then putting the object back in the cache.
The original code looked like this:
private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
    final String key = "Data-" + email;
    SomeData[] data = (SomeData[]) StaticCache.get(key);
    if (data == null) {
        data = service.getSomeDataForEmail(email);
        StaticCache.set(key, data, CACHE_TIME);
    }
    else {
        logger.debug("getSomeDataForEmail: using cached object");
    }
    return data;
}
So, to make sure that only one thread was calling the web service when the object at key expired, I thought I needed to synchronize the Cache get/set operation, and it seemed like using the cache key would be a good candidate for an object to synchronize on (this way, calls to this method for one email address would not be blocked by method calls for a different address).
I updated the method to look like this:
private SomeData[] getSomeDataByEmail(WebServiceInterface service, String email) {
    SomeData[] data = null;
    final String key = "Data-" + email;
    synchronized(key) {
        data = (SomeData[]) StaticCache.get(key);
        if (data == null) {
            data = service.getSomeDataForEmail(email);
            StaticCache.set(key, data, CACHE_TIME);
        }
        else {
            logger.debug("getSomeDataForEmail: using cached object");
        }
    }
    return data;
}
I also added logging lines for things like "before synchronization block", "inside synchronization block", "about to leave synchronization block", and "after synchronization block", so I could determine if I was effectively synchronizing the get/set operation.
However, it doesn't seem like this has worked. My test logs have output like:
I wanted to see only one thread at a time entering/exiting the synchronization block around the get/set operations.
Is there an issue in synchronizing on String objects? I thought the cache key would be a good choice as it is unique to the operation, and even though the final String key is declared within the method, I was thinking that each thread would be getting a reference to the same object and therefore would synchronize on this single object.
What am I doing wrong here?
Update: after looking further at the logs, it seems like methods with the same synchronization logic where the key is always the same, such as
final String key = "blah";
...
synchronized(key) { ...
do not exhibit the same concurrency problem - only one thread at a time is entering the block.
Update 2: Thanks to everyone for the help! I accepted the first answer about intern()-ing Strings, which solved my initial problem: multiple threads were entering synchronized blocks where I thought they shouldn't, because the keys had the same value.
As others have pointed out, using intern() for such a purpose and synchronizing on those Strings does indeed turn out to be a bad idea - when running JMeter tests against the webapp to simulate the expected load, I saw the used heap size grow to almost 1GB in just under 20 minutes.
Currently I'm using the simple solution of just synchronizing the entire method. I really like the code samples provided by martinprobst and MBCook, but this class currently has about 7 similar getData() methods (it needs about 7 different pieces of data from a web service), and I didn't want to add almost-duplicate logic for getting and releasing locks to each method. But this is definitely very, very valuable info for future usage. I think these are ultimately the correct answers on how best to make an operation like this thread-safe, and I'd give out more votes to these answers if I could!
Recommended answer
Without putting my brain fully into gear, from a quick scan of what you say it looks as though you need to intern() your Strings:
final String firstkey = "Data-" + email;
final String key = firstkey.intern();
Two Strings with the same value are otherwise not necessarily the same object.
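The following small demonstration is only a sketch. The buildKey method and the sample address are hypothetical, and buildKey mirrors the question's "Data-" + email key. The concatenation happens at run time, so each call returns a new String. Two threads with equal keys would therefore lock two different monitors, and intern() maps both keys to one canonical instance.

```java
// Why synchronized(key) did not serialize anything: each call builds a new String.
final class StringIdentityDemo {
    // Mirrors the question's key construction; the argument is only known at run time.
    static String buildKey(String email) {
        return "Data-" + email;
    }

    public static void main(String[] args) {
        String k1 = buildKey("user@example.com");   // hypothetical address
        String k2 = buildKey("user@example.com");

        System.out.println(k1.equals(k2));              // true: same characters
        System.out.println(k1 == k2);                   // false: two distinct objects, two monitors
        System.out.println(k1.intern() == k2.intern()); // true: both map to one canonical instance
    }
}
```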
Note that this may introduce a new point of contention, since deep in the VM, intern() may have to acquire a lock. I have no idea what modern VMs look like in this area, but one hopes they are fiendishly optimised.
I assume you know that StaticCache still needs to be thread-safe. But the contention there should be tiny compared with what you'd have if you were locking on the cache rather than just the key while calling getSomeDataForEmail.
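Here is a minimal stand-in showing what "thread-safe" could mean for StaticCache. The question does not show StaticCache, so this sketch makes a few assumptions:

- The get(key) / set(key, value, ttl) shape is taken from the question's calls.
- The TTL is in milliseconds. The unit of CACHE_TIME is not given.
- TtlCache is a hypothetical name.

Each individual call is safe. A caller that does get and then set is still racing other callers, which is why the per-key locking discussed here is still needed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the question's StaticCache. Each get/set is
// thread-safe on its own; a get-then-set sequence in the caller is NOT atomic.
final class TtlCache {
    // Immutable pair, so a reader never sees a value with another value's expiry.
    private static final class Entry {
        final Object value;
        final long expiresAtNanos;
        Entry(Object value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private static final ConcurrentHashMap<String, Entry> MAP = new ConcurrentHashMap<>();

    private TtlCache() {}

    // Returns the cached value, or null if absent or expired.
    static Object get(String key) {
        Entry e = MAP.get(key);
        if (e == null) {
            return null;
        }
        if (System.nanoTime() - e.expiresAtNanos >= 0) {
            MAP.remove(key, e);   // drop only this stale entry, not a newer replacement
            return null;
        }
        return e.value;
    }

    // Stores a non-null value for ttlMillis milliseconds (the unit is an assumption).
    static void set(String key, Object value, long ttlMillis) {
        long expires = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(ttlMillis);
        MAP.put(key, new Entry(value, expires));
    }
}
```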
In response to the question update:
I think that's because a string literal always yields the same object. Dave Costa points out in a comment that it's even better than that: a literal always yields the canonical representation. So all String literals with the same value anywhere in the program would yield the same object.
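The sketch below illustrates the literal case, and its class and field names are hypothetical. String literals and compile-time constant expressions are interned, which is why the constant "blah" key behaved. A concatenation that involves a run-time value is not interned.

```java
// Why the constant key "blah" behaved: literals and constant expressions are interned.
final class LiteralIdentityDemo {
    static final String PREFIX = "bl";   // a constant variable (final, constant initializer)

    static boolean sameLiteral() {
        String a = "blah";
        String b = "blah";
        return a == b;                    // both refer to the single interned instance
    }

    static boolean constantExpression() {
        return (PREFIX + "ah") == "blah"; // folded at compile time, so also interned
    }

    static boolean runtimeConcat(String suffix) {
        return ("bl" + suffix) == "blah"; // built at run time: a new object
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());        // true
        System.out.println(constantExpression()); // true
        System.out.println(runtimeConcat("ah"));  // false
    }
}
```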
Edit
Others have pointed out that synchronizing on interned strings is actually a really bad idea. Part of the reason is that interned strings are permitted to exist in perpetuity. The other part is that if more than one bit of code anywhere in your program synchronizes on interned strings, you have dependencies between those bits of code, and preventing deadlocks or other bugs may be impossible.
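Here is a short sketch of that hidden dependency, with the hypothetical names SharedMonitorDemo, ModuleA and ModuleB. Two pieces of code each define what looks like a private lock string. Equal literals are the same interned object, so holding one lock means holding the other.

```java
// Two unrelated "private" lock strings with the same text share one monitor.
final class SharedMonitorDemo {
    // Each module believes it owns its own lock.
    static final class ModuleA { static final String LOCK = "Data-lock"; }
    static final class ModuleB { static final String LOCK = "Data-lock"; }

    // Returns true if holding ModuleA's lock also means holding ModuleB's.
    static boolean locksAreShared() {
        synchronized (ModuleA.LOCK) {
            return Thread.holdsLock(ModuleB.LOCK);
        }
    }

    public static void main(String[] args) {
        System.out.println(locksAreShared()); // true: both constants are the same interned object
    }
}
```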
Strategies to avoid this by storing a lock object per key string are being developed in other answers as I type.
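Here is one such strategy, sketched under assumptions. A private ConcurrentHashMap hands out one plain Object per key, and threads synchronize on that Object rather than on an interned String, so no other code can accidentally share the monitor. PerKeyLoader and getOrLoad are hypothetical names. Expiry is omitted. Like the intern pool, the lock map grows with every distinct key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper: one private lock Object per key, never an interned String.
// Expiry is omitted, and the lock map grows with each distinct key.
final class PerKeyLoader {
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T getOrLoad(String key, Supplier<T> loader) {
        // computeIfAbsent is atomic, so every thread with an equal key gets the same Object.
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            Object cached = cache.get(key);
            if (cached != null) {
                return (T) cached;                // an earlier thread already loaded it
            }
            T value = loader.get();               // at most one thread per key runs this at a time
            if (value != null) {
                cache.put(key, value);            // ConcurrentHashMap rejects null values
            }
            return value;
        }
    }
}
```

The loader is passed in as a Supplier, so the question's seven getData() methods could each delegate to the one helper. For example, one could use return loader.getOrLoad("Data-" + email, () -> service.getSomeDataForEmail(email)); this is a sketch only. Threads with different keys hold different locks and do not block each other.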
Here's an alternative - it still uses a singular lock, but we know we're going to need one of those for the cache anyway, and you were talking about 50 threads, not 5000, so that may not be fatal. I'm also assuming that the performance bottleneck here is slow blocking I/O in DoSlowThing() which will therefore hugely benefit from not being serialised. If that's not the bottleneck, then:
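- If the CPU is busy, then this approach may not be sufficient and you need another approach.
- If the CPU is not busy, and access to the server is not a bottleneck, then this approach is overkill. You might as well forget both this and per-key locking, put a big synchronized(StaticCache) around the whole operation, and do it the easy way.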
Obviously this approach needs to be soak tested for scalability before use -- I guarantee nothing.
This code does NOT require that StaticCache is synchronized or otherwise thread-safe. That needs to be revisited if any other code (for example scheduled clean-up of old data) ever touches the cache.
IN_PROGRESS is a dummy value - not exactly clean, but the code's simple and it saves having two hashtables. It doesn't handle InterruptedException because I don't know what your app wants to do in that case. Also, if DoSlowThing() consistently fails for a given key this code as it stands is not exactly elegant, since every thread through will retry it. Since I don't know what the failure criteria are, and whether they are liable to be temporary or permanent, I don't handle this either, I just make sure threads don't block forever. In practice you may want to put a data value in the cache which indicates 'not available', perhaps with a reason, and a timeout for when to retry.
// do not attempt double-check locking here. I mean it.
synchronized(StaticObject) {
    data = StaticCache.get(key);
    while (data == IN_PROGRESS) {
        // another thread is getting the data
        StaticObject.wait();
        data = StaticCache.get(key);
    }
    if (data == null) {
        // we must get the data
        StaticCache.put(key, IN_PROGRESS, TIME_MAX_VALUE);
    }
}
if (data == null) {
    // we must get the data
    try {
        data = server.DoSlowThing(key);
    } finally {
        synchronized(StaticObject) {
            // WARNING: failure here is fatal, and must be allowed to terminate
            // the app or else waiters will be left forever. Choose a suitable
            // collection type in which replacing the value for a key is guaranteed.
            StaticCache.put(key, data, CURRENT_TIME);
            StaticObject.notifyAll();
        }
    }
}
Every time anything is added to the cache, all threads wake up and check the cache (no matter what key they're after), so it's possible to get better performance with less contentious algorithms. However, much of that work will take place during your copious idle CPU time blocking on I/O, so it may not be a problem.
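If notifyAll() waking every thread turns out to matter, one less contentious alternative is a per-key future. This technique is swapped in here and does not come from the answer. The first thread to register a CompletableFuture for a key does the slow call, and later threads for that key block on that future alone. FutureCache is a hypothetical name, and expiry is again left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.function.Function;

// Swapped-in technique, not the answer's code: one CompletableFuture per key.
// A waiter blocks only on its own key's future, so completing one load wakes
// only the threads interested in that key. Expiry is deliberately omitted.
final class FutureCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> map = new ConcurrentHashMap<>();

    V get(K key, Function<K, V> slowLoad) throws InterruptedException, ExecutionException {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = map.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.get();               // another thread owns (or owned) this load
        }
        try {
            mine.complete(slowLoad.apply(key));  // this thread won: do the slow call once
        } catch (RuntimeException | Error e) {
            map.remove(key, mine);               // forget the failure so a later call retries
            mine.completeExceptionally(e);       // threads already waiting see the error
        }
        return mine.get();                       // rethrows a failure as ExecutionException
    }
}
```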
This code could be commoned-up for use with multiple caches, if you define suitable abstractions for the cache and its associated lock, the data it returns, the IN_PROGRESS dummy, and the slow operation to perform. Rolling the whole thing into a method on the cache might not be a bad idea.
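Following that suggestion, here is a minimal sketch of the same algorithm as a reusable method; InProgressCache and its get method are hypothetical names. It keeps the answer's single lock, IN_PROGRESS dummy and notifyAll(). It uses a plain HashMap guarded by that lock and takes the slow operation as a Function. As assumptions, it drops expiry, and it treats a thrown exception or a null result as "not cached" so that waiters are always released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// A sketch of the answer's IN_PROGRESS scheme rolled into a method on a cache.
// Assumptions: no expiry; a loader that throws or returns null leaves the key
// absent, so a later caller retries instead of waiters blocking forever.
final class InProgressCache<K, V> {
    private static final Object IN_PROGRESS = new Object(); // dummy "being loaded" marker
    private final Object lock = new Object();               // plays the role of StaticObject
    private final Map<K, Object> map = new HashMap<>();     // guarded by lock

    @SuppressWarnings("unchecked")
    V get(K key, Function<K, V> slowLoad) throws InterruptedException {
        synchronized (lock) {
            Object data = map.get(key);
            while (data == IN_PROGRESS) {
                lock.wait();                 // another thread is loading this key
                data = map.get(key);
            }
            if (data != null) {
                return (V) data;             // cache hit
            }
            map.put(key, IN_PROGRESS);       // claim the load for this thread
        }
        V result = null;
        try {
            result = slowLoad.apply(key);    // slow call runs outside the lock
            return result;
        } finally {
            synchronized (lock) {
                if (result != null) {
                    map.put(key, result);
                } else {
                    map.remove(key);         // failed load: let a later caller retry
                }
                lock.notifyAll();            // wakes every waiter, whatever key it wants
            }
        }
    }
}
```

InterruptedException is simply propagated to the caller here, which is only one of the choices the answer leaves open.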