Problem description
I have a multithreaded program running on Linux. Sometimes, when I run gstack against it, I see a thread that has been waiting for a lock for a long time (say, 2-3 minutes):
0 0x40000410 in __kernel_vsyscall ()
1 0x400157b9 in __lll_lock_wait () from /lib/i686/nosegneg/libpthread.so.0
2 0x40010e1d in _L_lock_981 () from /lib/i686/nosegneg/libpthread.so.0
3 0x40010d3b in pthread_mutex_lock () from /lib/i686/nosegneg/libpthread.so.0
...
I checked the rest of the threads, and none of them was holding this lock. However, after a while this thread (LWP 19853) acquired the lock successfully.
Some thread must already have acquired this lock, but I could not find it. Is there anything I am missing?
The definition of pthread_mutex_t:
typedef union
{
  struct __pthread_mutex_s
  {
    int __lock;
    unsigned int __count;
    int __owner;
    /* KIND must stay at this position in the structure to maintain binary compatibility. */
    int __kind;
    unsigned int __nusers;
    __extension__ union { int __spins; __pthread_slist_t __list; };
  } __data;
  char __size[__SIZEOF_PTHREAD_MUTEX_T];
  long int __align;
} pthread_mutex_t;
There is a member "__owner", which holds the id of the thread that currently holds the mutex.
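To check that field from code, here is a minimal sketch. It assumes glibc and relies on a non-portable glibc internal, not on POSIX. For a default mutex, glibc stores the kernel thread id of the current holder in __data.__owner. That is the same number gdb and gstack print as the LWP, and glibc resets it to 0 on unlock. The helpers owner_tid and current_tid are hypothetical names used only for illustration:

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread id of the calling thread (the "LWP" number gdb/gstack show).
static pid_t current_tid() { return static_cast<pid_t>(syscall(SYS_gettid)); }

// glibc-specific and non-portable: reads the internal __owner field.
// For a default (PTHREAD_MUTEX_TIMED_NP) mutex this is 0 when nobody holds it.
static int owner_tid(const pthread_mutex_t& m) { return m.__data.__owner; }
```

In a live gdb session you can read the same field with something like `print some_mutex.__data.__owner`, where some_mutex stands for whichever mutex variable is in scope. Then compare the result with the LWP numbers in the backtrace.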
Recommended answer
By default, mutexes don't track the thread that locked them (or at least I don't know of any that do).
There are two ways to debug this kind of problem. One is to log every lock and unlock. Whenever a thread is created, log the id of the new thread. Right after acquiring any lock, log the thread id and the name of the lock (you can use the file/line for this, or assign a name to each lock). Log again right before releasing any lock.
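Here is a minimal sketch of the logging approach, assuming C++11 or later on Linux. LOGGED_LOCK, LOGGED_UNLOCK and log_tid are hypothetical names. The thread id written to the log is the kernel tid from syscall(SYS_gettid), so the log lines can be compared directly with the LWP numbers gstack prints:

```cpp
#include <cstdio>
#include <mutex>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: kernel thread id (matches the LWP numbers gstack prints).
inline long log_tid() { return syscall(SYS_gettid); }

// Hypothetical macros: log right AFTER acquiring and right BEFORE releasing,
// tagging each line with the thread id, a lock name and the call site.
#define LOGGED_LOCK(m, name)                                              \
    do {                                                                  \
        (m).lock();                                                       \
        std::fprintf(stderr, "[tid %ld] locked    %s at %s:%d\n",         \
                     log_tid(), (name), __FILE__, __LINE__);              \
    } while (0)

#define LOGGED_UNLOCK(m, name)                                            \
    do {                                                                  \
        std::fprintf(stderr, "[tid %ld] unlocking %s at %s:%d\n",         \
                     log_tid(), (name), __FILE__, __LINE__);              \
        (m).unlock();                                                     \
    } while (0)
```

If a thread is stuck, the last "locked" line for that lock name that has no matching "unlocking" line points at the holder.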
This works well as long as your program doesn't have dozens of threads or more; beyond that, the logs become unmanageable.
The other way is to wrap your lock in a class that stores the id of the locking thread in the lock object right after each lock. You could even create a global lock registry that tracks this and that you can print out whenever you need to.
Something like this:
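#include <mutex>
#include <thread>

class MyMutex
{
public:
    void lock()   { mMutex.lock(); mLockingThread = std::this_thread::get_id(); }
    void unlock() { mLockingThread = std::thread::id(); mMutex.unlock(); }  // id() means "no thread"

    std::mutex mMutex;
    std::thread::id mLockingThread;  // current holder of mMutex, for inspection in a debugger
};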
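Here is a sketch of the global registry idea, assuming C++11 or later on Linux. LockRegistry and NamedMutex are hypothetical names. As a design choice, the holder is recorded as the kernel thread id from syscall(SYS_gettid) rather than std::thread::id, so that it can be matched against the LWP numbers in gstack output:

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical global registry: lock name -> kernel tid of the current holder.
// Its own mGuard is exactly the global contention point that makes this
// unsuitable for release builds.
class LockRegistry {
public:
    static LockRegistry& instance() { static LockRegistry r; return r; }

    void set_owner(const std::string& name, long tid) {
        std::lock_guard<std::mutex> g(mGuard);
        if (tid == 0) mOwners.erase(name); else mOwners[name] = tid;
    }
    long owner(const std::string& name) {
        std::lock_guard<std::mutex> g(mGuard);
        auto it = mOwners.find(name);
        return it == mOwners.end() ? 0 : it->second;
    }
    void dump() {  // print every currently held named lock and its holder
        std::lock_guard<std::mutex> g(mGuard);
        for (const auto& entry : mOwners)
            std::fprintf(stderr, "%s held by LWP %ld\n", entry.first.c_str(), entry.second);
    }
private:
    std::mutex mGuard;
    std::map<std::string, long> mOwners;
};

// Mutex that reports its holder to the registry; names are assumed unique.
class NamedMutex {
public:
    explicit NamedMutex(std::string name) : mName(std::move(name)) {}
    void lock() {
        mMutex.lock();
        LockRegistry::instance().set_owner(mName, syscall(SYS_gettid));
    }
    void unlock() {
        // Debug check: only the recorded holder may unlock.
        assert(LockRegistry::instance().owner(mName) == syscall(SYS_gettid));
        LockRegistry::instance().set_owner(mName, 0);
        mMutex.unlock();
    }
private:
    std::string mName;
    std::mutex mMutex;
};
```

Because NamedMutex provides lock() and unlock(), it also works with std::lock_guard<NamedMutex>. When a thread seems stuck, calling LockRegistry::instance().dump() (for example from a debugger) lists each held lock together with its holder.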
The key point: don't enable either of these methods in your release build. Both a global lock log and a global registry of lock states create a single global resource that will itself come under lock contention.
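One way to keep the tracking out of release builds is sketched below. It assumes release builds define NDEBUG, which is a common convention but worth checking in your build system. TrackedMutex is a hypothetical name that resolves to a plain std::mutex when NDEBUG is defined, so no extra shared state ships:

```cpp
#include <mutex>
#include <thread>

#ifndef NDEBUG
// Debug builds: remember the holder. mHolder is not atomic, so it is meant for
// the holding thread or a debugger, not for concurrent reads by other threads.
class TrackedMutex {
public:
    void lock()   { mMutex.lock(); mHolder = std::this_thread::get_id(); }
    void unlock() { mHolder = std::thread::id(); mMutex.unlock(); }
    std::thread::id holder() const { return mHolder; }
private:
    std::mutex mMutex;
    std::thread::id mHolder;
};
#else
// Release builds: no tracking at all, just a plain mutex.
using TrackedMutex = std::mutex;
#endif
```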