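(1) Original code
While using a single-producer, multiple-consumer model recently, I ran into a problem that, surprisingly, had never occurred to me before. The producer thread's code is shown below. Its basic job is to accept a connection, create a Socket object for it, and put that object on a list to await processing.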
- void DataManager::InternalStart() {
- server_socket_ = new ServerSocket();
- if (!server_socket_->SetAddress(NetworkUtil::GetIpAddress().c_str(), 9091)) {
- LOG(ERROR) << "Set address failed.";
- delete server_socket_;
- server_socket_ = NULL;
- return;
- }
- server_socket_->SetSoBlocking(true);
- if (!server_socket_->Listen()) {
- LOG(ERROR) << "listen failed.";
- return;
- }
- Socket *socket = NULL;
- while (!stop_) {
- if ((socket = server_socket_->Accept()) != NULL) {
- LOG(INFO) << "Recieved connection fd: " << socket->GetAddr();
- {
- common::MutexLock lc(&socket_mu_);
- socket_list_.push_back(socket);
- cond_var_.Signal();
- }
- }
- }
- }
The code for the multiple consumer threads is shown below. Each one takes a Socket object off the list and processes it:
- void DataManager::WorkEntry() {
- Socket *socket = NULL;
- while (!stop_) {
- // Get connection socket.
- {
- common::MutexLock lc(&socket_mu_);
- if (socket_list_.empty()) {
- cond_var_.Wait(&socket_mu_);
- }
- if (stop_)
- break;
- socket = socket_list_.front();
- socket_list_.pop_front();
- }
-
- bool success = false;
- do{
- {
- Packet request;
- if ((success = socket->GetPacket(&request))) {
- HandlePacket(&request);
- }
- }
- } while (success);
-
- delete socket;
- socket = NULL;
- }
- }
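(2) Problem
The program frequently crashed with a segmentation fault at run time, always at line 12 of the consumer code above (socket = socket_list_.front()). Debugging with GDB showed that the size of socket_list_ was 0.
(3) Adding logs for debugging
I added the following log statements to debug the problem: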
- @@ -115,6 +115,7 @@ void DataManager::InternalStart() {
- {
- common::MutexLock lc(&socket_mu_);
- socket_list_.push_back(socket);
- + LOG(INFO) << "1: size: " << socket_list_.size();
- cond_var_.Signal();
- }
- }
- @@ -129,11 +130,14 @@ void DataManager::WorkEntry() {
- common::MutexLock lc(&socket_mu_);
- if (socket_list_.empty()) {
- cond_var_.Wait(&socket_mu_);
- + LOG(INFO) << "2: size: " << socket_list_.size();
- }
- if (stop_)
- break;
- + LOG(INFO) << "3: size: " << socket_list_.size();
- socket = socket_list_.front();
- socket_list_.pop_front();
- + LOG(INFO) << "4: size: " << socket_list_.size();
The log output was:
I0809 02:35:45.269896 17305 DataManager.cc:114] Recieved connection fd: 10.237.92.30:37220
I0809 02:35:45.269902 17305 DataManager.cc:118] 1: size: 1
I0809 02:35:45.269928 17310 DataManager.cc:133] 2: size: 1
I0809 02:35:45.269935 17310 DataManager.cc:137] 3: size: 1
I0809 02:35:45.269937 17310 DataManager.cc:140] 4: size: 0
………
I0809 02:35:45.271636 17305 DataManager.cc:114] Recieved connection fd: 10.237.92.30:37224
I0809 02:35:45.271644 17305 DataManager.cc:118] 1: size: 1
I0809 02:35:45.271663 17310 DataManager.cc:137] 3: size: 1
I0809 02:35:45.271670 17310 DataManager.cc:140] 4: size: 0
I0809 02:35:45.271739 17309 DataManager.cc:133] 2: size: 0
I0809 02:35:45.271750 17309 DataManager.cc:137] 3: size: 0
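(4) Analysis:
a) Normal log order
In the normal case, after a Socket is added, one consumer thread is woken by the signal and processes that socket:
I0809 02:35:45.269902 17305 DataManager.cc:118] 1: size: 1
I0809 02:35:45.269928 17310 DataManager.cc:133] 2: size: 1
I0809 02:35:45.269935 17310 DataManager.cc:137] 3: size: 1
I0809 02:35:45.269937 17310 DataManager.cc:140] 4: size: 0
b) Abnormal log order
In the crashing case the order was:
I0809 02:35:45.271644 17305 DataManager.cc:118] 1: size: 1
I0809 02:35:45.271663 17310 DataManager.cc:137] 3: size: 1
I0809 02:35:45.271670 17310 DataManager.cc:140] 4: size: 0
I0809 02:35:45.271739 17309 DataManager.cc:133] 2: size: 0
I0809 02:35:45.271750 17309 DataManager.cc:137] 3: size: 0
c) What happened in the abnormal case
i. Initial state:
   17305 holds socket_mu_ and is about to insert a socket into socket_list_.
   17309 is blocked in cond_var_.Wait(&socket_mu_), waiting for cond_var_ to be signaled.
   17310 is presumably trying to acquire socket_mu_.
ii. 17305 calls cond_var_.Signal() and wakes 17309. At this point 17309 and 17310 still have to compete for socket_mu_. Presumably 17310 got socket_mu_ first, so 17309 had to block again, this time on the mutex. The log for 17310 has no "2:" line, which shows that it found the list non-empty and never entered Wait().
iii. 17310 consumes the socket that 17305 just produced and releases socket_mu_. By now socket_list_ is empty again.
iv. 17309 acquires socket_mu_ and returns from Wait(). It does not check the list again, so the call to socket_list_.front() on the empty list crashes the program.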
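The interleaving above can be replayed step by step. The following is only a single-threaded sketch of that sequence, not real concurrent code: the Replay struct, its Lock/Unlock helpers, and SizeSeenByWokenWaiter are hypothetical names introduced for illustration, and the thread ids in the comments are the ones from the log.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Single-threaded model of the shared state. No real threads or locks are
// used; the bool only checks that the replay never "holds" the mutex twice.
struct Replay {
  std::deque<int> socket_list;  // stands in for socket_list_
  bool mutex_held = false;      // stands in for socket_mu_

  void Lock() { assert(!mutex_held); mutex_held = true; }
  void Unlock() { assert(mutex_held); mutex_held = false; }
};

// Returns the list size that the woken waiter (17309) observes when it
// finally re-acquires the mutex and returns from Wait().
std::size_t SizeSeenByWokenWaiter() {
  Replay r;

  // Initial state: 17309 is parked inside Wait(), which released the mutex.

  // 17305 (producer): lock, push, signal, unlock.
  r.Lock();
  r.socket_list.push_back(42);
  // Signal(): 17309 becomes runnable but must re-acquire the mutex first.
  r.Unlock();

  // 17310 wins the race for the mutex. The list is non-empty, so it skips
  // Wait() entirely and takes the element.
  r.Lock();
  if (!r.socket_list.empty()) r.socket_list.pop_front();
  r.Unlock();

  // 17309 finally gets the mutex and returns from Wait().
  r.Lock();
  std::size_t seen = r.socket_list.size();
  r.Unlock();
  return seen;
}
```

In this replay the woken waiter sees an empty list, which is the state in which the original code goes on to call front().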
(5) Solution: add one more check
- @@ -129,6 +129,9 @@ void DataManager::WorkEntry() {
- common::MutexLock lc(&socket_mu_);
- if (socket_list_.empty()) {
- cond_var_.Wait(&socket_mu_);
- +
- + if (socket_list_.empty())
- + continue;
- }
- if (stop_)
- break;
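The extra check above fixes the crash. A more common way to write the same idea is to re-test the condition in a while loop around the wait, which also covers spurious wakeups. The sketch below shows that pattern under some assumptions: it uses the C++11 standard library (std::mutex, std::condition_variable) instead of the common::MutexLock and cond_var_ wrappers from the post, it stores int values instead of Socket pointers, and WorkQueue, Push, Pop, Stop, and RunDemo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for socket_list_ plus its mutex and condition variable.
class WorkQueue {
 public:
  void Push(int item) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      items_.push_back(item);
    }
    cv_.notify_one();
  }

  // Asks all consumers to exit once the remaining items are drained.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until an item is available. Returns false once the queue has
  // been stopped and is empty.
  bool Pop(int* out) {
    std::unique_lock<std::mutex> lock(mu_);
    // Re-check after every wakeup: another consumer may have taken the
    // item first, or the wakeup may be spurious.
    while (items_.empty() && !stop_) {
      cv_.wait(lock);
    }
    if (items_.empty()) return false;  // stopped with nothing left
    assert(!items_.empty());           // the invariant front() relies on
    *out = items_.front();
    items_.pop_front();
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<int> items_;
  bool stop_ = false;
};

// Starts `consumers` worker threads, pushes `items` values, stops the
// queue, and returns how many values were popped in total.
int RunDemo(int consumers, int items) {
  WorkQueue queue;
  std::atomic<int> popped{0};
  std::vector<std::thread> workers;
  for (int i = 0; i < consumers; ++i) {
    workers.emplace_back([&queue, &popped] {
      int value = 0;
      while (queue.Pop(&value)) ++popped;
    });
  }
  for (int i = 0; i < items; ++i) queue.Push(i);
  queue.Stop();
  for (auto& w : workers) w.join();
  return popped.load();
}
```

Calling RunDemo(4, 1000), for example, runs four consumers against one producer. Because Pop only touches front() after the loop has confirmed the queue is non-empty while the lock is held, a consumer that loses the race simply goes back to waiting.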