Problem description
In production, our delayed_job process is dying for some reason. I'm not sure whether it's crashing, being killed by the operating system, or something else. I don't see any errors in the delayed_job.log file.
What can I do to troubleshoot this? I was thinking of installing monit to monitor it, but that will only tell me precisely when it dies, not why it died.
Is there a way to make it log more verbosely, so I can tell why it might be dying?
Any other suggestions?
Recommended answer
I've come across two causes of delayed_job failing silently. The first is actual segfaults when people were using libxml in forked processes (this came up on the mailing list some time back).
The second is a problem with version 1.1.0 of the daemons gem, which delayed_job relies on (https://github.com/collectiveidea/delayed_job/issues#issue/81). This is easily worked around by using 1.0.10, which is what my own Gemfile has in it.
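The workaround above amounts to a version pin in the Gemfile. A minimal sketch, assuming Bundler manages the app's gems (the comment is mine, not from the original Gemfile):

```ruby
# Gemfile
# Pin daemons to 1.0.10 to avoid the 1.1.0 problem described above.
gem 'daemons', '1.0.10'
```

After changing the pin, re-run Bundler so that Gemfile.lock picks up the pinned version.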
delayed_job does have logging, so if the worker is dying without printing an error, it's usually because it isn't throwing an exception (e.g. a segfault) or because something external is killing the process.
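One way to narrow this down is to log the reason from inside the worker whenever Ruby gets a chance to shut down normally. The following is only a sketch: `exit_reason` and `install_exit_logger` are hypothetical helper names, not part of delayed_job, and where you call them (for example a Rails initializer) is an assumption. A SIGKILL or a segfault will not run `at_exit` hooks, so a death with no line in this log points towards one of those two. If the worker installs its own signal handlers, a signal may show up here as a clean exit instead.

```ruby
require "logger"

# Turn the exception that is ending the process (if any) into a short reason.
def exit_reason(error)
  case error
  when nil             then "clean exit"
  when SystemExit      then "exit called with status #{error.status}"
  when SignalException then "terminated by signal #{error.signm}"
  else "unhandled #{error.class}: #{error.message}"
  end
end

# Hypothetical helper: register a hook that logs why this process is exiting.
# $! holds the exception that is terminating the process, or nil.
def install_exit_logger(logger)
  at_exit { logger.warn("worker #{Process.pid} exiting: #{exit_reason($!)}") }
end

# Usage (assumed to run once at worker start-up; the log path is an example):
#   install_exit_logger(Logger.new("log/worker_exit.log"))
```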
I use bluepill to monitor my delayed_job instances, and so far it has been very successful at ensuring that the jobs remain running. The steps to get bluepill running for an application are quite easy.
Add the bluepill gem to your Gemfile:
# Monitoring
gem 'i18n' # Not sure why but it complained I didn't have it
gem 'bluepill'
I created a bluepill config file:
app_home = "/home/mi/production"
workers = 5
Bluepill.application("mi_delayed_job", :log_file => "#{app_home}/shared/log/bluepill.log") do |app|
  (0...workers).each do |i|
    app.process("delayed_job.#{i}") do |process|
      process.working_dir = "#{app_home}/current"
      process.start_grace_time = 10.seconds
      process.stop_grace_time = 10.seconds
      process.restart_grace_time = 10.seconds
      process.start_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job start -i #{i}"
      process.stop_command = "cd #{app_home}/current && RAILS_ENV=production ruby script/delayed_job stop -i #{i}"
      process.pid_file = "#{app_home}/shared/pids/delayed_job.#{i}.pid"
      process.uid = "mi"
      process.gid = "mi"
    end
  end
end
Then in my capistrano deploy file I just added:
# Bluepill related tasks
after "deploy:update", "bluepill:quit", "bluepill:start"
namespace :bluepill do
  desc "Stop processes that bluepill is monitoring and quit bluepill"
  task :quit, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged stop"
    run "cd #{current_path} && bundle exec bluepill --no-privileged quit"
  end
  desc "Load bluepill configuration and start it"
  task :start, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged load /home/mi/production/current/config/delayed_job.bluepill"
  end
  desc "Prints bluepills monitored processes statuses"
  task :status, :roles => [:app] do
    run "cd #{current_path} && bundle exec bluepill --no-privileged status"
  end
end
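If you want to cross-check the workers by hand, the pid files from the config above can be inspected directly. This is a rough sketch, not how bluepill itself works: `worker_status` is a hypothetical helper, and it only asks the operating system whether a process with the recorded pid exists, so it cannot tell you why a worker died.

```ruby
# Hypothetical helper: report whether the process named in a pid file exists.
def worker_status(pid_file)
  return :no_pid_file unless File.exist?(pid_file)

  pid = Integer(File.read(pid_file).strip, 10)
  Process.kill(0, pid) # signal 0 sends nothing, it only checks existence
  :running
rescue ArgumentError
  :bad_pid_file        # pid file did not contain a number
rescue Errno::ESRCH
  :stale_pid_file      # pid file is left over, but the process is gone
rescue Errno::EPERM
  :running             # process exists but belongs to another user
end

# Usage, assuming the paths and worker count from the config above.
if __FILE__ == $PROGRAM_NAME
  (0...5).each do |i|
    path = "/home/mi/production/shared/pids/delayed_job.#{i}.pid"
    puts "delayed_job.#{i}: #{worker_status(path)}"
  end
end
```

A stale pid file with nothing new in delayed_job.log fits the pattern described earlier of a segfault or an external kill.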
Hope this helps.