Problem description
I have a system running embedded Linux, and it is critical that it runs continuously. Basically, it is a process that communicates with sensors and relays the data to a database and a web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling (e.g. sockets and UART communications). How do I ensure that none of the threads hang or exit unexpectedly? Is there an easy-to-use watchdog that is thread-friendly?
Recommended answer
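The gist of it is:
1. You need to detect whether the program is still running and not hung.
2. If the program is not running or is hung, you need to (re)start it.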
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket to handle status requests. An external application can then ask whether the application is still OK. If it gets no response within some timeout period, it can assume that the application being queried has deadlocked or is dead.
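A minimal sketch of the status-socket idea, assuming a made-up one-message protocol in which the checker sends `PING` and expects `OK`. The function names `serve_status` and `is_alive`, the message format, and the timeout values are illustrative assumptions, not part of any standard tool.

```python
import os
import socket
import threading


def serve_status(path: str, stop: threading.Event) -> None:
    """Answer b"PING" with b"OK" on a UNIX domain socket until stop is set.

    The PING/OK protocol is a hypothetical convention for this sketch.
    """
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by a previous run
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(path)
        srv.listen(1)
        srv.settimeout(0.2)  # wake up regularly to re-check the stop flag
        while not stop.is_set():
            try:
                conn, _ = srv.accept()
            except socket.timeout:
                continue
            with conn:
                conn.settimeout(1.0)  # do not let a slow client block us
                try:
                    if conn.recv(16) == b"PING":
                        conn.sendall(b"OK")
                except OSError:
                    pass  # a misbehaving client is not our problem
    os.unlink(path)


def is_alive(path: str, timeout: float = 1.0) -> bool:
    """External check: True only if the application answers within timeout."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
            cli.settimeout(timeout)
            cli.connect(path)
            cli.sendall(b"PING")
            return cli.recv(16) == b"OK"
    except OSError:  # missing socket, refused connection, or timeout
        return False
```

One design point to keep in mind: if `serve_status` runs in its own thread, it can keep answering while the real work is deadlocked, so the reply should ideally depend on the state of the threads that matter.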
Periodically touching a file at a preselected path. An external application can look at the file's timestamp, and if it is stale, it can assume that the application is dead or deadlocked.
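The heartbeat-file approach could look roughly like this. The helper names `touch_heartbeat` and `is_stale` and the `max_age_s` parameter are assumptions made for this sketch; the application would call the first from its main loop, and the external checker would call the second.

```python
import os
import time


def touch_heartbeat(path: str) -> None:
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        pass
    os.utime(path, None)  # None means "use the current time"


def is_stale(path: str, max_age_s: float) -> bool:
    """Return True if the file is missing or older than max_age_s seconds."""
    try:
        age = time.time() - os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return age > max_age_s
```

The file timestamp follows the wall clock, so on a board whose clock jumps (for example after a late NTP sync) the age can be briefly wrong. Choosing `max_age_s` as several heartbeat periods leaves some margin for scheduling delays.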
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider turning your application that runs "continuously" into one that runs once, and then using "cron" or some other application to rerun that single-run application continuously.
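As a rough sketch of the kill-and-relaunch loop, the supervisor below starts the command with `subprocess.Popen` (which takes care of the fork+exec step), restarts it whenever it exits, and kills it first if a health check fails. The function name `supervise` and the parameters `is_healthy`, `poll_s`, and `max_starts` are hypothetical; `is_healthy` stands for whichever liveness check is in use.

```python
import subprocess
import time


def supervise(cmd, is_healthy, poll_s=1.0, max_starts=None):
    """Keep cmd running; return the number of times it was started.

    cmd        -- argument list passed to subprocess.Popen
    is_healthy -- callable returning False when the child looks hung
                  (hypothetical hook for a socket or heartbeat check)
    max_starts -- stop after this many launches; None means run forever
    """
    starts = 0
    while max_starts is None or starts < max_starts:
        proc = subprocess.Popen(cmd)
        starts += 1
        while proc.poll() is None:  # child is still running
            time.sleep(poll_s)
            if not is_healthy():
                proc.kill()  # SIGKILL: a hung process may ignore SIGTERM
                break
        proc.wait()  # reap the child so it does not linger as a zombie
    return starts
```

In this sketch the first health check happens one `poll_s` after launch, so `poll_s` has to be longer than the time the application needs to start reporting as healthy. A real supervisor would probably also want a back-off delay so that a program crashing at startup does not restart in a tight loop.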
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan (ThreadSanitizer) can help detect potential deadlock scenarios and other threading issues. Note that it works by instrumenting the program and analyzing it at run time, not by static analysis, so it only sees the code paths that are actually exercised.
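For the per-thread part of the question, one possible (and far from complete) sketch is to have every polling thread record a timestamp on each loop iteration, and let a monitor report threads that have stopped doing so or have exited. The class name `ThreadWatchdog` and its methods `kick` and `unhealthy` are invented for this illustration. It only detects a problem; it does not resolve a deadlock.

```python
import threading
import time


class ThreadWatchdog:
    """Track per-thread heartbeats (illustrative sketch, not a library API)."""

    def __init__(self, timeout_s: float) -> None:
        self._timeout_s = timeout_s
        self._lock = threading.Lock()
        self._seen = {}  # thread name -> (thread object, time of last kick)

    def kick(self) -> None:
        """Call from inside each worker's polling loop, once per iteration."""
        me = threading.current_thread()
        with self._lock:
            self._seen[me.name] = (me, time.monotonic())

    def unhealthy(self) -> list:
        """Return sorted names of threads that exited or stopped kicking."""
        now = time.monotonic()
        with self._lock:
            return sorted(
                name
                for name, (thread, seen) in self._seen.items()
                if not thread.is_alive() or now - seen > self._timeout_s
            )
```

A thread is only tracked after its first `kick()`, so the call belongs at the top of the loop. The main loop could then answer status requests or touch the heartbeat file only while `unhealthy()` returns an empty list, which lets the external checker restart the whole process when a single thread hangs. Blocking calls inside a worker (socket or UART reads) need their own timeouts, otherwise a merely idle thread looks hung.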