NodeJS, Socket.io
Imagine there are 2 users U1 & U2, connected to an app via Socket.io. The algorithm is the following:
1. U1 completely loses Internet connection (e.g. switches the Internet off).
2. U2 sends a message to U1.
3. U1 does not receive the message yet, because the Internet is down.
4. The server detects U1's disconnection by heartbeat timeout.
5. U1 reconnects to socket.io.
6. U1 never receives the message from U2. I guess it is lost on step 4.
Possible explanation
I think I understand why it happens:
- On step 4 the server kills the socket instance, along with the queue of messages for U1.
- Moreover, on step 5 U1 and the server create a new connection (the old one is not reused), so even if the message were still queued, the previous connection is lost anyway.
How can I prevent this kind of data loss? I have to use heartbeats, because I do not want people to hang in the app forever. I must also still allow reconnection, because when I deploy a new version of the app I want zero downtime.
P.S. What I call a "message" is not just a text message I can store in a database, but a valuable system message whose delivery must be guaranteed, or the UI screws up.
I do already have a user account system. Moreover, my application is already complex. Adding offline/online statuses won't help, because I already have this kind of stuff. The problem is different.
Check out step 2. At this step we technically cannot say whether U1 has gone offline; he may just lose the connection for, let's say, 2 seconds, probably because of bad internet. So U2 sends him a message, but U1 doesn't receive it because the internet is still down for him (step 3). Step 4 is needed to detect offline users; let's say the timeout is 60 seconds. Eventually, after another 10 seconds, U1's internet connection is back up and he reconnects to socket.io. But the message from U2 is lost in space, because on the server U1 was disconnected by timeout.
That is the problem: I want 100% delivery.
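1. Collect each emit (emit name and data) in a per-user object {}, identified by a random emitID. Send the emit.
2. Confirm the emit on the client side (send the emitID back to the server).
3. If confirmed, delete the object identified by emitID from {}.
4. If the user reconnects, check {} for this user and loop through it, executing step 1 for each object in {}.
5. On disconnect and/or connect, flush {} for the user if necessary.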
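// Server
const pendingEmits = {}; // emitID -> { name, data }
// resendAllPendingEmits is a helper (not shown) that re-emits everything left in pendingEmits
socket.on('reconnection', () => resendAllPendingEmits(socket));
socket.on('confirm', (emitID) => { delete pendingEmits[emitID]; });
// Client (assumes the server sends the emitID along with the data)
socket.on('something', ({ emitID, data }) => {
  // handle `data`, then confirm receipt by echoing the emitID back
  socket.emit('confirm', emitID);
});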
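The five steps above could be fleshed out roughly as follows. This is only a sketch under assumptions: `trackedEmit`, `confirmEmit` and `resendPending` are made-up names, the registry lives in memory (so it does not survive a server restart), and `socket` is anything with an `emit(event, payload)` method, such as a Socket.IO socket.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical in-memory registry: userId -> Map(emitID -> { event, data }).
const pendingEmits = new Map();

// Step 1: record the emit under a random emitID, then send it.
function trackedEmit(socket, userId, event, data) {
  const emitID = randomUUID();
  if (!pendingEmits.has(userId)) pendingEmits.set(userId, new Map());
  pendingEmits.get(userId).set(emitID, { event, data });
  socket.emit(event, { emitID, data });
  return emitID;
}

// Steps 2-3: the client echoed the emitID back, so forget the emit.
function confirmEmit(userId, emitID) {
  const forUser = pendingEmits.get(userId);
  return forUser ? forUser.delete(emitID) : false;
}

// Step 4: on reconnect, resend everything still unconfirmed. The original
// emitIDs are reused so that a late confirmation still matches.
function resendPending(socket, userId) {
  const forUser = pendingEmits.get(userId) || new Map();
  for (const [emitID, { event, data }] of forUser) {
    socket.emit(event, { emitID, data });
  }
  return forUser.size;
}
```

Keying the registry by user rather than by socket matters here, because the reconnecting client arrives on a new socket instance. Reusing the emitID on resend is a deviation from "execute step 1 again"; it avoids piling up duplicate entries for the same emit.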
While this is not really a solution for WebSockets, someone may still find it handy. We migrated from WebSockets to SSE + Ajax. SSE lets a client keep a persistent HTTP connection open and receive messages from the server in real time. To send messages from the client to the server, simply use Ajax. There are disadvantages such as latency and overhead, but SSE has reliability built in: the browser's EventSource reconnects automatically and sends a Last-Event-ID header, so the server can resend whatever the client missed.
Since we use Express, we use this library for SSE: https://github.com/dpskvn/express-sse, but you can choose whichever one fits you.
SSE is not supported in IE or in legacy (pre-Chromium) Edge, so you would need a polyfill there: https://github.com/Yaffle/EventSource.
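One property of SSE that helps with missed messages is the Last-Event-ID request header, which EventSource sends when it reconnects after having received events that carried an `id:` field. Here is a minimal sketch of the replay side using only Node's built-in `http` module (not express-sse). The in-memory `eventLog` and the function names are assumptions for illustration, and pushing live events to open connections is left out.

```javascript
const http = require('node:http');

// Hypothetical in-memory event log; ids are sequential integers.
const eventLog = [];
function recordEvent(data) {
  const event = { id: eventLog.length + 1, data };
  eventLog.push(event);
  return event;
}

// Serialise one event in the text/event-stream wire format.
function formatSseEvent(event) {
  return `id: ${event.id}\ndata: ${JSON.stringify(event.data)}\n\n`;
}

// Everything the client has not seen yet, given its Last-Event-ID header.
function eventsAfter(lastEventId) {
  const lastSeen = Number.parseInt(lastEventId, 10) || 0;
  return eventLog.filter((event) => event.id > lastSeen);
}

// Not started here; call createSseServer().listen(port) to try it.
function createSseServer() {
  return http.createServer((req, res) => {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    });
    // On reconnect the browser sends the last id it received.
    for (const event of eventsAfter(req.headers['last-event-id'])) {
      res.write(formatSseEvent(event));
    }
  });
}
```

A first-time client sends no header and therefore gets the whole log; a real application would probably cap or expire the log.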
Others have hinted at this in other answers and comments, but the root problem is that Socket.IO is just a delivery mechanism, and you cannot depend on it alone for reliable delivery. The only person who knows for sure that a message has been successfully delivered to the client is the client itself. For this kind of system, I would recommend making the following assertions:
- Messages aren't sent directly to clients; instead, they get sent to the server and stored in some kind of data store.
- Clients are responsible for asking "what did I miss" when they reconnect, and will query the stored messages in the data store to update their state.
- If a message is sent to the server while the recipient client is connected, that message will be sent in real time to the client.
Of course, depending on your application's needs, you can tune pieces of this--for example, you can use, say, a Redis list or sorted set for the messages, and clear them out if you know for a fact a client is up to date.
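To make the storage side concrete, here is a small in-memory stand-in for the kind of per-user sorted collection described above. It is a sketch only: `MessageStore` and its methods are hypothetical names, and a production version would presumably sit on Redis or a database instead of a `Map`.

```javascript
// In-memory stand-in for a per-user sorted set (e.g. a Redis sorted set
// scored by sequence number). All names here are hypothetical.
class MessageStore {
  constructor() {
    this.byUser = new Map(); // userId -> [{ seq, payload }], ascending seq
    this.nextSeq = 1;
  }

  // Persist first; real-time delivery happens afterwards, if at all.
  append(userId, payload) {
    const entry = { seq: this.nextSeq++, payload };
    if (!this.byUser.has(userId)) this.byUser.set(userId, []);
    this.byUser.get(userId).push(entry);
    return entry;
  }

  // "What did I miss?": everything after the last sequence number seen.
  since(userId, lastSeenSeq) {
    return (this.byUser.get(userId) || []).filter((e) => e.seq > lastSeenSeq);
  }

  // Clear out what the client has confirmed it is up to date with.
  ackUpTo(userId, seq) {
    const kept = (this.byUser.get(userId) || []).filter((e) => e.seq > seq);
    this.byUser.set(userId, kept);
    return kept.length;
  }
}
```

For example, after `store.append('u1', 'hello')` returns `{ seq: 1, payload: 'hello' }`, a client that reconnects with a last seen sequence number of 0 would get that entry back from `store.since('u1', 0)` until it is acknowledged with `store.ackUpTo('u1', 1)`.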
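Here is how this works when both users are online:
- U1 and U2 are both connected to the system.
- U2 sends a message to the server that U1 should receive.
- The server stores the message in some kind of persistent store, marking it for U1 with some kind of timestamp or sequential ID.
- The server sends the message to U1 via Socket.IO.
- U1's client confirms (perhaps via a Socket.IO callback) that it received the message.
- The server deletes the persisted message from the data store.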
And here is the offline scenario:
- U1 loses internet connectivity.
- U2 sends a message to the server that U1 should receive.
- The server stores the message in some kind of persistent store, marking it for U1 with some kind of timestamp or sequential ID.
- The server sends the message to U1 via Socket.IO.
- U1's client does not confirm receipt, because they are offline.
- Perhaps U2 sends U1 a few more messages; they all get stored in the data store in the same fashion.
- When U1 reconnects, it asks the server "The last message I saw was X / I have state X, what did I miss."
- The server sends U1 all the messages it missed from the data store, based on U1's request.
- U1's client confirms receipt and the server removes those messages from the data store.
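On the client side, the "last message I saw was X" bookkeeping can be quite small. Because the server resends anything that was not confirmed, a client may see the same message twice (for instance when it received the message but the confirmation was lost), so the sketch below also skips duplicates. `Inbox` is a hypothetical name, and the code assumes every message carries a sequential `seq` and that messages arrive in ascending order.

```javascript
// Hypothetical client-side inbox: tracks the last sequence number applied
// and ignores anything it has already seen.
class Inbox {
  constructor(applyMessage) {
    this.applyMessage = applyMessage; // callback that updates the UI/state
    this.lastSeenSeq = 0;
  }

  // Returns true if the message was new and applied, false if a duplicate.
  receive(message) {
    if (message.seq <= this.lastSeenSeq) return false;
    this.applyMessage(message.payload);
    this.lastSeenSeq = message.seq;
    return true;
  }

  // What to send on reconnect: "the last message I saw was X".
  resumeRequest() {
    return { lastSeenSeq: this.lastSeenSeq };
  }
}
```

With Socket.IO, `receive` would typically be called from the `'message'` handler just before invoking the acknowledgement callback, and `resumeRequest()` would be emitted from the client's `'connect'` handler.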
If you absolutely want guaranteed delivery, then it's important to design your system in such a way that being connected doesn't actually matter, and that realtime delivery is simply a bonus; this almost always involves a data store of some kind. As user568109 mentioned in a comment, there are messaging systems that abstract away the storage and delivery of said messages, and it may be worth looking into such a prebuilt solution. (You will likely still have to write the Socket.IO integration yourself.)
If you're not interested in storing the messages in the database, you may be able to get away with storing them in a local array; the server tries to send U1 the message, and stores it in a list of "pending messages" until U1's client confirms that it received it. If the client is offline, then when it comes back it can tell the server "Hey I was disconnected, please send me anything I missed" and the server can iterate through those messages.
Luckily, Socket.IO provides a mechanism that allows a client to "respond" to a message that looks like native JS callbacks. Here is some pseudocode:
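// server (runs inside io.on('connection', (socket) => { ... }))
// Keep this list per user, not per socket: a reconnecting client gets a new socket.
const pendingMessages = [];
function removePending(message) {
  const i = pendingMessages.indexOf(message);
  if (i !== -1) pendingMessages.splice(i, 1);
}
function sendMessage(socket, message) {
  pendingMessages.push(message);
  // the callback runs when the client acknowledges the message
  socket.emit('message', message, function () { removePending(message); });
}
socket.on('reconnection', function (lastKnownId) {
  // you may want to make sure you resend them in order, or one at a time, etc.
  for (const message of pendingMessages.filter((m) => m.id > lastKnownId)) {
    socket.emit('message', message, function () { removePending(message); });
  }
});
// client
let previouslyConnected = false;
let lastKnownId = 0; // assumes every message carries a sequential `id`
socket.on('connect', function () {
  if (previouslyConnected) {
    socket.emit('reconnection', lastKnownId);
  } else {
    // first connection; any further connections means we disconnected
    previouslyConnected = true;
  }
});
socket.on('message', function (data, callback) {
  // Do something with `data`
  lastKnownId = data.id;
  callback(); // confirm we received the message
});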
This is quite similar to the last suggestion, simply without a persistent data store.
You may also be interested in the concept of event sourcing.
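As a rough illustration of the idea: with event sourcing the client's state is derived by replaying an ordered log of events, so catching up after a disconnect is the same operation as building the state in the first place. The event types and function names below are invented for this sketch.

```javascript
// Hypothetical event types for a chat-like UI.
function applyEvent(state, event) {
  switch (event.type) {
    case 'message_added':
      return { ...state, messages: [...state.messages, event.text] };
    case 'message_removed':
      return { ...state, messages: state.messages.filter((t) => t !== event.text) };
    default:
      return state; // unknown events are ignored
  }
}

// Rebuild state by replaying events whose seq is greater than `fromSeq`.
function replay(state, events, fromSeq = 0) {
  return events
    .filter((event) => event.seq > fromSeq)
    .reduce(applyEvent, state);
}
```

A client that already holds the state as of event 2 can call `replay(currentState, log, 2)` after reconnecting and end up with the same state as a client that replayed the whole log from scratch.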