Thus, during the heartbeat-processing step, we create this delayed trigger if it doesn’t yet exist for the member; if it already exists, we simply reset it to fire in another d + 2ε seconds. When the delayed trigger for an online member fires, we check whether the member’s heartbeat has expired in our K/V store. If it has, we publish an offline event for that member on the presence status topic, and the Real-time Platform distributes the fact that the member has gone offline to the member’s connections.
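The create-or-reset bookkeeping above can be sketched with a plain `ScheduledExecutorService` standing in for the delayed trigger. This is only an illustration, not the actual implementation: the names (`onHeartbeat`, `offlineEvents`) are made up, and the in-memory `lastHeartbeat` map stands in for the distributed K/V store.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

// Sketch of the delayed-trigger step (hypothetical names, local maps
// standing in for the distributed K/V store and the presence topic).
class PresenceTriggers {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> triggers = new ConcurrentHashMap<>();
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>(); // stand-in for K/V store
    final List<String> offlineEvents = new CopyOnWriteArrayList<>();           // stand-in for the topic
    private final long dMillis, epsMillis;

    PresenceTriggers(long dMillis, long epsMillis) {
        this.dMillis = dMillis;
        this.epsMillis = epsMillis;
    }

    // Heartbeat processing: record the heartbeat, then create the delayed
    // trigger if it doesn't exist, or reset it to fire in another d + 2ε.
    void onHeartbeat(String memberId) {
        lastHeartbeat.put(memberId, System.currentTimeMillis());
        ScheduledFuture<?> previous = triggers.put(memberId,
                scheduler.schedule(() -> onTriggerFired(memberId),
                        dMillis + 2 * epsMillis, TimeUnit.MILLISECONDS));
        if (previous != null) previous.cancel(false); // the reset
    }

    // Trigger fired: if no fresh heartbeat arrived (the K/V entry would have
    // expired after d + ε), publish the offline event for the member.
    private void onTriggerFired(String memberId) {
        Long last = lastHeartbeat.get(memberId);
        if (last == null || System.currentTimeMillis() - last >= dMillis + epsMillis) {
            offlineEvents.add(memberId);
        }
    }
}
```

A real deployment would consult a TTL’d entry in the K/V store and publish to the Real-time Platform rather than mutating local collections, but the create/reset/fire flow is the same.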
You might be wondering how this scales out horizontally. Note that it doesn’t matter which node running the Presence Service receives a member’s heartbeat: the shared state lives only in the distributed K/V store, so any node can perform exactly the same processing for a given member. However, to avoid duplicating a member’s delayed trigger across multiple nodes, we do best-effort sticky routing of a given member’s heartbeats to a single node. This is done using LinkedIn’s d2 load balancer, which supports a strategy that hashes the member ID in the heartbeat request to route it to the same node.
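The routing idea reduces to a simple invariant: the same member ID always maps to the same node while the node list is stable. A minimal sketch of that invariant, with no claim to match d2’s actual strategy or its handling of node churn:

```java
import java.util.List;

// Hash-based sticky routing sketch: deterministic member-to-node mapping.
class StickyRouter {
    private final List<String> nodes;

    StickyRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    String nodeFor(String memberId) {
        // floorMod keeps the index non-negative even for negative hash codes
        return nodes.get(Math.floorMod(memberId.hashCode(), nodes.size()));
    }
}
```

Because the routing is best-effort, a misrouted heartbeat is harmless: the receiving node performs the same processing against the shared K/V store, and at worst a duplicate trigger briefly exists.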
Delayed triggers using Akka Actors
On LinkedIn, thousands of members go online or offline every second, and millions of members are online at any given moment. Since we create a delayed trigger for each online member, we needed a truly lightweight solution.
Akka is a toolkit for building highly concurrent, distributed, and resilient message-driven applications, and it works well with the Play Framework that we use at LinkedIn. Actors are objects that encapsulate state and behavior, defining what they should do when they receive certain messages. Each Actor has a mailbox, and Actors communicate exclusively by exchanging messages. When a lightweight thread becomes available, it is assigned to the Actor, reads a message from the mailbox, and alters the Actor’s state according to the behavior defined for that message.
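The essence of the model can be shown with a toy actor that is deliberately not Akka’s API: one mailbox, state mutated only by the actor’s own processing loop, and fire-and-forget sends. Unlike Akka, which assigns actors to a shared pool of lightweight threads only when they have messages to process, this sketch dedicates one thread per actor for simplicity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiFunction;

// Toy actor sketch (not Akka): a mailbox plus a single processing loop
// that applies the behavior to the current state for each message.
class TinyActor<S, M> {
    private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<>();
    private volatile S state;

    TinyActor(S initialState, BiFunction<S, M, S> behavior) {
        this.state = initialState;
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    M message = mailbox.take();             // read from the mailbox
                    state = behavior.apply(state, message); // alter state per the behavior
                }
            } catch (InterruptedException stopped) {
                // actor shut down
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    void tell(M message) { mailbox.offer(message); } // asynchronous, fire-and-forget send
    S state() { return state; }
}
```

Because only the actor’s own loop touches its state, callers never need locks; they just `tell` the actor a message. Akka adds what makes this viable at our scale: millions of actors multiplexed over a small thread pool, supervision, and scheduled messages.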