design leaderboard

my solution

Let’s assume that this is a leaderboard for a very popular online video game with 100,000 DAU, 1M MAU.

See the players at the top of the leaderboard.
- Potentially allow filtering by region.
See leaderboard for friends.
See position of user in leaderboard as well as surrounding players.
Leaderboard should update with new scores that are set by players.

Read-your-writes consistency - a user should see their updated score on the leaderboard after they achieve it.
- It’s fine to just have basic eventual consistency for other users’ scores.
Higher write volumes, but read latency should be lower.
- More people will be playing the game/setting scores vs. just perusing the leaderboard.
Durability, consistency.
A skewed number of read requests will be for the top 10/top 100 of the leaderboard.

100,000 DAU, but only a smaller percentage of these players achieve scores that beat their best.
- Let’s say 20%, or 20,000 daily writes are needed to update the database.

GET /leaderboard?limit=50&offset=0

GET /leaderboard?filter=[list_of_friends]

GET /leaderboard?user=[user_id]

PUT, POST /leaderboard/:user_id

Standard game and server set up, but I made some changes to achieve our functional requirements with some more efficiency.
Cache for ”hot” queries.
- Common queries, like top 100 scores in the world, will be accessed a lot, so instead of requiring a DB call for each of them, we can cache the result of that query, and update the cache when a new score arrives that beats a score in the old query.
We can create a btree index on the scores table, so that “range queries” for scores near a user’s score will like close-by in the index.
- meta: is this even how btree indexes work with range queries?
- answer: yes it is.

The cache will store the results of frequently made queries like the top 100 in the leaderboard.
We can make use of stream processing such as using a change data capture to have the cache subscribe to writes on the database, and update necessary cached queries when a new score would change the result of that query.

We could replicate the database, dealer’s choice of single-leader or multi-leader replication, depending on scale, if we have multiple datacenters.
- We would need to have some sort of causal consistency in-place to ensure that users could see their new scores in the leaderboard (and maybe their friends as well). Eventual consistency for everyone else’s scores would be fine.

I didn’t really understand the problem as it’s normally described, and probably could have focused more on aggregation of singular events to create a “top k” leaderboard instead of just individual game scores.
Didn’t consider the possibility/requirement of fetching high scores within a time window.

For this problem, it was better to build up from a base knowledge of data structures/algorithms, such as top k elements/min-heap.

The image shows four possible solutions on the right, each with similar ingestion pipelines on the left.