Design YouTube – NeetCode

Notes and highlights.

  • 01:30 make sure to only pick a few functional requirements to focus on.
    • we will pick uploading and watching videos.
  • 02:06 non-functional requirements.
    • reliability: uploaded videos should never disappear or get corrupted.
  • 02:52 back-of-envelope estimations (note that this isn’t super important).
    • assume one billion DAU.
    • 1% of users upload videos, and the average user watches 5 videos per day.
    • assume that most videos do not get views.
    • 500 videos uploaded per second.
  • 04:19 favor availability over consistency.
    • we want the page to always load, but users can sometimes wait a bit for a new video to show up.
    • different considerations for livestreams, but out of scope.
  • 05:26 minimize latency.
    • start playing the video before the entire video is loaded.
  • 05:47 high level design for user uploading a video.
    • load balancer with a bunch of app servers.
    • we will store the raw video files in an object store like AWS S3, which handles replication for us.
    • videos should also have metadata that contains the title, description, author, etc.
      • this data will be stored in a NoSQL database like MongoDB, and each record contains a reference to the video file in the object store.
      • can be keyed by the video id.
    • 11:23 we need to encode the videos to compress/normalize them.
      • this is a non-trivial operation that will take time.
      • use a message queue so that a separate service encodes the videos asynchronously.
        • how many workers should we have?
          • 500 workers is not enough: if 500 videos are uploaded every second and encoding one takes longer than a second, every worker is still busy when the next second’s videos arrive.
          • so we need far more than 500 workers.
      • send these encoded videos to object storage.
  • 13:06 caching to improve read speeds.
    • we can use a content delivery network (CDN) to distribute the videos geographically around the world.
    • we can use caching for our metadata database to improve read speeds for users who are trying to watch videos.
  • 18:08 explanation of how YouTube loads a video a few chunks at a time instead of livestreaming it.
    • chunking the data is how we lower latency.
    • this is still known as video streaming, even though it is different from livestreaming.
  • 23:34 comparisons with real youtube.
    • YouTube doesn’t actually use a NoSQL database; it uses MySQL.
      • MongoDB didn’t exist when YouTube was starting.
      • they first tried replication and sharding to scale.
      • they then created Vitess to decouple the application layer from the database layer.
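
A quick check of the 02:52 estimates, as a minimal Python sketch. The inputs are the figures from the notes; the variable names are made up for illustration, and the results are rough orders of magnitude only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Inputs taken from the notes.
daily_active_users = 1_000_000_000
videos_watched_per_user = 5
uploads_per_second = 500

# Derived figures.
views_per_day = daily_active_users * videos_watched_per_user
views_per_second = views_per_day / SECONDS_PER_DAY
uploads_per_day = uploads_per_second * SECONDS_PER_DAY
reads_per_write = views_per_day / uploads_per_day

print(f"views per day:    {views_per_day:,}")        # 5,000,000,000
print(f"views per second: {views_per_second:,.0f}")  # 57,870
print(f"uploads per day:  {uploads_per_day:,}")      # 43,200,000
print(f"reads per write:  {reads_per_write:.0f}")    # 116
```

With these inputs, views outnumber uploads by roughly 100 to 1, so the read path is the one that benefits most from caching.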
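
The 11:23 encoding step can be sketched with an in-process queue and worker threads. Everything here is a stand-in and an assumption: dicts play the object store and the metadata database, `queue.Queue` plays the message queue, `zlib.compress` plays the encoder, and the 60-second encode time used in the sizing example is invented, not a figure from the video.

```python
import math
import queue
import threading
import zlib

raw_store = {}       # stand-in for raw files in object storage
encoded_store = {}   # stand-in for encoded files in object storage
metadata = {}        # stand-in for the metadata database, keyed by video id
encode_queue = queue.Queue()  # stand-in for the message queue


def min_workers(uploads_per_second, avg_encode_seconds):
    """Workers needed to keep pace: arrival rate times time per job."""
    return math.ceil(uploads_per_second * avg_encode_seconds)


def handle_upload(video_id, title, raw_bytes):
    """Store the raw file, record metadata, enqueue the job, return at once."""
    raw_store[video_id] = raw_bytes
    metadata[video_id] = {"title": title, "status": "processing"}
    encode_queue.put(video_id)


def encode_worker():
    """Pull video ids off the queue until a None sentinel arrives."""
    while True:
        video_id = encode_queue.get()
        try:
            if video_id is None:
                return
            # zlib is only a placeholder for real video transcoding.
            encoded_store[video_id] = zlib.compress(raw_store[video_id])
            metadata[video_id]["status"] = "ready"
        finally:
            encode_queue.task_done()


def run_demo(num_videos=10, num_workers=4):
    workers = [threading.Thread(target=encode_worker) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    for i in range(num_videos):
        handle_upload(f"video-{i}", f"title {i}", bytes(1000))
    encode_queue.join()          # wait until every upload is encoded
    for _ in workers:
        encode_queue.put(None)   # one sentinel per worker to shut down
    for worker in workers:
        worker.join()
    return [metadata[f"video-{i}"]["status"] for i in range(num_videos)]
```

Under the assumed 60-second encode, `min_workers(500, 60)` gives 30,000, which is in line with the point that 500 workers is far from enough.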
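
For the 13:06 metadata cache, one common pattern is cache-aside with a time-to-live. This sketch is not from the video: the `MetadataCache` class, its 60-second TTL, and the dict standing in for the database are all assumptions. Entries can be slightly stale until they expire, which fits the 04:19 choice of availability over consistency.

```python
import time


class MetadataCache:
    """Cache-aside reader in front of a (hypothetical) metadata database."""

    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self._db = db              # any mapping of video id -> metadata
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry is easy to test
        self._entries = {}         # video id -> (expires_at, metadata)
        self.db_reads = 0          # counts how often we fall through to the db

    def get(self, video_id):
        now = self._clock()
        entry = self._entries.get(video_id)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit, possibly slightly stale

        # Cache miss or expired entry: read from the database and refill.
        self.db_reads += 1
        record = self._db.get(video_id)
        if record is not None:
            self._entries[video_id] = (now + self._ttl, record)
        return record
```

Repeated reads of a popular video within the TTL are served from memory, so only the first read in each window reaches the database.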
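
The 18:08 chunking idea can be illustrated with byte ranges. This sketch assumes chunks are fetched as byte ranges of a single file using the standard HTTP `Range` header format; the notes do not say which mechanism YouTube actually uses, and the function names and chunk size are hypothetical.

```python
def chunk_ranges(total_bytes, chunk_bytes):
    """Return inclusive (start, end) byte offsets that cover the whole file."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes must be > 0")
    return [
        (start, min(start + chunk_bytes, total_bytes) - 1)
        for start in range(0, total_bytes, chunk_bytes)
    ]


def range_header(start, end):
    """Format one chunk as an HTTP Range header value."""
    return f"bytes={start}-{end}"


# A 10 MB file split into 1 MB chunks (sizes chosen only for illustration).
ranges = chunk_ranges(total_bytes=10_000_000, chunk_bytes=1_000_000)
first = range_header(*ranges[0])  # "bytes=0-999999"
```

A player needs only the first range before playback can start, and it can fetch later ranges while the earlier ones play.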