random($foo)

Rearchitecting Twitter: Brought to You By the 17th Letter of the Alphabet

Since it seemed to be the thing to do, I sat down for about an hour Friday afternoon and thought about how I'd architect a Twitter-like system. And, after a day of hanging out and movie watching, and since Twitter went down again while I was twittering (with a more detailed explanation: "Twitter is currently down for database replication catchup."; see also) I thought I'd share what I came up with -- notably, since my design doesn't really have much DB replication involved in it.

Now, to prefix, this proposal is orthogonal the issue of whether statuscasts should be decentralized and what that protocol should look like (yes, they should be, and XMPP, respectively). That is, any decentralized system would inevitably require large-scale service providers and aggregators, getting you back to the architecture problem. So now onto the meat of it.

As Alex's post mentions (but its worth reiterating), at its core Twitter is two primary components: a message routing system, where updates are received and processed, and a message delivery system, where updates are delivered to the appropriate message queues (followers). Privacy, device routing, groups, filtering, and triggered processing are additional considerations (only the first two are currently implemented in Twitter).

Now this type of system sounds familiar, doesn't it? What we're looking at most closely resembles a very large email system with a few additional notifications on reception and delivery, and being more broadcast oriented (every message includes lots of CCs and inboxes are potentially viewable by many). Large email systems are hard, but by no means impossible, especially if you have lots of money to throw at it (*ahem* Yahoo!, Microsoft, Google).

Now, how would you might you go about designing such a thing on the cheap w/ modern technologies? Here's the general gist of how I'd try it:

  • Receive queue
    • Receive queue server - this cluster won't hit limits for a while
    • Canoncial store - the only bit that may be DB-based, although I'd pick one of the new fancy-schmancy non-relational data stores on a DFS; mostly writes, you'd very rarely need to query (only if you say had to check-point and rebuild queues based on something disastrous happening or profile changes). You'd split the User and Message stores of course
    • Memory-based attribute lookups for generating delivery queue items
    • Hooks for receive filters/actions
  • Delivery queues - separate queues for those w/ large followers/following), separate queues also for high priority/premium customers
    • Full messages delivered into DFS-based per-user inboxes (a recent mbox, then date-windowed mboxes generated lazily - mboxes are particularly good w/ cheap appends)
    • Write-forward only (deletes either appended or written to a separate list and applied on display)
    • Hooks for delivery filters/actions (ie...)
  • Additional queues for alternate routing (IM, SMS delivery, etc) called by deliver hooks
  • The Web and API is standard caching, perhaps with some fanciness on updating stale views (queues, more queues!)

Note that this architecture practically never touches the DB, is almost completely asynchronous, and shouldn't have a SPOF - that is, you should never get service interruption, just staleness until things clear out. Also, when components hotspot, they can be optimized individually (lots of ways to do it, probably the first of which is to create buffers for bundled writes and larger queue windows, or simply deferring writes to no more than once-a-minute or something. You can also add more queues and levels of queues/classes.)

The nice things about this is that technologically, the main thing you have to put together that isn't out there is a good consistently hashed/HA queue cluster. The only other bit of fanciness is a good DFS. MogileFS is more mature, although HDFS has the momentum (and perhaps, one day soon, atomic appends *grumble* *grumble*).

Now, that's not to say there wouldn't be a lot of elbow grease to do it, especially for the loads of instrumentation you'd want to monitor it all, and that there aren't clever ways to save on disk space (certainly I know for sure at least two of the big three mail providers are doing smart things with their message stores), but creating a system like this to get to Internet scale is pretty doable. Of course, the fun part would be to test the design with a realistic load distribution...

2008-05-24 23:13:59