Clipboard Copying in Flash 10

One of the things I noticed about the DevFormatter plugin I’m using is that the clipboard copying code no longer works. This is apparently because of the new security restrictions in Flash 10, which now impose user-initiated action (UIA) requirements on various functions, including System.setClipboard().

While inconvenient, especially since it’s somewhat commonly used, it was a necessary change due to some high-profile clickjacking attacks, and well, a good idea regardless, when you think about it.

Surprisingly, months after Flash 10’s release, neither the WP plugins I looked at nor the most popular general syntax highlighter script seem to have fixed their clipboard functionality, even though the workaround isn’t too onerous. Instead of having JS trigger a Flash copy, just reverse it: have a Flash button call a JS function. It’s not quite as elegant, since you’ll need as many Flash buttons as you want copy triggers, but it works. It’s been a while since I’ve done any Flash work, but luckily, it didn’t take very long at all.

Since this might prove useful to others, I’ve done this in AS 2.0 for more compatibility and made the package available here under a GPLv3 license: clipboard_copy.

Of course, if you want an unencumbered version, the code is easy enough to write yourself. The JS call looks something like:

function clipboard_copy(id) {
    var el = document.getElementById(id);
    return el.innerHTML;
    // or, for plain text instead of innerHTML:
    // return el[(document.all) ? "innerText" : "textContent"];
}

Note, you can have the clipboard JS function act on a selection if you’d like, but for my purposes (integration w/ code blocks), getting the text was better.

The Flash is similarly simple. Here’s the AS 2.0 event attachment and ExternalInterface:

// AS 2.0 event handler and ExternalInterface call
import flash.external.ExternalInterface;

copyButton.addEventListener("click", click);
function click(event:Object):Void {
    var item:String = "testid";
    var jsFunction:String = "clipboard_copy";
    var returnValue:String = ExternalInterface.call(jsFunction, item).toString();
    System.setClipboard(returnValue);
}

Easy peezy.

A couple of notes:

  • If you’re testing, you’ll want to run from a web server, not the file system, otherwise you’ll get sandbox errors
  • Regular cross-domain rules also apply of course
  • I used the Button Component for my version, which is admittedly a bit fugly. You could in theory have a text-only Flash link that you subsequently styled w/ JS (i.e., to match font-family and font-size), but I’ll leave that as an exercise for the reader

Printing with Python on OS X

Lately I’ve been doing more application programming. I’ve found that the best way to minimize the pain is to write as little Cocoa/Objective-C as possible. For example, here’s a snippet of how you’d create a print job in Cocoa:

- (void)print:(id)sender {
    [[NSPrintOperation printOperationWithView:self] runOperation];
}

- (void)printDocument:(id)sender {
    // Assume documentView returns the custom view to be printed
    NSPrintOperation *op = [NSPrintOperation
        printOperationWithView:[self documentView]
                     printInfo:[self printInfo]];
    [op runOperationModalForWindow:[self documentWindow]
                          delegate:self
                    didRunSelector:
                        @selector(printOperationDidRun:success:contextInfo:)
                       contextInfo:NULL];
}

- (void)printOperationDidRun:(NSPrintOperation *)printOperation
                     success:(BOOL)success
                 contextInfo:(void *)info {
    if (success) {
        // Can save the updated NSPrintInfo, but only if you have
        // a specific reason for doing so
        // [self setPrintInfo:[printOperation printInfo]];
    }
}

Yeah, I know, kill me now. My approach to maintaining sanity and productivity has been to minimize these shenanigans by using as little Cocoa as possible. Also, since I’ve come to [[really] disdain] Objective-C, doing the equivalent in Python was high on my list. (That being said, I’ve also come to realize that writing Python and PyObjC is actually much worse than Objective-C and Cocoa.) What I really wanted was a pythonic way to print on OS X. The ideal scenario would be to import something and be able to call, say, printer.print().

I wasn’t able to find anything like that. But, while I had originally dismissed lpr, it turned out that it was actually able to do what I needed. On OS X, lpr hooks right up to CUPS, which is used by the rest of the system, and can be passed not just text files, but PDFs and image files as well. Printing looks like this:

lpr -P [printer_name] file.jpg

There are several helpful commands like lpoptions and lpstat that can help you with printer-specific options. The CUPS documentation on Command-Line Printing and Options is also quite useful.

So, the final code looks something like:

import popen2
popen2.popen4("lpr -P [printer] " + output_file)
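If you’d rather avoid the long-deprecated popen2 module, a roughly equivalent sketch using subprocess would be (the printer name and file path below are just placeholders):

import subprocess

printer = 'Some_Printer'         # placeholder; `lpstat -p` lists real printer names
output_file = '/tmp/output.pdf'  # placeholder

# Hand the file off to CUPS via lpr and wait for lpr to return.
subprocess.call(['lpr', '-P', printer, output_file])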

Switched to WordPress

One of the things that I reassessed at the end of the year was the likelihood that resuming development on my personal blogging software would be anywhere near the top of my priority list. With the answer being a resounding no, and the advances that WordPress 2.7 has made (particularly w/ the admin interface), it made sense to finally stop putting it off and switch over.

Whipping up a direct database import was pretty straightforward and didn’t take long (less than 80 lines all told), and I was able to import all 4,149 entries (and over 900 revisions) without a hiccup. I also imported the legacy URLs and IDs as additional metadata. I ported over my current look into a template, and added my legacy code handling so even my most ancient Blogger and other static-based posts should continue to resolve (the 404 handler checks whether files resolve on legacy.randomfoo.net, and I have a few additional redirects that mostly seamlessly forward along the most ancient requests).

The last thing on my plate is figuring out how to best migrate Disqus comments to the new posts. I think that I’ll need to modify the Disqus WP plugin to pull from the legacy_id when applicable. Until then, I’m just redirecting old posts to the old system.

Also, sorry for those subscribing to my feeds for the potential dupe-age. Old posts retain their previous guids, so I guess it’s up to your feedreader to see how it’ll handle the transition…

A few interesting numbers:

  • 2554 posts in Blogger starting in 2000 (not counting my older FA410 class blog from ’99)
  • 1256 posts via text files in vim starting in 2003
  • 1300 posts in a custom db-backed system starting in mid-2004

So, anyone have recommendations on indispensable plugins?

Online Tools for A New Small Business

One of the interesting things I’ve been doing recently has been looking at support tools for running a new company. I remember Ev writing about this a couple years ago. This research was pretty new for me since none of the following services even existed when we started Upcoming (not that we had any need for most of these anyway; we were focused exclusively on building a cool app: our only capital cost was servers [offset by AdSense] and our burn rate was our cost of living).

Anyway, after a day or two of poking around, here’s a list of the top picks (and in some cases, worthy alternatives):

  • Google Apps – a no-brainer for email and document sharing. Unfortunately, while good for individual services, its functionality for even basic sharing is rudimentary to non-existent. Document sharing requires manually sharing each document (no shared spaces) and there’s no concept of shared email (for handling shared support, customer service, etc.)

    Price: free

  • Dropbox – Fully integrated with the Desktop, up to 2GB. It just works.

    Price: free

  • FogBugz On Demand – I’ve been using hosted FogBugz for a couple years now. It still has some UI rough edges (although fewer than JIRA, I suppose), but its Evidence-Based Scheduling is a unique (and awesome) feature. Also, it’ll hook up to email for handling support, which fills in that gap. So, we’re using it for Task, Issue, Effort, and Support Tracking.

    Price: free (2 person Student and Startup Edition)

  • Xero – the international edition (they are New Zealand-based) of this Accounting service was released just a couple days ago, but so far I’ve been incredibly impressed by the functionality and polish. It’s far better than anything else we looked at. Besides all the regular banking features, it also does Invoicing and Expense claims tracking. (Reading about the company itself is interesting – I guess there aren’t lots of NZ startups, and the fact that they did an early IPO means all their early growth numbers are public record.)

    Price: ~$25/mo (NZ$499/yr)

  • PipelineDeals – after reviewing all the big CRM tools (starting with Salesforce and SugarCRM) I was feeling pretty depressed – they’re all ridiculously bloated, clunky, and just pretty much unusable. I couldn’t imagine being forced to use anything like that on a daily basis. PipelineDeals was a breath of fresh air and supported everything we need for contact tracking as well as providing the best lead/sales tools that I found.

    Price: $15/mo per user

    One alternative worth highlighting is Relenta (Demo l/p:demo). It integrates a shared email system with contact management (it also supports pretty robust email campaigns/newsletters), with support for canned responses, auto-responders, role filtering, etc. I remember talking about an app like this w/ some friends years ago, and it’s a great implementation. It wasn’t a good fit for us since we needed something for, well, selling stuff (a surprise, I know), but if your needs are more customer-support focused, be sure to take a look at Relenta. I also looked at Highrise, which is slick, but found it to be pretty shallow.

  • MailChimp – although CampaignMonitor is nice, its per/campaign pricing model didn’t make a lot of sense for our use. Mailchimp’s more flexible pricing (which includes monthly pricing) was a better fit, and support for segmentation and A/B testing I guess makes up for individual stats being an add-on. (Vertical Response is another service that has some interesting services like Online Surveys and Snail Mail Postcards, so that might be worth looking into, but at least by my Twitter @replies, MailChimp won out unanimously).

    Price: $10/mo (0-500 subscribers)

Lastly, while Silicon Valley Bank got a lot of love for being the bank for startups, for the day to day business needs (bill/direct payments, business taxes, payroll, merchant account) it looks like Wells Fargo Small Business is a much better fit. Other payroll options include SurePayroll (which used to do WF’s payroll) and PayCycle, although I’m not sure there’s enough of a cost difference to justify the extra hassle. That being said, it might be worthwhile to use Costco/Elavon Merchant Processing.

There are a few other things that we’ll probably end up trying out (UserVoice, GetSatisfaction, maybe some MOO cards) but I think this pretty much covers most (if not all) of our business needs. Anything I’m missing? Or are there any favorite apps/services that people like? Feel free to comment.


Update: Zoho looks pretty decent as an all-around solution, anyone try it? One caveat I should mention w/ the use-lotsa-apps approach is that I’ll need to spend a bit of time writing glue code for syncing contacts between the CRM and everything else (most of the tools appear to have decent APIs, but it’s still a bit of a pain).

Python os.walk() vs ls and find

Since I wasn’t able to find a file cataloguer and dupe-finding app that quite fit my needs (for the Mac, DiskTracker was pretty close; of all the apps I tried, that’s the one I’d recommend), I started to code some stuff up. One of the things I was interested in from the start was how well Python’s os.walk() (and os.lstat()) would perform against ls. I threw in find while I was at it. Here are the results for a few hundred thousand files; the relative speeds were consistent over a few runs:

python (44M, 266173 lines)
---
real  0m54.003s
user  0m18.982s
sys 0m19.972s

ls (35M, 724416 lines)
---
real  0m45.994s
user  0m9.316s
sys 0m20.204s

find (36M, 266174 lines)
---
real  1m42.944s
user  0m1.434s
sys 0m9.416s   

The Python code uses the most CPU time but is still I/O bound and is negligibly slower in real time than ls. The options I used for ls were -aAlR, which apparently produces output with lots of line breaks, but ends up being smaller than find‘s single-line, full-path output. The find run was really a file-count sanity check (the off-by-one difference from the Python script is because find lists the starting directory itself). Using Python’s os lib has the advantage of returning all the attributes I need without additional parsing, and since the performance is fine, I’ll be using that. So, I just thought it’d be worth sharing these results for anyone who needs to process a fair number of files (I’m guessing I’ll be processing in the ballpark of 2M files (3-4TB of data?) across about a dozen NAS, DAS, and removable drives). Obviously, if you’re processing a very large number of files, you may want a different approach.
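For reference, a minimal sketch of this kind of walker (Python 2 era, and the path below is just a placeholder) looks something like:

import os

def walk_stats(top):
    # Yield (path, lstat result) for every file under top; lstat avoids
    # following symlinks, which matters when cataloguing removable drives.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                yield path, os.lstat(path)
            except OSError:
                pass  # vanished or unreadable; skip it

total = sum(1 for _ in walk_stats('/Volumes/SomeDrive'))
print 'files seen: %d' % total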

Launching Child Processes with Automator

One of the unresolved issues from my write-up on Firefox 3, Developing and Browsing was that in order to get it to work, you’d need to set the Profile Manager to come up on every launch. This of course starts to get old quite quickly (especially since I had already made a separate instance of Firefox.app so that I could have different icons for the apps).

Unfortunately, while there is a simple command switch (-P [profile]) to pick the profile, I couldn’t figure out how to add a command switch to an alias, so I set off on a quest to find the best way to launch these apps…

  • First I tried using a shell script as a ‘.command’ file. This launches commands in Terminal.app, but unfortunately it both launches a new Terminal window and leaves it open once it’s done. Less than ideal.
  • My next series of tests involved using Automator’s “Run Shell Script” functionality, which worked well, except that regardless of what combination of called shell scripts or &s I added, the Automator App would always wait for the process (Firefox) to finish. That kind of crowding in my app list was something I didn’t need while alt-tabbing.
  • I thought I had some success with ‘Run Applescript’ in Automator (with ‘do shell script’), which led me to try out some combinations in Automator and the Script Editor (Script Editor Apps are smaller, but slower than Automator Apps. Script Editor Apps also lock up and are one of the few apps that have the old B&W spinner instead of a beach ball).
  • Finally, I asked around to see if anyone else had tried this before and rcrowley gave the winning answer, which was to give up and write something that would exec a child process. He suggested pcntl_fork in PHP, but I went w/ Python (just because :).

So the end result is two Automator Apps that each contain a single “Run Shell Script” command:

python -c "import popen2; popen2.popen4('/Applications/Firefox.app/Contents/MacOS/firefox -P default')"
python -c "import popen2; popen2.popen4('/Applications/FirefoxDev.app/Contents/MacOS/firefox -P developer')"

They’re named ‘FF.app’ and ‘FFDEV.app’ respectively for easy Quicksilver access, and the icons were copied and pasted (through the Get Info selection). That took way too long, but it does work as expected (they launch, and then get out of the way), so hopefully this writeup helps other people who might be looking to do something similar with Automator.
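(For what it’s worth, on a newer Python the same trick could be sketched with subprocess instead of the now-deprecated popen2 — same binary path and profile as in the one-liners above:)

import subprocess

# Launch Firefox with the given profile and return immediately; Popen
# doesn't wait on the child, so the Automator app can exit right away.
subprocess.Popen(['/Applications/Firefox.app/Contents/MacOS/firefox',
                  '-P', 'default'])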

Also, please leave a comment if there’s a dead simple way to do this that I couldn’t figure out.

Kindle: +1 Week RFEs

Busy with pre-moving tasks, but thought I’d post a quick followup on the Kindle. I’ve bought 3 books so far (I’m keeping a spreadsheet; so far I’ve saved $33.22, or 47.43%, off of buying the physical books from Amazon – one of them was also out of stock, so that was an extra bonus; when I get a chance I’ll have to compare the book-buying rate to the past year). I also sent out an email to kindle-feedback. Here are the points of improvement I included (specifically software, and not the industrial design, which I’m sure they’ve heard about ad nauseam):

  1. One of the first things I did even before getting the Kindle was to queue up a bunch of samples. This is great, but even with this limited number of titles, it’s pretty hard to find the one I’m looking for. My first set of suggestions are all related to library management:
    • A smarter dashboard style listing would be nice. For example, you could have the Home screen be split in two, with a “Recently Read” and “Newly Arrived” listing. Paging to the next page would get you a traditional listing.
    • While full-text indexing may be out of the question, allowing searching/filtering by limited metadata from the “Show and Sort” menu, or, if there’s a dashboard, having a search/filter box accessible at the bottom of the first page, would be a great way to let a user quickly find a title in a large (100+ volume) library.
    • Along the lines of organizing a bookshelf (the potential storage capacity, even without any additional SD storage far outstrips the current Content Lister’s ability to manage), a number of improvements would make things better:
      • Tagging of titles (and allow listing by tag/section, filtering by tag)
      • Archiving – for example, read books
      • Read % / Status – related to the former, but being able to filter or organize by which books you’re currently reading, haven’t started, and have finished – the metadata is all there, but it’s not being displayed
    • Along the lines of metadata and display, the current separation of listing and managing seems unnecessary. One alternative, especially if you add a second, smaller line that contains status and other metadata, is to have each book have two click areas (the current 3-segment-tall title, which remains the same – clicking opens the book – and a second 1-2 segment-tall status line which brings up a context menu — note, this space already exists, so it wouldn’t even affect the # of books that could be listed by much…)
  2. A related request would be for storage of a reading journal — this data is stored by the device (it autobookmarks and knows which books were last opened, how long, etc.) and, at least according to the Kindle TOS is being reported to Amazon.com. It seems like a big opportunity is being missed by not having a user-accessible journal (the Wii is a good example of what this might look like to the end user).
  3. Although I’m not a fan of DRM, I really like what you guys are doing with the media management of purchased books. This is very compelling, although I’m disappointed that it doesn’t extend to periodicals. There are some periodicals I’d be open to subscribing to (any hope of getting The Economist?), but that’s definitely a sticking point for me. I like to annotate and file articles of interest – the latter functionality doesn’t seem to exist at all, and while the former works, it’s too bad that there’s no way to better manage the annotations or get them off the device wirelessly.
  4. In terms of legibility, if there were different fonts or line-height adjustment, that’d be quite welcome. This is especially noticeable w/ the experimental web browser.
  5. I very much like the ability to make annotations, especially when reading technical papers and essays/articles (unfortunately, the conversion process is somewhat lackluster/tedious – when I tried sending an HTML file to the kindle.com address, it converted it as plain text (tags in the page galore), and since I’m on a Mac, I had to use a third-party toolchain (Mobiperl)). Err, in any case, my suggestion for annotations is fairly simple – viewing/editing an annotation currently requires a second click to show it. I can (somewhat) understand a second click to edit, but wouldn’t it be better to just show the note (and menu) when one clicks on a line w/ a note?
  6. Along the lines of notetaking, I’ve taken to carrying around the Kindle when I’m out and about – there’s lots of times where it’d be useful to use it to type a quick note, but there isn’t any way to do that in a standalone manner. Lists are another potentially useful app, which leads me to ask…
  7. Is there any particular reason there isn’t an SDK available? Is there one planned? It seems like there’s a lot of potential for Kindle’s functionality to be extended, whether in terms of additional apps, or for things related to its core capabilities. I can think of a half dozen things off the top of my head that would do a lot, I think, to help get a random person to plunk down $360 on the device. The e-book space is littered with devices that require enormous amounts of low-level effort just to get to a point where useful apps can be developed (these, of course, are very different skillsets, so rarely has anything exciting to end-users ever happened). It seems like the Kindle is well positioned to be different in this regard. I know there are potential pitfalls (although, having been intimately involved in making similar design decisions [open APIs and web services], somewhat overblown since it’d be easy enough to control via dev keys or just by the fact that without easy/automatic distribution, the userbase is self-limiting), but I believe the rewards are manifold, and I hope you guys at least give it a try.

There’s one additional issue that I didn’t mail in that’s been getting on my nerves – when buying a book, it comes down the pipe quite quickly, and it’s a simple (almost one-click) process that you go through once you get to the end of the sample, but it doesn’t replace the sample chapters, and in fact starts you off all over again. IMO, the ideal experience would be to have some additional pages unlocked so you can continue reading, then, when the full book has finished downloading, to port your annotations, remove the sample file, and open the full book at the location where you left off in the sample. True, that kind of polish is typically missing from 1.0 products, but it’s usually the difference between the magical product you love and… well, everything else.

OK, I Got a Kindle

Over the weekend, I broke down and ordered a Kindle (which arrived today). There are lots of good reasons not to get one. Heck, I wrote a screed about it myself last year. (What? Speak up, I can’t hear you over the cognitive dissonance.)

So, why’d I end up getting one? Ironically for a “gadget” purchase, it was the practical aspect that finally pushed me over: I’ll be out of town the next few months and it’ll be inconvenient and impractical for me to buy/store books, or have access to my bookshelf.

While I’m strongly against DRM, I’m also a big proponent of what Amazon is doing with their yourmedialibrary initiative. Anyone who’s heard my spiel on digital media knows that I see media management as a primary value-add that makes paying for digital media worthwhile. As we accrue more and more digital stuff, having a convenient service that stores, tracks, organizes, and delivers it when and where we want it is going to be increasingly important (and necessary).

I have a lot of books that I really like (and that are quite nicely formatted and probably won’t be replaced anytime soon by eBooks) but looking at the couple hundred volumes on my bookshelves, I’m having a hard time finding many that have truly sentimental value. I think at the end of the day, I could cut down my shelf by at least two-thirds, maybe more. The upshot, besides much easier future moving, is that I’d probably use the books much more when the text of my library is fully searchable and easily annotable.

(Obviously, this will probably be different for everyone, but I think more and more will start thinking like this, especially as digital music and video take over. I have about 100 DVDs. None have been touched in months. And the only time I touch the albums I’ve bought are to rip them.)

Kindle and iLiad

And now for some talk about the devices. This will be somewhat more of an iRex exit review than a Kindle review (since I just got the latter), but regardless, I think the former will give some insight into what I’m looking for and expecting of the Kindle.

In terms of the actual reading experience, having had the iLiad e-ink device since its release (Summer 2006), I knew what to expect of the screen. In comparison, the Kindle’s screen is smaller (6″ vs 8″ diagonal), very slightly denser (167ppi vs 160ppi), and has worse grayscale (4 vs 16 shades). It refreshes slightly faster and is a little brighter (40% reflectance vs 32-35%) thanks to a newer Vizplex screen, but overall it’s very similar. The serifed font on the Kindle is heavier and wider, but also better hinted than the iLiad’s, so while it fits even less text on the page, it may be a bit more legible. If you’ve never seen an e-ink screen, it’s really worth seeking one out. You won’t really understand the fuss until you do. It’s much easier on the eyes than any backlit display, and much more “solid” than any reflective LCD. It’s a flat matte plastic that’s hard to describe. The closest thing I can liken it to is that it looks like the fake screens on the computer stand-ins in office furniture displays.

The iLiad supports more formats, of particular interest being PDF (it runs a modified version of xpdf) and has had a fair amount of hacking done to it. It also has built-in wifi. Unfortunately, a number of issues conspire to make these advantages moot. (Actually, there’s one main one which I’ll get to last.)

Even though the screen is larger than the Kindle’s, it’s still comparatively small (about A6), so A4 PDFs aren’t very legible (the zooming doesn’t work well). This means that it’s not very good for reading technical papers on, and that most real reading (books, etc.) needs to go through a reformatting/conversion process. If you’ve dealt with PDFs, you know how difficult that can be, since PDFs aren’t semantic, but layout-based by nature. HTML files are an option, but the built-in browser doesn’t paginate (or remember your position, or font size for that matter), so if you’re looking to read a book… well, good luck. And while the wifi sounds great in theory, in reality, there’s never been any way to load documents onto it wirelessly.

All these (and the many other design flaws, both in the hardware and software) could be overlooked or worked around if not for the one major, MAJOR flaw that made the iLiad useless for me – it never had any working power management. That’s right, no sleep, suspend, or hibernate. The lowest power screen in the world (which, come to think of it, these e-ink screens are) doesn’t help one bit in that case. Despite many promises to the contrary, iRex has never been able to address that problem.

Now, granted, as an early adopter, I don’t expect things to always work, but unfortunately, despite the original claims of long battery life (made in page turns, with no hint that it’d be constantly sucking juice), the device barely makes it through a few hours, not even a full day. This is a bit mystifying considering the success that OLPC and Amazon have had with instant suspends. Even worse, there’s no sleep or hibernate, so a full power cycle is required before reading. Surprisingly, they’ve released additional products (presumably aimed at real consumers) that haven’t addressed the problem at all.

To give you an idea of what this means: the iLiad took 49 seconds to boot up, and then another 14 seconds to load up the PDF. That’s over a full minute just to do the equivalent of opening a book up. I don’t think they mention that in the “features” section of their marketing. Considering that the average cell phone wakes up instantly, and heck, my laptop is up in 5s, this failing is really just incomprehensible to me.

This aspect of course was the Kindle’s easiest sell. The reviews and reports give it an average of 4-5 days of battery life w/ the wireless off, and 1-2 days with it on. More importantly, resuming from suspend to where you left off takes between 3-4 seconds. That’s not too shabby (opening a new book from the menu also takes about 3-4 seconds). That right there is basically the difference between a daily-use device and an over-expensive toy.

In terms of data loading, the Kindle has both the email gateway, which I’ve tested and which is certainly convenient (after giving it some thought, I’m pretty sanguine about using it, since I’m pretty sure the liability implications of keeping/tracking the files sent trump any value they might get from storing them for future data mining), and it simply mounts as an external drive when connected via mini-USB (another failing of the iLiad is its ridiculously large and awkward dongle attachment for power, USB, and network connectivity).

While there is no official Mobipocket software for the Mac, there is an alpha version of a linux tool, and more importantly, an open source set of tools called Mobiperl that seems to work well.

All in all, it’s doubtful that I’ll ever touch my iLiad again (well, we’ll see how OpenInkpot does), but from my limited time playing around with the Kindle so far, it looks like it should do the job that the iLiad never could.

Which isn’t to say it’s perfect. Even with my limited usage, it’s obvious there’s definitely lots that could be improved (for example, the content lister is pretty impossible for organizing anything close to the storage limit – it’s just a straight file listing with no ability to organize (tag, search, look up) or way to keep track of the read/unread status). And yes, the industrial design is heinous – even ignoring the aesthetics, it’s pretty much impossible to pick it up without accidentally turning the page (death by 700ms cuts?). And it’d be nice if there were a way to open up or work on the device itself (igorsk has been the only person who’s done anything of note so far), but for now, I’ll be happy with having a device that should be usable for what I got it for.

Rearchitecting Twitter: Brought to You By the 17th Letter of the Alphabet

Since it seemed to be the thing to do, I sat down for about an hour Friday afternoon and thought about how I’d architect a Twitter-like system. And, after a day of hanging out and movie watching, and since Twitter went down again while I was twittering (with a more detailed explanation: “Twitter is currently down for database replication catchup.”; see also), I thought I’d share what I came up with — notably, since my design doesn’t really have much DB replication involved in it.

Now, to prefix, this proposal is orthogonal to the issue of whether statuscasts should be decentralized and what that protocol should look like (yes, they should be, and XMPP, respectively). That is, any decentralized system would inevitably require large-scale service providers and aggregators, getting you back to the architecture problem.

So now onto the meat of it.

As Alex’s post mentions (but it’s worth reiterating), at its core Twitter is two primary components: a message routing system, where updates are received and processed, and a message delivery system, where updates are delivered to the appropriate message queues (followers). Privacy, device routing, groups, filtering, and triggered processing are additional considerations (only the first two are currently implemented in Twitter).

Now this type of system sounds familiar, doesn’t it? What we’re looking at most closely resembles a very large email system with a few additional notifications on reception and delivery, one that is more broadcast-oriented (every message includes lots of CCs, and inboxes are potentially viewable by many). Large email systems are hard, but by no means impossible, especially if you have lots of money to throw at it (*ahem* Yahoo!, Microsoft, Google).

Now, how might you go about designing such a thing on the cheap w/ modern technologies? Here’s the general gist of how I’d try it (a toy sketch of the delivery fan-out follows the list):

  • Receive queue
    • Receive queue server – this cluster won’t hit limits for a while
    • Canonical store – the only bit that may be DB-based, although I’d pick one of the new fancy-schmancy non-relational data stores on a DFS; mostly writes, you’d very rarely need to query (only if, say, you had to checkpoint and rebuild queues because something disastrous happened or profiles changed). You’d split the User and Message stores of course
    • Memory-based attribute lookups for generating delivery queue items
    • Hooks for receive filters/actions
  • Delivery queues – separate queues for those w/ large follower/following counts, and separate queues for high-priority/premium customers
    • Full messages delivered into DFS-based per-user inboxes (a recent mbox, then date-windowed mboxes generated lazily – mboxes are particularly good w/ cheap appends)
    • Write-forward only (deletes either appended or written to a separate list and applied on display)
    • Hooks for delivery filters/actions (ie…)
  • Additional queues for alternate routing (IM, SMS delivery, etc.) called by delivery hooks
  • The Web and API layers are standard caching, perhaps with some fanciness on updating stale views (queues, more queues!)
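To make the fan-out concrete, here’s the toy sketch mentioned above (Python; INBOX_ROOT and the sample users are made up, and a real system would obviously batch and queue these writes rather than doing them inline):

import os
import time

INBOX_ROOT = '/dfs/inboxes'  # made-up mount point for the DFS-backed mboxes

def deliver(sender, followers, body):
    # Write-time fan-out: append the full message to each follower's
    # "recent" mbox. Appends are cheap, and nothing here touches a DB.
    msg = '%d %s %s\n' % (time.time(), sender, body)
    for user in followers:
        userdir = os.path.join(INBOX_ROOT, user)
        if not os.path.isdir(userdir):
            os.makedirs(userdir)
        inbox = open(os.path.join(userdir, 'recent.mbox'), 'a')
        inbox.write(msg)
        inbox.close()

# A receive-queue worker would pull the follower list from the in-memory
# attribute store and call deliver(); IM/SMS routing happens via the
# separate delivery hooks/queues.
deliver('lhl', ['alice', 'bob'], 'testing the fan-out')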

Note that this architecture practically never touches the DB, is almost completely asynchronous, and shouldn’t have a SPOF – that is, you should never get service interruption, just staleness until things clear out. Also, when components hotspot, they can be optimized individually (lots of ways to do it, probably the first of which is to create buffers for bundled writes and larger queue windows, or simply deferring writes to no more than once-a-minute or something. You can also add more queues and levels of queues/classes.)

The nice thing about this is that, technologically, the main thing you have to put together that isn’t already out there is a good consistently-hashed/HA queue cluster. The only other bit of fanciness is a good DFS. MogileFS is more mature, although HDFS has the momentum (and perhaps, one day soon, atomic appends *grumble* *grumble*).
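For the queue cluster piece, a minimal consistent-hash ring for pinning users to queue servers could look like the following (a sketch, not any particular library; server names are placeholders):

import bisect
import hashlib

class HashRing(object):
    # Toy consistent-hash ring: each server gets many virtual points, so
    # adding or removing a server only remaps a small slice of the keys.
    def __init__(self, servers, replicas=128):
        self.points = []   # sorted hash points on the ring
        self.owners = {}   # point -> server
        for server in servers:
            for i in range(replicas):
                point = self._hash('%s:%d' % (server, i))
                bisect.insort(self.points, point)
                self.owners[point] = server

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.points)
        return self.owners[self.points[idx]]

ring = HashRing(['queue01', 'queue02', 'queue03'])
print(ring.server_for('some_user'))  # the same user always lands on the same queue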

Now, that’s not to say there wouldn’t be a lot of elbow grease to do it, especially for the loads of instrumentation you’d want to monitor it all, or that there aren’t clever ways to save on disk space (I know for sure at least two of the big three mail providers are doing smart things with their message stores), but creating a system like this to get to Internet scale is pretty doable. Of course, the fun part would be to test the design with a realistic load distribution…

Blueball-o-rama

After the craziness last week (Where 2.0, WhereCamp, and a GAE Hackathon in between), I was looking forward to taking a breather, but have instead jumped headlong into working on some much-delayed music hacking and getting serious (along with self-imposed deadlines) about Objective-C. I’m also catching up on publishing stuff from last week, so here’s the summary of my Bluetooth work.

nodebox.py
As Brady mentioned in his Radar writeup, Blueball came primarily out of discussions on how to do an interesting Fireball-like service/tool for a single-track, stuck-in-the-middle-of-nowhere (sorry Burlingame) conference. Also, my desire for Brady to say “blueball” on stage. (score!) Fireball and Blueball are also both parts of a larger track that I’m exploring. It may take a while, but hopefully something interesting will start emerging.

I had a session on Proximity and Relative Location at WhereCamp, where I stepped through my code (really simple collectors, very half-assed visualizations running a simple spring graph) and talked a bit about the useful things that can be done with this sort of sensing.

The particularly interesting bits (IMO) are in applying the type of thinking that was being done on Bluetooth scatternets a few years back — patching together “piconets.” That is, by stitching together the partial meshes, you can pull out all sorts of transitive (inferred) properties. There are of course visualizations and pattern extraction you can do on that, but by matching the relative with the absolutes, you can get far wider coverage for LBS and related services. And of course, you can do your own reality mining on social connections when you start relating devices to people.


blueball v1 from lhl on Vimeo.