Matt just posted about his recent Adventures in being a bandwidthaholic and made some good points. Towards the end, he talks about solving the distributed bandwidth problem. For legal files, there is already a solution of sorts in the form of FreeCache. Here’s how it works:
This is pretty nifty because it’s completely transparent to the end user. It also beats regular proxying because it still makes a HEAD request to the original source, which lets the source track popularity and lets FreeCache update changed content and dump removed content.
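To make the HEAD-request idea concrete, here is a minimal Python sketch of how a cache could decide whether to keep, refresh, or drop a file. This is not FreeCache’s actual code: the `head` and `revalidate` names, the choice of headers to compare, and the fallback for other status codes are all my own assumptions. Since every check hits the original server, the origin still sees each request in its logs, which is what makes the popularity tracking possible.

```python
import urllib.error
import urllib.request


def head(url, timeout=10):
    """Issue a HEAD request; return (status, headers with lowercase keys)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, {k.lower(): v for k, v in response.headers.items()}
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still useful.
        return err.code, {k.lower(): v for k, v in err.headers.items()}


def revalidate(cached, status, headers):
    """Decide what to do with a cached copy: 'serve', 'refetch', or 'evict'.

    `cached` is a dict of the validators stored when the file was fetched
    (hypothetical layout: lowercase header name -> value).
    """
    if status in (404, 410):
        return "evict"        # origin removed the file, so dump our copy
    if status != 200:
        return "serve"        # assumption: keep serving on any other status
    for key in ("etag", "last-modified", "content-length"):
        if key in cached and key in headers and cached[key] != headers[key]:
            return "refetch"  # origin copy changed, so update ours
    return "serve"
```

A cache would call `revalidate(cached, *head(url))` before handing out its copy, so removed files get dropped and changed files get refreshed.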
That being said, the current ‘cache client’ is less than ideal for those who have bandwidth they want to contribute. Here are some features that I think should be added (also, hopefully the redirector software is open-sourced eventually).
- flexible bandwidth accounting rules – I think this is the biggest strike against FreeCache currently. Most people are willing to contribute bandwidth, but need to make sure they stay under a set quota. This *could* be done w/ mod_throttle, but it’d be better if it didn’t have to be: there are probably a lot more people willing to throw a set-and-forget script on their server than to customize their server for this (and naive bandwidth accounting would be pretty trivial to program)
- domain/url [regex] white/black-listing, grouping – this would probably be the second biggest win. I think a lot of people would want to contribute to only specific sites or groups. Combined w/ the bandwidth accounting, at the simplest level: I am willing to cache x amount for community site 1, and y amount for everything else, and I never want to cache stuff for z. The central redirector has to handle the heavy lifting anyway, so what’s one more thing to take care of?
- connection limiting, bandwidth shaping – this is sort of a subcategory of point 1, but it’d be hard to do (well, the connection limiting not so much, but the bandwidth shaping definitely). Let’s save that for version 2
- distributed discovery – you could seed the list and then have the caching clients do their own rule parsing and redirection. In fact you could do all kinds of offloading for distribution. Err, congratulations, you’ve just built a server-to-server P2P network.
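To show how little code the first two items would take, here is a rough Python sketch of naive bandwidth accounting combined with regex-based grouping. Everything in it is hypothetical: the `Rule` and `CachePolicy` names, the first-match-wins ordering, and the `.example` domains are mine, not anything FreeCache ships.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One contribution rule: a URL regex plus a byte quota for that group."""
    name: str
    pattern: str        # regex searched against the full URL
    quota_bytes: int    # 0 means "never cache this" (a blacklist entry)
    used_bytes: int = 0


class CachePolicy:
    """Naive per-group bandwidth accounting; the first matching rule wins."""

    def __init__(self, rules):
        self.rules = list(rules)
        self._compiled = [(re.compile(rule.pattern), rule) for rule in self.rules]

    def match(self, url):
        """Return the first rule whose pattern matches the URL, or None."""
        for regex, rule in self._compiled:
            if regex.search(url):
                return rule
        return None

    def may_serve(self, url, size_bytes):
        """True if serving size_bytes keeps the matching group within quota."""
        rule = self.match(url)
        if rule is None:
            return False
        return rule.quota_bytes > 0 and rule.used_bytes + size_bytes <= rule.quota_bytes

    def record(self, url, size_bytes):
        """Charge a completed transfer to the matching group."""
        rule = self.match(url)
        if rule is not None:
            rule.used_bytes += size_bytes

    def reset(self):
        """Zero all counters, e.g. at the start of a billing period."""
        for rule in self.rules:
            rule.used_bytes = 0


GB = 1024 ** 3
policy = CachePolicy([
    Rule("never", r"^https?://([^/]+\.)?z\.example/", 0),
    Rule("community", r"^https?://([^/]+\.)?community1\.example/", 20 * GB),
    Rule("everything-else", r".*", 5 * GB),
])
```

The example policy reads as: never cache for z, spend up to 20 GB on community site 1, and up to 5 GB on everything else. The cache client would check `may_serve` before accepting a request and call `record` once the transfer finishes. Persisting the counters and resetting them each period are left out.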
I have some other ideas about what would be neat for mirroring systems, but those aren’t as useful to the net-at-large.