Streaming Tar Files the Right Way in PHP

Tonight I was working on on-the-fly tarring of files, and in the spirit of not-reinventing-the-wheel, a quick search turned up a just-written article, Creating ZIP and TAR archives on the fly with PHP. Great!

Except that when you think about it, TAR (short for Tape ARchive) is built for streaming. It just doesn’t make any sense to create a temp file and then send it; that seems like a waste (unless you’re streaming the same file many, many times or using a separate set of servers for sending static files). If you’re going to be sending a file with the same Apache processes, here’s the right way (well, my right way, YMMV) of doing on-the-fly creation:

// Unlimited execution, as long as it takes to DL
set_time_limit(0);

// For proper browser handling
header('Content-type: application/x-tar');
header('Content-Disposition: attachment; filename="' . $filename . '.tar"');

// Make it safe
$filename = escapeshellarg($filename);

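// -C changes into the directory first, so its path doesn't get included in your tarball;
// "-f -" makes tar write the archive to stdout explicitly instead of relying on the build's default
$cmd = "tar -cf - -C /path/to/tar/from $filename";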

// teh magic
$fh = popen($cmd, 'r');
while (!feof($fh)) {
  print fread($fh, 8192);
}
pclose($fh);
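
For comparison, here is a minimal sketch of the same no-temp-file idea in Python, using the standard library's tarfile module in its non-seekable stream mode rather than shelling out to tar. The names ChunkSink, stream_tar and emit are made up for illustration; in a web app, emit would be whatever writes bytes to the response.

```python
import os
import tarfile


class ChunkSink:
    """Write-only file-like object that hands each chunk to a callback."""

    def __init__(self, emit):
        self.emit = emit  # hypothetical callback, e.g. the response's write()

    def write(self, data):
        self.emit(data)
        return len(data)


def stream_tar(base_dir, name, emit):
    """Tar base_dir/name and pass the bytes to emit() as they are produced."""
    # "w|" is tarfile's stream mode: it never seeks, so no temp file is needed.
    with tarfile.open(fileobj=ChunkSink(emit), mode="w|") as tar:
        # arcname keeps base_dir out of the archive, much like tar's -C option.
        tar.add(os.path.join(base_dir, name), arcname=name)
```

The trade-off is the same as with popen(): the process stays busy for as long as the download takes.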

Unicode/UTF-8 Notes: I18N Gotchas

Upcoming.org was originally launched on PHP4 and, I believe, MySQL 3.23. As you might imagine, internationalization (i18n) wasn’t at the top of the priority list. As time has gone on, more and more international users have started to use Upcoming. With global geocoding and time zones off our plate, it was time to tackle character encoding.

Now, everyone knows that i18n is easy: just use UTF-8 across the board and you’re done. The reality, of course, is more complicated. While I don’t think there’s any disagreement that Unicode is a Good Thing™, even a decade and a half in, support is still wildly uneven in different environments and programming languages. No doubt these notes will be out of date soon, but I’m writing this down so I know where I can pick up next time.

Development Environment

The first thing to do is to make sure that you’re working in a UTF-8 aware environment. Browsers are good with encodings (a little too good with character-set autodetection, but you can force them into whatever display you want, more later). OS X is Unicode friendly and Terminal.app is a good place to start. Linux is also pretty good; you can check what your locale settings are w/ “locale”. RHEL defaults to en_US.UTF-8, so display in xterms actually works out of the box (although the Unicode fonts seem to be missing lots of codepoints).

Beyond your locale, you’ll also want to make sure that your pager can display UTF-8. You can export LESSCHARSET=utf-8 into your bashrc.

Lastly, and for me most importantly, there’s Vim. Version 7 has improved Unicode support, but for me, v6.1 worked fine in binary mode. Being able to edit a file (vim -b), move over to the character, and type g8 to see the hex values of its UTF-8 bytes was the easiest way for me to verify transcoding/character set issues.
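
If Vim isn’t handy, the same check can be roughly approximated in Python. This is a small sketch, not a replacement for g8; the helper names utf8_hex and codepoint are made up for illustration.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex, similar to Vim's g8."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))


def codepoint(ch):
    """Return the Unicode code point of a single character, e.g. U+2122."""
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    for ch in ("é", "™"):
        print(ch, codepoint(ch), utf8_hex(ch))
```

A correctly encoded é shows up as the two bytes c3 a9; seeing four bytes where you expected two is a good hint that the text was encoded twice.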

MySQL

We’re now using MySQL 4.1.x, the later builds of which have pretty good character set support. There are still outstanding collation issues, and things like UTF-8 corruption in InnoDB for <4.1.16 and character_set_name problems <4.1.19 are troubling, but overall, once you set the bajillion character_set variables to "utf8", things seem to work alright. Since we remained in latin1 when we went from 4.0 to 4.1, we avoided some big headaches, especially since we actually were storing many different character encodings (MySQL 4.0 and before simply treated values as binaries, so this wasn’t a problem). A couple of alters later and we’re now in utf8/utf8_general_ci with a mixed bag of binary data (untouched by the alters).

Python

Of the languages that we use, Python has the best Unicode support (just watch out for the wacky print behavior; another howto). That and Mark Pilgrim’s chardet package made it a no-brainer to do character set conversions in Python. Unfortunately, I ran into some hairiness with trying to get MySQLdb to write UTF-8. That eventually got solved (if you’re running a setup w/ a broken set_character_set(), force it through the my.cnf); in the meantime, I worked around the problem by sending the UTF-8 through the mysqlclient.
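
To give a flavor of what such a conversion looks like, here is a minimal standard-library sketch. It is a simplified stand-in, not what chardet does: chardet guesses among many encodings statistically, while this only asks "is it already valid UTF-8?" and otherwise assumes a single legacy encoding (latin-1 here, which is an assumption about the data). The function name to_utf8 is made up for illustration.

```python
def to_utf8(raw, fallback="latin-1"):
    """Return raw bytes as UTF-8, transcoding from `fallback` if needed."""
    try:
        # Bytes that already decode as UTF-8 are passed through untouched,
        # so running this twice cannot double-encode them.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the fallback encoding.
        return raw.decode(fallback).encode("utf-8")
```

The weak spot is legacy text that happens to be valid UTF-8 by accident, which is rare for latin-1 prose but is exactly the kind of case a real detector handles better.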

Perl

Perl has had, um, issues with Unicode. 5.8 apparently fixes lots of things; too bad I’m running 5.6.1. Still, I didn’t have too many problems with it. The localized scope utf8 pragmas are sort of weird, but seem to work as advertised (so glad I coded all my string handling in a modular manner). The one problem I had with DBI not returning UTF-8 was solved with a quick search. Interestingly, SET NAMES worked perfectly for DBI even when it did bupkis for MySQLdb.

PHP

PHP had potentially the worst Unicode handling, but PHP5 improves things a lot, with native UTF-8 support (just remember to set it as the default character set). Since the stuff I was working on didn’t require string manipulations at all, I completely lucked out here. The only thing I needed to fix was a stray utf8_encode() call that was crushing the UTF-8 (utf8_encode() assumes its input is ISO-8859-1, so text that is already UTF-8 gets encoded a second time).
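
The damage from an extra utf8_encode() is easy to reproduce outside PHP. The sketch below mimics it in Python under the assumption that the call simply treats every input byte as an ISO-8859-1 character; both function names are made up for illustration.

```python
def utf8_encode_like_php(data):
    """Mimic utf8_encode(): read bytes as ISO-8859-1, write them back as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def undo_double_encoding(data):
    """Reverse one round of accidental re-encoding (raises if data is not double-encoded)."""
    return data.decode("utf-8").encode("latin-1")


if __name__ == "__main__":
    good = "é".encode("utf-8")            # 2 bytes
    crushed = utf8_encode_like_php(good)  # 4 bytes, displays as two junk characters
    print(good, crushed, undo_double_encoding(crushed) == good)
```

The reversal only works when you know the text went through exactly one extra round, so removing the stray call is the better fix.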

Summary

So, ironically, I ended up having the most problems with the language with the most complete Unicode support. This whole deal was quite the PITA, but on the bright side, it’s done, and everything seems to have squeaked by with at least a passable level of workingness. (I’m not sure this would have worked so “smoothly” even as late as last year.)

Nintendo DS Lite Mania!

Recently with the impending US release, I finally got the Nintendo DS bug. I don’t know what tipped it over for me (probably SylphIRC), but it looks like there’s some momentum at the office. Ed, Kevin, and Eric are all getting NDSLs.

To be able to run homebrews, emulators, backups and the like, you’ll need two things. First you need a passkey to boot into DS mode. The latest and greatest is the Max Media Launcher, which replaces the older bulky passkeys with a DS cart sized (but non-reprogrammable) replacement. (You can subsequently go through a flashing process, FlashMe, to bypass the need for a key, but it seems like sort of a bad idea for the DS Lite at the moment.) In any case, I bought mine for $26 shipped from the UK.

UPDATE: It looks like the MML is buggy with sleep mode. To get around this you can swap in an NDS cart after boot up. Alternatively, the Passkey3 and Super Key are both similar NDS cartridge sized launchers that are supposed to be coming out soon.

The second thing you’ll need is a memory cartridge / player. The best at the moment seem to be the ones made by M3 (100% compatibility, skinnable interface, media playing, lots of emulators built in). The Mini-SD version is very slightly shorter than the SD version, but SD cards are cheaper and come in bigger sizes (and since I have SD and mini-SD cards already, it makes more sense to have something that can take both). While the newest SD SuperCard is smaller (GBA-cart sized), I figure I wouldn’t be bothered too much since GBA carts protrude from the DS Lite already anyway. What’s another centimeter or two?


Screencasts Tutorials

Lately, screencasting has been gaining steam on my internal radar (Gordon’s working on a very cool side-project right now). I just made some since it seemed to be the easiest way to show my mom how to do some basic computing tasks (It took about 20m to make 3 short ones).

I wonder if there’s a collaborative screencasting/tutorial site? It could actually just overlay on top of YouTube for storage, etc, with some additional metadata/filtering attached…

Lunix Tech Tips

Almost every engineer at Yahoo! gets a *NIX workstation in addition to a PC/laptop (in my case, I requested a Linux box instead of the more traditional BSD, and a Powerbook). While KVMs come standard, a fair number use their workstations almost exclusively headlessly for local development, me among them. I’ve never been a fan of X Windows (can’t I just have working mouseless copy and paste for all my applications?), and my life seemed like it was fine without it.

Yesterday, I couldn’t log in through my laptop, and I decided to bite the bullet and finally try to get X working for me. I’ve actually made a lot more progress than I thought I would, and I’ve learned some interesting new things (that I’m writing down so I don’t forget), but this experience has served to confirm my previous assumptions that my life will continue to be fine without interacting with UNIX on the desktop.

  • 1920×1200 on 2405FPW – One problem that plagued me was that I couldn’t get the 2405FPW monitor running at native resolution in X. As far as I knew, it should have been working, but it wasn’t. I finally tracked down a lead that I had missed, and after downloading and compiling read-edid, I found the missing ModeLine arguments, and also had to correct the HorizSync settings in the Monitor section of the xorg.conf.
  • RHEL4 up2date sucks – Many of the problems I’ve had wouldn’t be issues in Debian. Up2date reads YUM repositories, so I added the Fedora Extras in. (RHEL4 is based off of FC3) It’s not perfect, and some of the packages just fail (or have unresolved dependency trees), but it’s an improvement
  • Quicksilver-like tools – I found a couple, but ended up switching window managers (for a bunch of other reasons) so I never got to try them out. ion, the window manager I’m now using, lets you do keyboard binds and scripting, which is good enough for my purposes, even if it’s not very slick.
  • A better window manager – so, being fed up with how I couldn’t have a decent copy-and-paste experience (1 set of keyboard shortcuts across all applications – honestly, can it be that hard?) I set off to find something to solve my clipboard woes… and I haven’t found it. I did however try a bunch of window managers (on my list: tiling w/ arbitrary window splitting and sizing, remembering layouts, full keyboard access, customizability). Ratpoison and wmii proved too limiting, but ion seems to be almost there with the other window manager features I was looking for. It has built in Lua for high-level scripting of behaviors and arbitrary keybinding, so if I had a couple days to spare (which I don’t), I could probably get it near how I think a window manager should actually work. Of course, it falls down on the copy and pasting, but maybe I can find a third party app to do what I want.
  • RHEL4 window manager switching – RHEL4 uses gdm to handle X Windows logins. It’s really bizarre. There are lots of config files lying around, none of which seem to work in actually displaying a third-party window manager in the selection list. After lots of searching, I found that the script I wanted to access was switchdesk-helper. I then added custom branches for my window managers and the appropriate switchdesk files. I suspect there’s a better and easier “correct” way to do it, but I know better than to assume that’s the case…
  • Reversing mouse buttons – One of the things employees go through arriving at Yahoo is a full ergo review. Since then, I’ve been mousing w/ my left hand at work. It takes a while to get used to it, and I have to admit, I still don’t feel very comfortable doing fine dragging operations (hence my extreme desire for the shift-arrow control-key ranging and copy-and-paste that you get with any non-terminally retarded UI). While KDE had this built into its preferences, other window managers don’t. While a command for it is given in the BSDE FAQ, it’s actually wrong and doesn’t work on Linux. You need to run xmodmap with the command “pointer = 3 2 1 4 5”, not “pointer = 3 2 1”. If you do the latter, it’ll barf at you. wee!
  • xterm colors – The defaults for xterm are a blinding black on white. I solved this problem years ago, but for some reason, my .Xdefaults changes weren’t loading. Turns out that if that happens you have to run xrdb on the .Xdefaults to update (ha ha!, of course!). I also finally got around to changing xterm*color4, which by default is an illegibly dark blue when running on a dark background, to something better (I’ve settled for now on RoyalBlue).

As you can see, that’s what I like about Linux. It just works.

On an unrelated techie note, in Firefox with Adblock Plus, you can go into about:config, filter for the adblock preferences, and change extensions.adblockplus.defaultstatusbaraction to 3 so that clicking the statusbar icon will default to toggling Adblock Plus on and off. I was just playing around with that and I realized it was just counting down the menu. Very cool.

Per-User Maildrop Mail Filtering for Postfix w/ Virtual Users

Last year, I set up an “ISP-level” mail system for myself. Well, it seemed to me like I had some pretty common requirements: being able to handle mail for multiple domains without having to create UNIX accounts, and secure logins. Despite “email” being 30 years old, however, it turned out that there was no simple package that I could apt-get to install this. Instead, I ended up amalgamating various tutorials/packages for my Postfix w/ SASL + Courier IMAP w/ SSL + MySQL configuration.

One piece I wasn’t able to set up, however, was virtual user mail filtering. I banged my head until finding out there was just no way to do procmail delivery w/o an associated UNIX home folder. After which, I moved on to looking at how maildrop might work, but after many fruitless hours I gave up.

Recently, I decided it was time to get it working, no matter how long it took, and after a few hours of hammering away, I finally have gotten it done.

  • If you have a setup like mine except for maildrop, this is the clearest tutorial.
  • If you’re running Debian, you’re going to need to compile your own Maildrop w/ MySQL support enabled. Note that Maildrop 1.7.0 is the last version that has built-in MySQL support; after that version, Maildrop switches to using Courier authlib for connecting. I have the authlib stuff set up for my IMAP/POP, but I couldn’t get Maildrop to connect properly!
  • Test your maildrop setup separately as root:
    maildrop -V 10 -d user@domain.tld < test.eml
  • If you get a maildrop: signal 0x0B error, use strace to see what's going on. Signal 0x0B is 11, i.e. SIGSEGV: Maildrop segfaults on bad configs. Also, be sure to turn on MySQL logging to see what query maildrop is actually performing.