External Storage

I’m building a new array, RAID-1 this time.

Cost Description
$137 Firmtek Seritek/1VE4 Controller
$95 4 x 1m eSATA cables
$210 4 drive LCD tray SATA enclosure
$225 Seagate 7200.8 400GB
$217 WD Caviar SE16 400GB
$255 Seagate 7200.8 400GB
$255 WD Caviar RE2 400GB

I’m buying different brands from different vendors because I plan on running software RAID this time and want to minimize the chance of simultaneous failures (what I should have done to begin with, I guess). This will be a dedicated backup system, and I’ll need to buy/find a good archival tool…

Server Swap

Over the next day or so (and already for some), DNS should propagate to point visitors to a new server. The transition was relatively smooth, moving from one Debian machine to another.

  • /etc/passwd and /etc/group files were copied over w/ only minor modifications
  • mysqldump worked flawlessly
  • most of the rest was handled by rsyncing /tmp, /var, /root, and /home directly

If things go well, everything should be pretty transparent. (The new machine is a faster P4 w/ a couple hundred extra gigs of storage)

Update: hey, I didn’t even notice, but this is the 4000th entry in here. (This count combines entries imported from as far back as 1999) Crazy. Looking forward to spending some time to dust things off a bit now that the J-O-B will no longer be keeping me from personal projects.

Fascism in Utah

It looks like one video has shown up on what happened in Utah this weekend. From an account of a performer at the event:

No one resisted. That’s for sure. They had police dogs raiding the crowd of people and I saw a dog signal out a guy who obviously had some drugs on him. The soldiers attacked the guy (4 of them on 1), and kicked him a few times in the ribs and had their knees in his back and sides. As they were cuffing him, there was about 1000 kids trying to leave in the backdrop, peacefully. Next thing I know, A can of fucking TEAR GAS is launched into the crowd. People are running and screaming at this point. Girls are crying, guys are cussing… bad scene.

Now, this is all I saw with my own eyes, but I heard plenty of other accounts of the night. Now this isnt gossip I heard from some candy raver, these are instances cited straight out of the promoters mouth..

  • One of the promoters friends (a very small female) was attacked by one of the police dogs. As she struggled to get away from it, the police tackled her. 3 grown men proceeded to KICK HER IN THE STOMACH.
  • The police confiscated 3 video tapes in total. People were trying to document what was happening out there. The police saw one guy filming and ran after him, tackled him and his camera fell, and luckily.. his friend grabbed it and ran and got away. priceless footage. That’s not all though. Out of 1,500 people, there’s sure to be more footage.
  • The police were rounding up the staff of the party and the main promoter went up to them with the permit for the show and said “here, I have the permit.” The police then said, “no you don’t” and ripped the permit out of his hand. Then, they put an assault rifle to his forehead and said “get the fuck out of here right now.”

While I’m amazed and gladdened by the coverage that’s been organized (in Wikinews, forums, etc.), I’m absolutely horrified by the actual accounts. What would you do in this situation if you had a video camera on you? What’s the right thing?

video of police brutality and attempted confiscation of public record

My Programs

For a while, I’ve been hoping for a site that could track and collaboratively filter the applications I run. It looks like MyProgs does just that. While I’m too lazy to move my list of programs from my last install a few years ago, I’m going to try to enter in the programs I’m installing on my new editing station.

Things that could be improved:

  • While MyProgs currently provides popularity and tag slicing, I think ratings and categorization by utility would be really neat for providing lists of ‘best of breed’ applications per platform. This could all be done through namespacing tags [to follow up, implementation example, facets and freetags]
  • AJAX tag adding would be sweet
  • The Google Ad at the bottom that is formatted just like your apps and w/o white space? Annoying.

Screen Tip

Screen is one of the most useful apps ever if you’re a terminal junkie on multiple machines (screen -x is your friend). Here’s a little tip so that you don’t end up with dozens of numbered screen sessions. When creating screens, name them for their purposes, like, screen -S bt. Then you can easily screen -x bt.

If you’re lazy like me, you can further shorten that with aliases (I use ss and sx).

Python Tidbits

I spent the majority of my waking hours the past couple of days writing a Python script (had a chunk of XML-RPC code already written in Python) that processes zip files from Blackboard and puts them onto Confluence. Some notes:

  • Python’s time module wasn’t installed by Debian until apt-get’ing mxDateTime
  • libclamav’s file scan will catch all kinds of stuff that the buffer scan will miss
  • Python’s string slicing is sweet (str[::-1] for reversing)
  • sorted() isn’t in Python 2.3, so sorting a hash has to be done against the keys as a list
  • mechanize for Python really isn’t there yet
  • It’s too easy to take CPAN for granted.
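A minimal sketch of the slicing and sorting notes above. The function names are made up for illustration, and the code is written so it runs on current Python as well as on versions without sorted():

```python
def reverse_text(text):
    """Reverse a string with an extended slice (step of -1)."""
    return text[::-1]


def items_sorted_by_key(mapping):
    """Sort a dict without sorted(): copy the keys to a list, sort it in place."""
    keys = list(mapping.keys())  # list() keeps this working where keys() is a view
    keys.sort()
    return [(key, mapping[key]) for key in keys]
```

The in-place keys.sort() is the workaround; where sorted() is available, sorted(mapping.items()) does the same job in one call.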

Structured Blogging and Data Representation

Most conversation about structured blogging has dealt with the idea at the application and delivery level (microformats, etc). I’ve been interested in the relationship of blogging and other loose KM applications (specifically wikis and outliners) for a while now. I have a belief that ultimately, these applications are more alike than they are different, and can/should be intrinsically tied together (that’d make it a blikiliner, right?). However, I’ve been hung up on the best way to store, represent, and relate the common data. With the imminent promise of time to pursue these issues, I’ve started to pick my earlier work back up, with a fresh pair of eyes.

  • Caching – my original (and still current) approach towards representing pieces of data (microcontent) has been using directed graphs (with typed nodes and relationships, very much like the RDF model). One of the roadblocks I hit was that unlike with trees, there are no high performance ways to store this in SQL. Last year it occurred to me that I should completely forget that, store them as simply as possible (nodes and relationships), and simply build the partial sets (views) and cache those with an appropriate lookup table. (Or I could bite the bullet and see if using an RDF database makes sense.)
  • First class data structures – this actually is something I’m still thinking about. In a graph model, maybe it doesn’t really matter as long as you can extract types and relationships. There are some things that you’d definitely want to extract (like external and internal hyperlinks), but that can be pretty trivially done post-hoc… Once you’ve committed to that sort of data representation, all you have to worry about then is how to usably combine that information. Still up for question is how fine grained nodes should be, and how to best point within nodes (think about annotations and purple number style addressing). One could conceivably split elements into DOM constituents pretty easily, but that increases the number of nodes you need to keep track of by an order of magnitude or two (but might be a better alternative to an xpath type approach).
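To make the caching idea above concrete, here is a rough sketch using Python’s sqlite3 module. The schema, the table and column names, and the choice of a breadth-first walk are all assumptions for illustration; the post doesn’t specify any of them. Nodes and relationships are stored as flat rows, and a computed view (the set of nodes reachable from a root) is stored in a lookup table so it only has to be built once:

```python
import json
import sqlite3
from collections import deque

# Hypothetical schema: names are illustrative, not taken from the post.
SCHEMA = """
CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, body TEXT);
CREATE TABLE edges (src INTEGER, rel TEXT, dst INTEGER);
CREATE TABLE view_cache (root INTEGER PRIMARY KEY, node_ids TEXT);
"""


def open_store():
    """Create an in-memory store with the three tables above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def build_view(conn, root):
    """Walk outgoing edges breadth-first from root and cache the reachable ids."""
    seen, queue = [root], deque([root])
    while queue:
        current = queue.popleft()
        rows = conn.execute(
            "SELECT dst FROM edges WHERE src = ? ORDER BY dst", (current,)
        ).fetchall()
        for (dst,) in rows:
            if dst not in seen:  # guards against cycles in the graph
                seen.append(dst)
                queue.append(dst)
    conn.execute(
        "INSERT OR REPLACE INTO view_cache VALUES (?, ?)", (root, json.dumps(seen))
    )
    return seen


def get_view(conn, root):
    """Return the cached view if there is one, otherwise build and cache it."""
    row = conn.execute(
        "SELECT node_ids FROM view_cache WHERE root = ?", (root,)
    ).fetchone()
    return json.loads(row[0]) if row else build_view(conn, root)
```

A real version would also need to invalidate cached views whenever an edge touching them changes, which this sketch leaves out.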

I’m still doing searches to see if there’s been anything new published over the past year or two. Some links:

Web Framework Notes

I’m working on rearchitecting/refactoring a couple of projects right now: one which requires support for custom modules and provides for quick and easy modification by non-experts, and another which is a bit more straightforward, with an emphasis on scaling.

In doing these redesigns, looking at existing frameworks and application designs obviously helps. I’m not vehemently against frameworks, but I have found myself generally reluctant to use most of them because even the best designed impose conceptual and organizational constraints which oftentimes negate any potential productivity gains. So yes, I’m a ‘library’ guy in that sense. However, that doesn’t mean I’m against frameworks, just that I haven’t found one that, for me, naturally maps onto my conceptual model of web development.

As with most people, this model has been primarily dominated by page-driven REST-type interaction, but for the past few years, I’ve been trying to come to grips with a holistic approach to handling AJAX (née remote-scripting) and more recently SOAs and APIs. A couple rough notes, probably to be refined:

  • I’m a big fan of the most straightforward controller mapping possible. One of the great things about how scripted application layers like PHP work is that you can start with a URL and figure out how it works from there. Of course, things can quickly get complicated. [this is bad] Minor modifications quickly become a PITA
  • Separating the view is also a good thing, with the caveat that templating languages generally suck. Recently, a coworker of mine has been working on a simple XML-based view framelet called Phiz, which on a conceptual level is really appealing (implementation-wise, what it requires is a good caching system)
  • OOP concepts are a must – inheritance and polymorphism being my top properties. As far as patterns, especially for modules, the Service Locator and Decorator are on my mind right now. Like most people, I’ve settled primarily on an overall MVC model, although I’ve been thinking a lot about IoC and how event streams might be processed.
  • I’ll have to compare how other frameworks implement AJAX and API responses, but the design I’m working on should handle that interaction with pretty much no duplication or messiness.
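As a rough illustration of the last two notes, and not the actual design mentioned above, here is one way a direct URL-to-controller mapping can serve both page and API responses without duplicating logic. Every name in it (the route, the handler, the renderers, the sample data) is hypothetical:

```python
import json
from html import escape


def list_entries(params):
    """Controller: returns plain data and knows nothing about the output format."""
    limit = int(params.get("limit", 2))
    entries = ["External Storage", "Server Swap", "Screen Tip"]  # sample data
    return {"entries": entries[:limit]}


# Straightforward mapping: one URL path, one handler function.
ROUTES = {"/entries": list_entries}


def render_html(data):
    """View for page-driven requests."""
    items = "".join("<li>%s</li>" % escape(entry) for entry in data["entries"])
    return "<ul>%s</ul>" % items


def render_json(data):
    """View for AJAX and API requests."""
    return json.dumps(data, sort_keys=True)


RENDERERS = {"html": render_html, "json": render_json}


def dispatch(path, params=None, fmt="html"):
    """Look up the handler for a path, run it once, and render in the asked format."""
    handler = ROUTES[path]
    return RENDERERS[fmt](handler(params or {}))
```

The controller runs the same way for dispatch("/entries") and dispatch("/entries", {"limit": "1"}, "json"); only the renderer differs.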

uPortal and Feed Cleaning

The nature of syndicated data on the web is such that quality and correctness are oftentimes (nearly invariably) uneven. The RSS specs are themselves rather murky, and even the best of sites will push out the occasional unescaped entity or improper encoding.

As seems to be its natural inclination, uPortal completely ignores this reality and completely barfs when encountering any hint of irregularity. uPortal parses RSS via its XSLT channel utilizing Xalan-J, where “error recovery” means throwing an exception, dying, and spewing an ugly error at the user.

By and large, the most commonly encountered errors are character encoding issues. The uPortal channel, expecting XML, defaults to UTF-8 when the encoding is left unspecified. If there are multi-byte characters, you’re screwed. My solution, which so far has fixed all the feeds that we’re currently ingesting, is a two-parter, using a Python first stage and a PHP second stage. Although in most cases you’d want to combine it into one (the Python code, probably), we’re running the two-parter because the latter code came first and is used for other purposes.
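To show the failure mode in miniature: bytes from a feed that was really served as ISO-8859-1 will not decode as UTF-8. The sketch below is a simplified stand-in written for current Python, not what the Universal Feed Parser actually does (its detection is considerably more thorough). The function name and the list of fallback encodings are assumptions:

```python
def to_utf8(raw, candidates=("utf-8", "windows-1252", "iso-8859-1")):
    """Decode raw feed bytes with the first candidate that fits, re-encode as UTF-8."""
    for name in candidates:
        try:
            return raw.decode(name).encode("utf-8")
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings fit")
```

For example, to_utf8(b"caf\xe9") returns the UTF-8 bytes for “café”, whereas b"caf\xe9".decode("utf-8") raises UnicodeDecodeError, which is roughly the kind of input that makes a strict parser give up.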

(If you’re using uPortal: performance isn’t an issue because the channel gets cached by default for 20m. Be sure, though, to check that your version of the XSLT channel has my caching patch applied. There was a 3 year old caching bug that caused the channel not to cache for guest layouts and to cache inefficiently for logged-in users.)

  • The centerpiece of the Python code is Mark Pilgrim’s Universal Feed Parser. This, of course, solves all the issues related to parsing different flavors of RSS
  • With version 3.x of the UFP, character encoding is dealt with better, and strings are automatically converted to unicode when possible. From there, output is a simple unicodestring.encode('utf-8') away. PHP deals with unicode rather atrociously by comparison.
  • Note, and this really screwed me for a while, that mxTidy, which the UFP defaults to using if it finds it, does not play nice with unicode and will screw you. So be sure to turn it off. (I haven’t tried µTidylib or TidyHTMLTreeBuilder yet)
  • If tidy worked, it could have taken care of converting your entity characters into numerics, but since it doesn’t, I instead made entity declarations in the DOCTYPE to cover my bases. You’ll want to load at least the first two, and to be safe, all three of the normative XHTML entity sets
  • After that, I do some HTML to XHTML processing (unnecessary if tidy would work like it should), and also conversion of non-entity ampersands. This is a good one. Here’s the PCRE:
    /&(?!#?[xX]?(?:[0-9a-fA-F]+|\w{1,8});)/
  • Special cases: If you’re loading images via HTTP and your page is on HTTPS, errors may be displayed, so you might want to omit or convert as appropriate
  • Note: One rule you shouldn’t need if you run through the UFP is conversion of smart quotes, but if you’re doing other processing that hasn’t gone through the first steps, that would be a good idea
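The ampersand rule above runs in the PHP stage, but the same pattern carries over to Python’s re module unchanged (minus the surrounding slashes). A small sketch, with a made-up function name:

```python
import re

# The PCRE from the list above: an ampersand that does NOT start something
# shaped like an entity (&name;, &#38;, &#x26;).
BARE_AMPERSAND = re.compile(r"&(?!#?[xX]?(?:[0-9a-fA-F]+|\w{1,8});)")


def escape_bare_ampersands(text):
    """Turn stray ampersands into &amp; while leaving existing entities alone."""
    return BARE_AMPERSAND.sub("&amp;", text)
```

Because the negative lookahead only checks the shape of what follows, anything that looks like an entity is left untouched, whether or not that entity is actually declared.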

And that’s that. Ta-da! The Aristocrats!