Recovery of a ReiserFS Drive with Bad Blocks

There doesn’t seem to be a good detailed HOW-TO, guide, or tutorial compiled on the subject of ReiserFS recovery from partial drive failure, so I figured I’d give it an ol’ writeup, seeing as I have some time to kill (about 10M blocks left to be corrected).

Before I start out, I should mention that good backup procedures are a really good idea. If you’re actually using this advice, you probably realize this post facto. All hard drives fail. No two ways about it, so if you care about your data, have good backup procedures. I heartily recommend rdiff-backup, which not only does rsync-based transfers, but also, as the name implies, keeps increments. For Mac, I recommend psync to a sparse-image DMG for preserving resource forks. Carbon Copy Cloner does some pretty useful automation (in addition to ghosting great). Backupninja looks like a good rdiff-backup script, but I haven’t used it personally. Now, if only I could find a good transparent compressed volume format…

All that being said, sometimes you can get sloppy or lazy. In my case, I was building a new RAID system when a drive on my old file server crapped out… (no need for condolences; this happened a long time ago, and I just got around to doing recovery now)

The first thing you want to do if you notice block errors is to copy everything off. You might want to check smartctl and hddtemp to see if there’s anything really horrible going on before you do that. If your data is really important, shut down and send it to the professionals immediately. Ontrack is the most high profile, but there are probably others that can help you. Otherwise, try to copy what you can elsewhere.

If your copies were successful, at this point you can probably chuck your drive; recovery won’t be worth your time. If you can’t mount, or your copy wasn’t successful, then you’ll want two things: a recovery drive with enough space to hold an image of your drive, and your drive unmounted. In general I’m of the opinion that it’s better not to spin down, since the drive might not come back up (if you have a drive that won’t spin back up, usually accompanied by a click-whir, and sending it for data recovery is out of the question, I recommend giving freezing a try; it’s worked for me in the past), but it’s sort of hard to say what’s more of a risk. If it appears to just be bad sectors and you don’t have a recovery drive yet, I’d say turn it off until you’re ready to proceed.

If the drive affected is your primary partition, you may need to reboot with a boot CD. RIP is good (the advantage is it has dd_rhelp preinstalled), but if you have more esoteric hardware (say a separate HighPoint controller) you may want to go directly to Knoppix.

Once you have your old drive [hdbad] and new drive [hdgood], you’ll want to run dd_rescue, or probably better, the dd_rescue helper script dd_rhelp:

dd_rescue -A -v /dev/[hdbad] /dev/[hdgood]

This will replicate the current drive onto the new drive (you can dd_rescue to a disk image file instead and mount it with -o loop as a loopback device if you’d rather). From this point on we’ll try our recovery on this duplicate copy. Note that this differs from what others have said. They recommend backing up (w/ dd) and then recovering on the bad drive. Please read what they have to say, but IMO that’s a bad idea:

  1. A plain dd won’t handle bad blocks correctly, so the backup probably will be messed up
  2. Working on the bad drive will probably make things degrade more, very quickly. Running badblocks -b 4096 actually led to creating more bad blocks (including my superblock, sigh)
  3. Even after feeding the bad blocks file into reiserfsck, it’ll still barf on bad hardware (again, speaking from firsthand experience)

So, I say, work on your dd_rescue’d image directly. This “backup” won’t do you any good if you can’t fix it anyway. Once you’ve moved your image onto a good drive, you can run a check:

reiserfsck --check /dev/[hdgood]

It’ll tell you whether you’ve escaped unscathed or whether you’ll need to proceed through the levels of being ‘fscked’: --fix-fixable, --rebuild-tree, --rebuild-sb. reiserfsck is adequately descriptive in informing you on the level of your woes.

Note, I had more success w/ reiserfsck recovery running knoppix26 than the regular 2.4 kernel. If you run into an impasse w/ one you might want to try the other.

It’ll be a couple more hours before I know how this turns out.


Year End Cleaning

Ahh, cleared out 4000 messages. Just thought I’d enjoy it while it lasted.

Inbox Stats
---
Inbox: (0)/24 messages
Size:  4.8 MiB

Today's Incoming Mail
---
Today's Inbox:   40
Today's Spam:    43
Today's Total:   83
Percent spam:    51.8%

Totals
---
Avg Mail/day:    179
Avg Spam/day:    65
Percent spam:    36.4%

Missed Spam:     58
False Positives: 53
CRM114 accuracy: 99.643%

Sad Panda

Wow, this Metafilter thread is just plain depressing [the thread discussion was mostly about attacking each other, although the original post link was on a depression study: think of 3 snarky comments, then read on]. Granted I haven’t been reading mefi regularly recently, but it seems the sense of community and (well, sometimes) discourse has been at this point completely displaced by asshattery. Just one of the complete WTF moments:

db: Sorry, I posted before reading your apology. But on the other
hand, this is mefi: if you don’t want to run with the big dogs you can
always stay on the porch.

posted by Turtles all the way down at 7:12 PM
PST

Who is this guy I’ve never seen before talking about big dogs? Oh, joined this year. Actually, pretty much everyone in this mess joined this year. Honorary distinction for 327.ca, who should probably know better.

“Blaming it on the newbs” is a broad stroke, I agree. Metafilter will survive, and reach some sort of equilibrium as new members are enculturated, like it has done every time membership has been opened, but I think that it may be reaching a point where some additional features might help with scaling issues:

  • Killfile (w/ public stats, details) see also below
  • Extension of contacts system to allow flagging/highlighting/deprecating of threads and comments
  • Overlay w/ user information/stats in each thread
  • Some sort of per-post (reflected to user) rating. This should be kept a rubric on contribution to conversation (could be multi-axis if there’s a good interface)
  • Community censure

I’ve listed those suggestions from what I think is easiest to hardest. Actually, the last one might not be the hardest to implement from a technical perspective, but I think has the most implications and really needs to be thought about in a larger context of what Metafilter’s goals are.

[Just some disclosure: I started reading Mefi in 1999, but stopped for a long while, then got an account when I picked up Cold Fusion. (worst. web language. ever) I went to a mefi meetup once but it was weird.]

Update: Matt links to this on MeTa (I think stinkycheese hits on a pretty good point). Also, to elaborate with some context on my technological suggestions (for what is essentially a social problem), the point isn’t that people can’t or shouldn’t disagree, but to help encourage an environment of productive discourse. I think that civility and empathy should be at least two minimal requirements. That being said, I don’t speak for anyone else in the community, who collectively will ultimately decide its character.

Some Favorite Albums This Year

[not quite finalized, but close]

It’s about that time of year, so I put together my top-10 albums released in 2004. Listed by when I discovered them, in (how else) reverse chronological order:

  • Bettie Serveert – Attagirl
  • The Arcade Fire – Funeral
  • Erlend Øye – DJ Kicks
  • The Kleptones – A Night At The Hip-Hopera
  • Julie Doiron – Goodnight Nobody
  • The Thermals – Fuckin A
  • Ted Leo and the Pharmacists – Shake the Sheets
  • Eternal Sunshine of the Spotless Mind Soundtrack
  • Mirah – C’mon Miracle
  • Iron & Wine – Our Endless Numbered Days

It was definitely a toughie to cut down to 10. Here’s another 10 that didn’t make the final cut. These were all great albums and managed to keep my attention this year (ranked by closeness to being in top 10):

  • Ratatat – Ratatat
  • Pinback – Summer In Abaddon
  • Delgados – Universal Audio
  • The Good Life – Album of the Year
  • The Album Leaf – In a Safe Place
  • Flotation Toy Warning – Bluffers Guide To the Flight Deck
  • Blue-eyed Son – West of Lincoln
  • Mellowdrone – Go Get’Em Tiger
  • Grand National – Kicking The National Habit
  • Sondre Lerche – Two Way Monologue

Here are some albums that I missed in 2003 that I caught this year:

  • Guster – Keep It Together
  • Some By Sea – Get Off The Ground If You’re Scared
  • Ted Leo and the Pharmacists – Hearts of Oak
  • The Stills – Logic Will Break Your Heart

For 2005, so far I’ve been listening to the new Low and M83 albums.

2004 marked the year that I took my music exploration to the next level. Lots of random clicking in a.b.s.m.indie. It’d probably make more sense to make a top ‘new to me’ list. Maybe next year I’ll have a good automated solution to track what I’m listening to when (sorry Audioscrobbler, you’re just not cutting it when it comes to returning useful metrics/data).

We’ll Be Right Back…

This morning a can of Red Bull punctured somehow in my bag … while my laptop and camera were in it. As you can imagine, this was a Bad Thing™ and I would not recommend it. Amazingly my Powerbook ran even while wet (before I realized what had happened). The screen is busted, but I am carbon cloning now and will be getting it serviced tomorrow. The RAPS did a decent job keeping my camera drier; it seems to have come out of it a bit sticky, but otherwise unscathed.

This, along w/ the work, work work, and collab projects I’m swamped with will probably set back personal writing and coding projects a bit.

[note to self: in future, put liquids on other side of waterproof lining]

Facets and Freetags

I’ve been playing around with categorization for a while, and have been watching the recent rise of freetagging with great interest. A friend, Nick Mote, wrote a recent paper that does a good job both to summarize developments from a close-to information sciences perspective and to outline several near-term issues.

Among them, I believe that both disambiguation and synonym merging are relative non-issues. For the former, the ease of intersections almost makes it moot from the practical perspective of searching. For the latter issue, we are already beginning to see automated solutions (related tags).

One of the reasons for the relative ease of solving these problems is that the applicable relevance algorithms are already quite familiar to lay web practitioners (i.e., people like me, without a CS Ph.D) from their long-time use in e-commerce (collaborative filtering), spam filters (mathematical filters), and now social networks (web of trust). [my faves: clustering, adaptive resonance, context graphs, CRM114]

Anyway, what I wanted to ruminate on was the non-hierarchical freetag model of unions, intersections, and differences and see if there’s a way to build a practical (both in terms of backend implementation and user interface) bridge with (more) traditional hierarchical faceted classification.

The first hurdle is, I suppose, coming up with a convincing argument that hierarchies are worthwhile. I think it’s quite obvious that in everyday life we categorize and subcategorize often, and that hierarchy being a first-class object isn’t completely out of the realm of sense. The real question is whether there’s a way of reintroducing hierarchy that doesn’t reintroduce the problems it caused in the first place.

First, let’s talk a bit about data structures. Traditional structures will explicitly delineate parent/child relationships, either via pointers or relational structures. Note that this can be generalized into the generic subject/predicate/object triples that we see in RDF. While I’m very partial to typed links (and late binding and dynamic properties… keep on target), I think we can see that this will lead to a level of complexity that will work against both ease of use (first rule of getting user participation) and social/corpus relevance matching (although spam filtering engines like CRM114 are built for sparse data).

Before we get to something I’m throwing out, I’d also like to mention that in our faceted hierarchies, Celko’s set/adjacency models (he recently published a whole book on trees in sql) won’t directly work as we’re dealing with what will likely be very bushy graphs (think overlapping possibly-cyclic digraphs). A real mess huh?

So, my 3AM brainfart last night was to try attacking from the point of view of using traditional tagging structures, and taking the idea of separators for hierarchies and improving on that. For example:

tag: foo
tag: foo.bar
tag: foo.bar.baz
tag: foo.qux

If we do a search for foo(\..*)? (that is, ‘foo’ itself or anything beginning with ‘foo.’), we will get everything within the tree ‘foo’, inclusive. This relieves us of many of the disadvantages of traditional hierarchical representation, and only marginally increases the complexity of either searches or tag-renaming (the former can be globbed relatively inexpensively and the latter is costly either way).
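Here is a minimal sketch of that subtree search in Python, assuming tags are stored as plain dot-separated strings. The function name in_subtree and the sample tags are made up for illustration. It accepts the root itself or anything that starts with the root plus a dot, so a lookalike such as ‘foobar’ is not swept in.

```python
def in_subtree(tag: str, root: str) -> bool:
    """Return True if tag is root itself or a dotted descendant of root."""
    return tag == root or tag.startswith(root + ".")


# Hypothetical tag store; "foobar" and "quux.foo" are decoys that must not match.
tags = ["foo", "foo.bar", "foo.bar.baz", "foo.qux", "foobar", "quux.foo"]

matches = [t for t in tags if in_subtree(t, "foo")]
print(matches)
```

A plain prefix test like this also maps onto an indexed prefix lookup in most databases, which is part of why the search side stays cheap.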

Now, the main crux of the matter comes with the user interface end. ‘foo.bar.baz’ is a pain in the ass to type. Sure, your non-hierarchical option is to type ‘foo’ and ‘bar’ and ‘baz’, but this, at least from the input side, removes one of the advantages of hierarchical input.

In this case, then, why not do masking? When storing/searching, take both the entire ‘foo.bar.baz’ as well as the most specific child identifier ‘baz’. This creates a new disambiguation issue:

tag: baz
tag: baz (foo.bar.baz)
tag: baz (foo.qux.quux.baz)

From a search/aggregation perspective this might not be necessarily bad as it’d combat the sparseness issue, but from an entry perspective it again minimizes the hierarchy usefulness quotient (at this point, one begins to ask, are intersections that bad? The answer of course is in most cases no, however in some yes).
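To make the masking idea concrete, here is a rough Python sketch that indexes every tag under its most specific child identifier. The name build_leaf_index and the sample data are assumptions for illustration only. Looking up a bare ‘baz’ then returns every full path it could stand for, which is exactly the disambiguation problem described above.

```python
from collections import defaultdict


def build_leaf_index(full_tags):
    """Map each leaf name to the sorted list of full paths that end in it."""
    index = defaultdict(set)
    for full in full_tags:
        # The leaf is whatever follows the last dot (or the whole tag if no dot).
        leaf = full.rsplit(".", 1)[-1]
        index[leaf].add(full)
    return {leaf: sorted(paths) for leaf, paths in index.items()}


# Hypothetical stored tags, mirroring the example above.
index = build_leaf_index(["baz", "foo.bar.baz", "foo.qux.quux.baz", "foo.bar"])
print(index["baz"])
```

Whether a search on ‘baz’ should return all of those paths or ask the user to pick one is a policy choice; the index supports either.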

A UI solution for this is to have an auto-completing combobox that recognizes hierarchies. This widget is useful for traditional freetagging as well, so it is a worthwhile avenue to pursue regardless.
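The lookup behind such a combobox could be sketched roughly as follows in Python. This is only one possible matching rule, and suggest, its limit parameter, and the sample tags are all hypothetical. A prefix without a dot matches the start of any path segment, so typing ‘ba’ finds ‘baz’ wherever it lives; a prefix with a dot is treated as the start of a full path.

```python
def suggest(prefix, known_tags, limit=10):
    """Return up to `limit` full tags that match what the user has typed."""
    prefix = prefix.lower()
    hits = []
    for tag in sorted(known_tags):
        lowered = tag.lower()
        if "." in prefix:
            # User is typing a path: match from the start of the full tag.
            ok = lowered.startswith(prefix)
        else:
            # User is typing a bare name: match the start of any segment.
            ok = any(seg.startswith(prefix) for seg in lowered.split("."))
        if ok:
            hits.append(tag)
    return hits[:limit]


# Hypothetical known tags.
known = ["foo", "foo.bar", "foo.bar.baz", "foo.qux", "foo.qux.quux.baz"]
print(suggest("ba", known))
print(suggest("foo.q", known))
```

Showing the full path in the dropdown while letting the user type only the leaf keeps the input cost close to flat tagging.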

[Err, revisions, less half-bakedness forthcoming, as well as some code. Well, might as well put this out for commenting. Finishing my trackback/comment code might be good… ah, screw it, will procrastinate later. Back to real work for now]

On Jenkins and Jennings

I’ve been vaguely following Jason’s Sony travails, but Scott Andrew’s post, Harrassing fans for being fans really hits the nail on the head.

For those who haven’t read Henry Jenkins’ excellent 2002 essay, Interactive Audiences? The ‘Collective Intelligence’ of Media Fans, I would definitely recommend it, as it illustrates the transformation of fandom and the consumption cycle.

For all their talk of going viral or guerilla, fundamentally, it seems broadcast media has yet to understand or even care about collaborating and taking advantage of the genuine excitement and support of their best fans.

What a waste of resources huh? Here’s a good whack of the clue-by-four for marketers: you’re burning your brand value by the minute here. And for the bean counters, a good rule of thumb: IT’S BAD BUSINESS PRACTICE TO SUE YOUR BEST CUSTOMERS.

Here’s some work I did last year on New Media and the Consumption Cycle. Also, as part of my academia wrap-up, I’ll have a paper written in the next few months comparing how the game industry and old Hollywood differ on the use of the inverted pyramid, and comparing the relative successes.

New Site, New Focus

In celebrating a new round of way too many projects, I’m finally pushing to kick out something to help me organize all that’s going on.

Currently, I’m aggregating all sorts of stuff that’s already lying about, and actively replacing and augmenting pieces that aren’t up to snuff.

I’ll be renewing my focus on KM, relationship management and collaboration, and also trying to work on productivity, aggregation and integration (especially now that I’m also working on a variety of civic-oriented projects). Let’s get this life-site on the road.

(full writeup coming soon, code going into random($code))

4 more years America – from the LJ of a soldier

If you voted for Bush, didn’t vote, or voted no on gay marriage, I hope you get drafted. I hope they stick you in my unit, and you go with me to Iraq when my unit goes back in September. I will laugh when you see what soldiers in that country face on a daily basis. I hope you work with gay soldiers too. I did. One of them saved my life. Think he shouldn’t have the right to get married? Fuck you. He fought just as hard as I did and on most days, did his job better than me. Don’t tell me gays don’t have the same rights you do. Think the war in Iraq is a good thing? I’ll donate my M-16 to you and you can go in my place.

Voting machine error gives Bush 3,893 extra votes in Ohio – AP Breaking News

Franklin County’s unofficial results had Bush receiving 4,258 votes to Democrat John Kerry’s 260 votes in a precinct in Gahanna. Records show only 638 voters cast ballots in that precinct. Bush’s total should have been recorded as 365.

OK, let’s see the intra-state margin-of-error correlation of exit polls in paper/evoting districts. Not that anything’s going to change… No one is going to contest because they’ll just be smeared by the press as sore losers. Rove just had to get away with it long enough for Kerry to concede, Bush to proclaim a mandate, and the media to kiss their asses. Obviously Palast is following up and no one will care. Again.