I spent an hour or so this morning writing a regular expression to convert stray ampersands into HTML character entities without touching the entities that are already there. Finding character entities is easy: &([A-Za-z]+;|#[0-9]+;), and ‘grep -v’ will give you the lines that don’t contain one. But when you’re doing substitutions in Perl there is no ‘-v’ option, so matching only the ampersands that aren’t already entities is a bit more problematic. Here’s my solution:

```perl
s/&(
      [^a-z#]              # doesn't begin with alpha or #
    | [a-z]+(?=[^a-z;]+)   # alpha, but no ;
    | \#\d+(?=[^\d;]+)     # '#' but no ;
  )/&amp;$1/xi;
```

You need lookaheads to ensure that the run of letters or digits after the ampersand isn’t terminated by a semicolon, in other words that it isn’t already an entity.
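For comparison, here is a minimal sketch of another way to get the same effect with a single negative lookahead, (?!...), which asserts that an entity does not follow the ampersand. This is a swapped-in technique, not the code from this post or from my friend’s example. The helper name escape_bare_ampersands and the sample string are made up for illustration, and, like the pattern above, it only recognizes named (&amp;) and decimal (&#169;) entities, not hexadecimal ones such as &#x41;.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every "&" that does not already start a
# named (&amp;) or decimal numeric (&#169;) entity. The negative lookahead
# only peeks ahead, so nothing after the "&" is consumed or rewritten.
sub escape_bare_ampersands {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z]+|#[0-9]+);)/&amp;/g;
    return $text;
}

my $in = 'Tom & Jerry &amp; friends &#169; 2005 &copy AT&T &';
print escape_bare_ampersands($in), "\n";
# prints: Tom &amp; Jerry &amp; friends &#169; 2005 &amp;copy AT&amp;T &amp;
```

Because the lookahead never consumes anything, there is no captured text to put back, and, as far as I can tell, a bare ampersand at the very end of the string is still caught. Each alternative in the positive-lookahead version needs at least one more character after the ampersand before it can match.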

Of course, some time later a friend sent me an example that does basically the same thing.

Anyway, some good came out of this: I broke out the copy of Mastering Regular Expressions that I hadn’t touched in a while, and I have a few links that are of interest:

I’ve spent the rest of the day so far shooting around town. I need to acquire 7 stills from this video that ‘represent’ LA.