Posted on January 25, 2004 (updated February 27, 2005) by lhl

I haven’t been closely following the comment spamming problem, but it looks like it’s hit Trackback now as well. Furthermore, the spammers have discovered flooding and anonymous proxies… It’s become clear to me that these attacks will completely change the nature of the weblog landscape. It was only a matter of time, I suppose. Rather than waiting for it to overtake and destroy the medium (a la USENET), it’d probably be good to be proactive.

At this point, it looks like rate-limiting (and auto-blacklisting based on flooding) is the most effective stopgap. The addition of easy deletion/banning might be a good idea (marking a comment as spam, either from a custom interface or from the page itself, would remove the spam, blacklist the URLs it points to, and blacklist the posting IP). Bayesian-type filtering probably won’t work very well at this point b/c of the lack of headers and the size of the corpus, although a SpamAssassin-like point system might (see also slashcode noise filters). Using redirects (a la 2.661) may reduce the impetus for spamming (although not for those that are just being annoying). White-listing sort of defeats the purpose, although I could see this whole thing being a good push for a Digital ID system (whether actual DigID or ad hoc via PGP/GPG signatures). This could work in conjunction w/ a white-list/black-list system.
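To make the rate-limiting plus auto-blacklisting idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration (the `FloodGuard` name, the thresholds, keeping state in memory); it is not how any particular weblog tool does it.

```python
import time
from collections import defaultdict, deque


class FloodGuard:
    """Hypothetical sliding-window rate limiter that blacklists flooding IPs."""

    def __init__(self, max_posts=3, window_seconds=60.0, strikes_to_ban=2):
        self.max_posts = max_posts            # accepted posts allowed per window
        self.window_seconds = window_seconds  # length of the sliding window
        self.strikes_to_ban = strikes_to_ban  # rejected posts before a ban
        self.recent = defaultdict(deque)      # ip -> timestamps of accepted posts
        self.strikes = defaultdict(int)       # ip -> count of rejected posts
        self.blacklist = set()

    def allow(self, ip, now=None):
        """Return True if a comment/trackback from this IP should be accepted."""
        now = time.time() if now is None else now
        if ip in self.blacklist:
            return False
        window = self.recent[ip]
        # Forget accepted posts that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_posts:
            # Over the limit: reject, and ban once the IP keeps hammering.
            self.strikes[ip] += 1
            if self.strikes[ip] >= self.strikes_to_ban:
                self.blacklist.add(ip)
            return False
        window.append(now)
        return True
```

A comment handler would call `guard.allow(ip)` before saving anything. As noted above, anonymous proxies weaken anything keyed purely on IP, so this is a stopgap at best.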

For the current flooding, which only serves as an attack tool, it may be a matter of coming up with a number of challenges that can’t be automated (two checkbox questions, one of which will ban you; form fields and questions randomly custom generated), or assigning session ids to track a client regardless of IP. Of course, trackback would be more difficult. For trackbacks, one could run a mathematical filter on the trackback URL before (and periodically after) putting it up… That’d have the bonus of checking for linkrot as well. (See also pingback as an alternative.)
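One way to tie a submission to a client regardless of IP is a signed, single-use form token (a nonce). This is a rough sketch of that technique in Python, not the checkbox challenge described above; the function names, the `SECRET_KEY`, the age limits, and the in-memory `seen` set are all hypothetical, and `post_id` is assumed to be an integer.

```python
import hashlib
import hmac
import secrets
import time

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-site secret


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(post_id, now=None):
    """Create a token to embed as a hidden field in the comment form."""
    now = int(time.time() if now is None else now)
    payload = f"{post_id}:{now}:{secrets.token_hex(8)}"
    return f"{payload}:{_sign(payload)}"


def check_token(token, post_id, seen, min_age=5, max_age=3600, now=None):
    """Accept a token only if it is genuine, for this post, fresh, and unused."""
    now = int(time.time() if now is None else now)
    try:
        tok_post, issued, nonce, sig = token.split(":")
        issued = int(issued)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{tok_post}:{issued}:{nonce}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # forged or tampered
    if tok_post != str(post_id):
        return False  # issued for a different post
    age = now - issued
    if age < min_age or age > max_age:
        return False  # submitted too fast for a human, or stale
    if nonce in seen:
        return False  # replayed
    seen.add(nonce)
    return True
```

A bot now has to fetch the form before every post and wait out `min_age`, which slows a flood down but does not stop a determined script. As the paragraph above says, this does nothing for trackbacks, which never load a form.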

Other people have been putting way more brainpower into this than I; this is just me blabbing off the top of my head.

Never a Dull Moment – trackback flooding
Comment Spam II
Goodbye Trackback
Spam Update – using nonces
Throttling Down
Confidential to my crapflooder
FloodMT Crapflooding, Trackback-flooding and Whining crapflooders – summary of soap opera thus far
Once upon a time… – moving forward as a community?

(I don’t think I have to worry too much about comment or trackback spam right now, the flooders seem to try to attack anyone who writes about them)

Posted on January 24, 2004 (updated February 27, 2005) by lhl

BEST HALLOWEEN COSTUME EVARRR!!1!

Posted on January 24, 2004 (updated February 27, 2005) by lhl

Some orkut observations:

I got on the other day; by the count algorithm (uid from 1M), I am user 6617. My last friend request earlier today came from a user numbered about 23K. It will be interesting to see the growth curve of the ‘invite only’ network
Everyone on my current friends list is a blogger (also the biggest community I am in is ‘Bloggers’ which was at about 2 or 3 when I joined; now grown to 235)
The system looks like it’s written in C#/ASP.NET; it uses a lot of xmlhttp for inline page updating (works in IE, Mozilla, fails silently in Safari)
There’s a level of privacy control (restricting information by predefined, but not arbitrary groups)
You can add someone as a friend automatically, which is then pending. If you are rejected, you can’t add them again, but they can add you; I’m assuming that if you reject them it’ll mean you can never add each other, but I wasn’t really feeling like testing that part out
There are some ratings; you can be a fan of someone, and rate your friends (aggregated pseudonymously) on trustworthiness, coolness, and sexiness

Posted on January 23, 2004 (updated February 27, 2005) by lhl

Aaron Swartz, under the iron
Paul Graham: What You Can’t Say – it’s dated Jan 2004, but wasn’t this published way earlier last year?
Creative Nomad Muvo2 4gb MD DOWNGRADE 😉 Step by step – take apart a $299 Muvo2, which has a 4GB MicroDrive ($499 retail)
Asus WL330 Pocket Access Point includes Ethernet, repeater – whoa, it’s cheap too. Can’t resist (but utilitarian!) geek lust
Asia Times: The masters of the universe – critical article about the Bilderberg club, an ultra-VIP capitalist seekrit society?

Posted on January 23, 2004 (updated February 27, 2005) by lhl

Never Mind The Bollocks, Here’s The Wonderchicken – stavrosthewonderchicken ruminates on blogs and punk rock.

Weblogs are a party, damn it, and sometimes they’re publications too, or instead, and sometimes they’re diaries, sometimes they’re pieces of art, sometimes they’re tools for self-promotion, sometimes they’re money-making ventures, sometimes they’re monuments to ego, sometimes they’re massive wanks, sometimes they’re public services, sometimes they’re dedications of faith, sometimes they’re communities.

Posted on January 21, 2004 (updated February 27, 2005) by lhl

What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model? – more great stuff at InfoDesign
A longitudinal study of Web pages continued: a consideration of document persistence – in English: studying linkrot
meantime: non-consensual http user tracking using caches – clever, clever
Mapping built into latest Nokia mobile
Aerogel Experimentation – Art Center industrial design student doing cool stuff at USC/JPL
Tufte: Evidence and assumptions in cladograms (and other tree diagrams) (hey, running on ACS)
Four’s a crowd – fascinating article on string quartet interpersonal dynamics
Interactive Visualization of Large Graphs and Networks, via danah
Spam sucks, however, development of advanced high-accuracy filtering has long-range implications for data organization. Notes (from MIT Spam Conf, blogged last year)
Dynamical Systems and Ergodic Theory
Cybertext: Perspectives on Ergodic Literature

Posted on January 21, 2004 (updated February 27, 2005) by lhl

Just got an email from Tim Conner, developer of BlogApp and BlogScript (he also has some neat AppleScript snippets), with a couple of string functions he uses:

(**** Example ****)
-- this example will find the word "work" in the string 
-- "Bob went to work." and replace it with "the beach".
set myResult to snr("Bob went to work.", "work", "the beach")
display dialog myResult
--
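(**** fast search and replace methods ****)
-- Replace every occurrence of search_string in the_string with replace_string
-- by splitting on the search string and re-joining with the replacement.
on snr(the_string, search_string, replace_string)
  return my list_to_string((my string_to_list(the_string, search_string)), replace_string)
end snr

-- Join the items of the_list into one string, separated by the_delim.
on list_to_string(the_list, the_delim)
  my atid(the_delim)
  set the_string to (every text item of the_list) as string
  my atid("")
  return the_string
end list_to_string

-- Split the_string into a list wherever the_delim occurs.
on string_to_list(the_string, the_delim)
  my atid(the_delim)
  set the_list to (every text item of the_string) as list
  my atid("")
  return the_list
end string_to_list

-- Set AppleScript's global text item delimiters (callers reset them to "" afterwards).
on atid(the_delim)
  set AppleScript's text item delimiters to the_delim
end atid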

Should come in handy next time I take the fork to the eye.

Posted on January 21, 2004 (updated February 27, 2005) by lhl

CBS Cuts MoveOn, Allows White House Ads During Super Bowl
Seth Landsman wrote a Mail.app AppleScript for training SpamAssassin: Queue Spam Applescript
Justin Frankel On AOL, Subverting The Status Quo – read this article earlier, but some of the comments are interesting
When Word-to-XML Conversion Get Nasty
ETCON Participant Sessions
Jonathan Weed goes off on bottled water
More web security:
- Simon: Defending web applications against dictionary attacks
- Kalsey: How Much Security is Needed?
The History of Electronic Mail, by co-writer of CTSS MAIL program (1965)
Reflections on the 25th Anniversary of Spam

Posted on January 21, 2004 (updated February 27, 2005) by lhl

Thought: it would remove human error from spam/ham classification if you sent messages for training based on their location (if a misclassified message is in the inbox, always classify it as spam; if it’s in the error mailbox, always classify it as ham). Working on that tonight.

OK, done. Here’s a version of my script that will automatically submit anything in the inbox as misclassified spam, and anything anywhere else as misclassified ham (easier than coding the specific folder, and it should be just as effective). Sure, it has the possibility of being slightly unsafe, but less so than human error, for me at least. (An alternate method would be to tokenize the message, find out whether it’s classified as spam or not, and reverse that; actually not that hard, since I already tokenize to strip the X-CRM114-Status header)… Well, it’s late and I’m lazy. Good enough.

CRM114 Smart Training___shift-s.scpt
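The decision logic is tiny, so here it is spelled out in Python rather than AppleScript, as a sketch only. The function names and the `INBOX` mailbox name are made up, and the header variant assumes the status value starts with `SPAM` or `Good`, which should be checked against a real filtered message.

```python
from email import message_from_string


def label_from_location(mailbox, inbox_name="INBOX"):
    """Location rule: anything in the inbox is missed spam, anything else is missed ham."""
    return "spam" if mailbox.strip().lower() == inbox_name.lower() else "nonspam"


def label_from_header(raw_message):
    """Alternate rule: read the filter's verdict from the header and reverse it.

    Returns None when the header is missing or the verdict is not recognised.
    """
    msg = message_from_string(raw_message)
    status = (msg.get("X-CRM114-Status") or "").strip().upper()
    if status.startswith("SPAM"):
        return "nonspam"  # filter said spam; we are correcting it to ham
    if status.startswith("GOOD"):
        return "spam"     # filter said good; we are correcting it to spam
    return None
```

The location rule needs no parsing at all, which is why it wins on laziness; the header rule is safer if messages get moved around by hand.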

Posted on January 20, 2004 (updated February 27, 2005) by lhl

I finally got around to setting up postfix, courier, procmail, getmail, and crm114 on my server. It was surprisingly painful considering I already had postfix and courier working. I’ve not figured out why mutt is being a pain…

In any case, I’m now training CRM114. Having done only a few dozen error corrections, it’s already starting to get pretty good. Hopefully I’ll have enough volume in the next couple of weeks to really get it well trained, and then never have to worry about it again.

Training involves forwarding erroneous mail back to yourself prepended with a ‘spam’ or ‘nonspam’ command and your password. Since Apple’s Mail.app doesn’t do full-source forwarding by default, I wrote two AppleScripts to automate the process (one of Apple’s included scripts gets you half-way there). They send out an email automatically as spam/nonspam (stripping the bad X-CRM114-Status line as well) and either delete the message or move it to the inbox as appropriate. (I also have procmail set up to move the training results into their own folder.)

Rename the last part to whatever you want your key command to be, and put it in your ~/Library/Scripts/Mail Scripts/ folder.
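For anyone not on Mail.app, the same steps (take the full source, strip the stale status header, prepend the command, mail it to yourself) can be sketched in Python with the standard `email` package. This is not the AppleScript itself: `build_training_mail` is a made-up name, and the exact `command <password> <label>` line is an assumption, so check what your own mailfilter setup expects.

```python
from email import message_from_string
from email.message import EmailMessage


def build_training_mail(raw_source, label, password, me):
    """Wrap a misclassified message's full source in a training mail to yourself."""
    if label not in ("spam", "nonspam"):
        raise ValueError("label must be 'spam' or 'nonspam'")
    original = message_from_string(raw_source)
    # Remove the filter's old verdict so it isn't learned as part of the message.
    # Deleting a header removes every occurrence and is a no-op if it is absent.
    del original["X-CRM114-Status"]
    mail = EmailMessage()
    mail["From"] = me
    mail["To"] = me
    mail["Subject"] = f"training: {label}"
    # Assumed command syntax; the first body line carries the command.
    mail.set_content(f"command {password} {label}\n\n{original.as_string()}")
    return mail
```

Sending the result is then a matter of handing `mail` to `smtplib.SMTP(...).send_message(mail)` against whatever server you normally use.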

Lastly, I want to reiterate how much AppleScript documentation sucks total ass. The Language Guide is useless since it doesn’t have any references to basic operations (so, if Google doesn’t turn anything useful up on applescript string parsing, you’re up a creek – this stuff isn’t in the application dictionaries either…). Basically, the only way to get anything done is to dig around until you find an AppleScript that does something similar.

Apple: Mail Scripts
Submit mail to Spamcop with Mail.app and Applescript
Text Item Delimiters are De-limitless – is this really the only way to do line-parsing?
More on Conditionals
AppleScript: Essential Sub-Routines
Faster Text Manipulation
Mail.app Mutt

random($foo)

Category: Legacy