Response to Gordon Cormack’s Study of Spam Detection – John Zdziarski (DSPAM) follows up on the recently /.’ed study. IMO, two of the most significant weaknesses of Cormack’s study:

What’s more, there was no archive window, because Cormack didn’t perform any initial training before taking measurement. Statistical filters know nothing when you train them. Therefore, if you’re going to measure their accuracy, you need to train them first. If you start measuring before you’ve taught the filter anything, then you’re going to end up with some pretty mediocre results.

and:

SpamAssassin is immediately eliminated from the credibility of these results because the test corpus was classified by SpamAssassin (twice) and the test was ultimately a product of SpamAssassin’s decisions.

Zdziarski goes on in some detail about many of the weaknesses in Cormack’s study. Some arguments are stronger than others, but it’s well worth reading if you have an interest in the area (especially when you compare Cormack’s study to others done at the MIT Spam Conferences). Interestingly, Cormack is a full-fledged professor at the University of Waterloo. I suppose the question to ask is whether the test results are reflective of real-world performance (and if not, then it is a disservice to all the people who glance at the /. headlines and take them at face value).

(Note: these days I run CRM114 with very good (99%+) results (on a 7-year-old email account), alongside Mail.app’s filter (~95%) and Brightmail (~90%), so I’m probably biased about this. But given the results Cormack is getting, I’d tend to think that he’s doing something very, very wrong. At some point, when I have time and a better-organized site, I should write more about my setup.)

MSN: Fahrenheit 9/11 Interviews and Commentary (via):

COURIC: And wouldn’t your movie have been better balanced if you had at least included some about Saddam Hussein’s own reputation?

Mr. MOORE: You guys did such a good job of–of telling us how tyrannical and horrible he was. You already did that. What–the question really should be posed to NBC News and all of the other news agencies: Why didn’t you show us that the people that we’re going to bomb in a few days are these people, human beings who are living normal lives, kids flying kites, people just trying to get by in their daily existence. And as the New York Times pointed out last week, out of the 50 air strikes in those initial days, the–we were zero for 50 hitting the target. We killed civilians and we don’t know how many thousands of civilians that we killed. And–and–and nobody covered that. And so for two hours, I’m going to cover it. I’m going to–out of four years of all of this propaganda, I’m going to give you two hours that says here’s the other side of the story.

  • NYTimes: Spotlight on Fahrenheit 9/11
  • Unfairenheit 9/11: The lies of Michael Moore. – mildly interesting mefi discussion:

    shoos: Here’s how the timeline of the bin Laden flight authorization worked, from what I recall of Clarke’s hearing:
    1) Clarke refuses to unilaterally authorize the flight(s).
    2) He asks the FBI to look into it.
    3) Dale Watson at the FBI gives it the okay.
    4) Clarke authorizes it.

    Clarke takes sole responsibility for it, because he’s that kind of guy.
    He was in charge, and the buck stops with him. It may have been a good
    idea, for all I know. I’m not a conspiracy theorist, and it may be that
    the bin Laden flight was perfectly benign. But Hitchens’ article
    utterly ignores that Clarke did not make the decision alone. He spins
    Clarke’s statement of taking full responsibility for what was done
    under his leadership into an implication that nobody else had anything
    to do with it. In an article blasting another person for distorting the
    truth, that’s significant.

  • Screenshots please – jko suggests that if you’re making software available for download, you should have screenshots; a corollary: if you’re making web software, have a demo available as well
  • Open Source Portfolio Initiative – speaking of: does it do anything? is it any good? I don’t know, they don’t have a working demo
  • JA-SIG CVS Monitor – take a look at what’s been happening (both CVS Monitor and CIA kick butt)
  • sourcefrog – Martin Pool’s weblog, full of geeky links (via taint)

An interesting phish got missed by one of my spam filters (these days I’m using Mail.app’s filter (LSA-based), Brightmail (hybrid), and CRM114 (SBPH/Markovian), with Procmail doing the sorting):

It was pretty obviously a phish, but between Mail.app’s HTML rendering (it loaded the map even w/ images turned off?) and the percent-encoded image-map URL nested inside a proper-looking link, it’s pretty clever:

<A HREF="https://web.da-us.citibank.com/signin/scripts/Iogin2/user_setup.jsp"><map name="FPMap0"><area coords="0, 0, 610, 275" shape="rect" href="http://%31%34%38%2E%32%34%34%2E%39%33%2E%39:%34%39%30%33/%63%69%74/%69%6E%64%65%78%2E%68%74%6D"></map><img SRC="cid:part1.04000408.00060006@users-billing17@citibank.com" border="0" usemap="#FPMap0"></A>
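For the curious, here is a quick sketch (Python, standard library only) of what that image-map href actually points at once the percent-encoding is undone. The variable names and the `decoded_host` helper are just mine for illustration; the two URLs are copied straight from the message above. It isn't how any of my filters work, only a way to see the trick: the outer link names a Citibank host, while the clickable area goes somewhere else entirely.

```python
from urllib.parse import unquote, urlsplit

# The two URLs from the phish: the outer anchor and the image-map area.
anchor_href = "https://web.da-us.citibank.com/signin/scripts/Iogin2/user_setup.jsp"
area_href = (
    "http://%31%34%38%2E%32%34%34%2E%39%33%2E%39:%34%39%30%33"
    "/%63%69%74/%69%6E%64%65%78%2E%68%74%6D"
)


def decoded_host(url):
    """Percent-decode the whole URL, then return its hostname (hypothetical helper)."""
    return urlsplit(unquote(url)).hostname


# Undo the %XX escapes to reveal the real destination.
decoded_area = unquote(area_href)
print(decoded_area)  # http://148.244.93.9:4903/cit/index.htm

# A crude check: does the clickable area go to the same host as the outer link?
hosts_differ = decoded_host(anchor_href) != decoded_host(area_href)
print(hosts_differ)  # True
```

A host mismatch like this (plus a bare IP address on an odd port) is the sort of thing a defanging rule could plausibly key on, though legitimate mail sometimes mixes hosts too, so treat it as a hint rather than a verdict.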

Too bad Mail.app doesn’t have a show-only text option. Maybe it’s time to up the mime defanging in procmail.

Useful: