Working on this got me to clear out my inbox.

Quick and dirty current inbox stats:

print "Inbox: ($unread)/$total messages "; print 'Size: ' . sprintf('%.1f', $size/1024) . ' MiB';
?>

Here’s the code that generates the text file which I simply read with file() and then split:

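#!/bin/sh
# Emit key,value lines describing the current state of the inbox.
MAILDIR=/home/lhl/Maildir/cur
echo "time,$(date +%s)"
echo "inbox.total,$(ls "$MAILDIR" | wc -l | tr -d ' ')"
# Unread = no S (seen) flag in the Maildir info suffix after ":2,"
echo "inbox.unread,$(ls "$MAILDIR" | grep -vc ':2,[A-Za-z]*S')"
# du -sk reports KiB, so size/1024 gives MiB
echo "inbox.size,$(du -sk "$MAILDIR" | cut -f1)"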
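
The page itself reads that file in PHP, but the parsing step is simple enough to sketch in shell too. This is only a rough equivalent, not the code the site runs: inbox_summary is a made-up helper name, and it assumes the key,value format printed by the script above, with the size in KiB.

```shell
#!/bin/sh
# Hypothetical helper: read "key,value" lines on stdin and print a one-line summary.
inbox_summary() {
    awk -F, '
        $1 == "inbox.total"  { total = $2 }
        $1 == "inbox.unread" { unread = $2 }
        $1 == "inbox.size"   { size = $2 }   # KiB, as reported by du
        END { printf "Inbox: (%d)/%d messages, Size: %.1f MiB\n", unread, total, size / 1024 }
    '
}

# Example with sample data in the same format as the stats file.
printf 'time,1100000000\ninbox.total,120\ninbox.unread,7\ninbox.size,20480\n' | inbox_summary
# prints: Inbox: (7)/120 messages, Size: 20.0 MiB
```

Unknown keys (like time) are simply ignored, so the stats file can grow new lines without breaking the summary.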

Alternatives for keeping detailed mail stats:

  • Use a procmail log tool (mailstat, fm.pl) from cron, possibly graphing with RRDTool (see Mailgraph)
  • Use Mail::Graph or other Perl-based solution
  • Log to MySQL schema: total(timestamp, folder, unread, total, size)

Ideally, I’d want to be able to do minute-by-minute graphing of individual mailboxes to see trends. Would be interesting to generate a Filelight-like display as well as to try to map time-based changes (perhaps in 3D?).

What I’ve done:

  • Pulling Inbox info (total, unread, size) from cron every minute
  • Dumping Inbox information into MySQL table every 10 minutes
  • Processing daily procmail log from cron every minute
  • Rotating procmail log stats into MySQL table at midnight
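
For the 10-minute dump, one way to get from the key,value stats to a row is to have awk build the INSERT statement. Treat this as a sketch under assumptions: stats_to_sql is a hypothetical helper, the table is assumed to be mail_folders, and the column names are borrowed from the schema idea listed earlier (timestamp, folder, unread, total, size) rather than from the real table.

```shell
#!/bin/sh
# Hypothetical helper: turn "key,value" stats on stdin into one INSERT statement.
# The folder name is the first argument and defaults to "inbox".
stats_to_sql() {
    awk -F, -v folder="${1:-inbox}" '
        $1 == "time"         { ts = $2 }
        $1 == "inbox.total"  { total = $2 }
        $1 == "inbox.unread" { unread = $2 }
        $1 == "inbox.size"   { size = $2 }
        END {
            # \047 is a single quote; column names are assumptions, not the real schema
            printf "INSERT INTO mail_folders (timestamp, folder, unread, total, size) VALUES (FROM_UNIXTIME(%d), \047%s\047, %d, %d, %d);\n", ts, folder, unread, total, size
        }
    '
}

# Example with sample data; in practice the output would be piped to the mysql client.
printf 'time,1100000000\ninbox.total,120\ninbox.unread,7\ninbox.size,20480\n' | stats_to_sql
```

Run from cron on a */10 schedule, something like this would cover the second item above; the folder name is not escaped here, so it should only ever come from a trusted, fixed list.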

Getting totals should be as easy as running a SUM() on the mail_daily table. The mail_folders table is currently only watching the inbox, but could easily be expanded. (These tables should really be called mail_incoming and mail_totals to better describe how they’re different, but whatever.)

Welcome to America – this is sickening (but sadly these days, par for the course — man, that’s even worse).

“How dare you treat an American officer with disrespect?” he shouted back, indignantly. “Believe me, we have treated you with much more respect than other people. You should go to places like Iran, you’d see a big difference.” The irony is that it is only “countries like Iran” (for example, Cuba, North Korea, Saudi Arabia, Zimbabwe) that have a visa requirement for journalists. It is unheard of in open societies, and, in spite of now being enforced in the US, is still so obscure that most journalists are not familiar with it. Thirteen foreign journalists were detained and deported from the US last year, 12 of them from LAX.

Three female officers arrived to do a body search. As they slipped on rubber gloves, I blenched: what were they going to do, and could I resist? They were armed, they claimed to have the law on their side. I was an anonymous foreigner who had committed a felony, and “those were the rules”.

Recently Bush has made the statement that “The values of this country are such that torture is not a part of our soul and our being.” But it seems that people are all too willing and comfortable in carrying out grave injustices when encouraged by bureaucracy and institutionalism. And what can anyone do in retaliation?

I think that anyone who does much flying will recognize shades of the reporter’s experience. Flying back from Denver the other day, I was asked by a TSA lady whether I had a video camera. When I replied that I did not, but that I had a digital camera, she smirked and said “that’s okay… for now.” This has nothing to do with public safety, but the knowledge of the very real power she held kept me from giving her a proper sieg heil.

Going to the airport gives one a very real taste of what it’s like to live in a police state. And the airlines wonder why they’re losing money.

Response to Gordon Cormack’s Study of Spam Detection – John Zdziarski (DSPAM) follows up on the study recently featured on /. IMO, two of the most significant weaknesses of Cormack’s study:

What’s more, there was no archive window, because Cormack didn’t perform any initial training before taking measurement. Statistical filters know nothing when you train them. Therefore, if you’re going to measure their accuracy, you need to train them first. If you start measuring before you’ve taught the filter anything, then you’re going to end up with some pretty mediocre results.

and:

SpamAssassin is immediately eliminated from the credibility of these results because the test corpus was classified by SpamAssassin (twice) and the test was ultimately a product of SpamAssassin’s decisions.

Zdziarski goes into some detail on many of the weaknesses in Cormack’s study. Some arguments are stronger than others, but it is well worth reading if you have an interest in the area (especially when you compare Cormack’s study to others done at the MIT Spam Conferences). Interestingly, Cormack is a full-fledged professor at the University of Waterloo. I suppose the question to ask is whether the test results reflect real-world performance (and if not, the study is a disservice to all the people who glance at the /. headlines and take them at face value).

(Note: these days I run CRM114 with very good (99%+) results on a 7-year-old email account, alongside Mail.app’s filter (~95%) and Brightmail (~90%), so I’m probably biased about this. But given the results Cormack is getting, I’d tend to think that he’s doing something very, very wrong. At some point, when I have time and a better organized site, I should write more about my setup.)