PEARCEC.COM SPAM_CONFERENCE_2003


Spam Conference Reports

By Christian Pearce

Edited By Bill McClintock

The conference was a great mix of academic and real-world talks. The web site spamconference.org will have the webcasts available for the next six months (RealAudio format) so you can watch and listen to the entire conference. I would advise anyone with the time and interest to check it out. There was a lot of good information about spam in these presentations.

There are a bunch of interesting points that need to be evaluated when working for a portal company, CommNav, especially one that sells a product, Puppeteer, that could be perceived as mass mailing software. How do you enable and educate the user to understand and use edge solutions for fighting spam, such as creating and maintaining a corpus? What about building good spam filtering into the portal and Puppeteer?

I wanted to post a report on each speaker, highlighting the points they were trying to make. The abstracts list from the site is a little out of order; I'll arrange my summaries in the order in which the conference took place so it lines up with the webcasts:

Gnus vs. Spam

Teodor Zlatanov, spam.el Maintainer

I walked in around the middle of Teodor's talk. spam.el is a plugin for Gnus, the GNU Emacs mail and news reader. While the work is interesting, it certainly isn't solving the world's problems when it comes to spam.

Sparse Binary Polynomial Hash Message Filtering and The CRM114 Discriminator

Bill Yerazunis, MERL

Bill wrote a language called CRM114 for implementing Bayesian statistical mail filtering. He was able to achieve 99% plus accuracy after about a month of training. He was very informative and exciting to listen to. He analyzed a large number of hashes of words, phrases, and variations on those phrases in all the emails he gathered.
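
To make the "hashes of words, phrases, and variations" idea concrete, here is a minimal Python sketch of sparse phrase features over a sliding window. It is not CRM114 code: the function names, the "<skip>" marker, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from itertools import product


def sparse_phrases(tokens, window=5):
    """Return sparse phrase features for a list of tokens.

    For each position, the newest token is always kept, and every older
    token inside the window is either kept or replaced by "<skip>".
    A full window of 5 tokens therefore yields 2**4 = 16 phrases.
    """
    features = []
    for end in range(len(tokens)):
        start = max(0, end - window + 1)
        older = tokens[start:end]
        newest = tokens[end]
        # Enumerate every keep/skip combination of the older tokens.
        for mask in product((False, True), repeat=len(older)):
            phrase = [tok if keep else "<skip>" for tok, keep in zip(older, mask)]
            phrase.append(newest)
            features.append(" ".join(phrase))
    return features


def hashed_features(tokens, window=5):
    """Hash each sparse phrase to a 32-bit integer (CRC32 is an assumed choice)."""
    return [zlib.crc32(p.encode("utf-8")) for p in sparse_phrases(tokens, window)]


if __name__ == "__main__":
    for phrase in sparse_phrases(["buy", "cheap", "pills"], window=3):
        print(phrase)
```

Each hashed phrase then becomes a feature whose spam and non-spam counts can be tracked, so a filter can react to word combinations and not only to single words.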

Adaptive Spam Filtering

Jason Rennie, MIT AI Lab

Jason had been researching the problem of filtering spam for a while at the AI Lab. I got the impression he started the work earlier on and left it as is in order to finish up his Ph.D. His goals were in the right place. He realized that the Bayesian approach is very useful, but does not adapt to fight spam as quickly as spammers learn to circumvent it. As in most of the technical talks, he discussed various combinations of methods and new types of abstract rules that could be applied to filtering.

The Spammers' Compendium

Dr John Graham-Cumming, POPFile

John has a very successful Bayesian filtering program that works at the edge. It is very popular with Outlook folks. It also has a wide following outside the United States since it works well with other languages. The program is called POPFile and acts as a proxy to your mail server. It runs locally via a Perl script. Very lightweight. He is currently faced with many of the normal challenges associated with cleverly crafted spam invented to thwart such filters. I would recommend this to anyone who doesn't work on computers for a living. John went out of order since John Draper was not ready.

Following Their Patterns

John Draper, ShopIP

For those of you out there who don't know John Draper, maybe the hacker alias Captain Crunch rings a bell. He is the most notable phone phreak of his generation. He discovered that a whistle that came with Cap'n Crunch cereal produced the same frequency the telephone company used to signal a long distance call. I have to say it was disappointing listening to his talk, having somewhat idolized him as a kid. Now I understand why I liked him: he basically thinks and acts like a kid. I was hoping to hear some good things about his company and Crunchbox. From what his abstract said, it uses Snort to detect spam attacks. Even though he mentioned early on that he was going to talk about it, he spent all of his time talking about how to find spammers and how to beat them at their own game, which is what we all wanted to do like six years ago. Yes, we once thought it would be funny to send tons of retaliatory mail, but this isn't the solution. Finally, he put in a plug for Kevin Mitnick's new book. I won't justify the plug by giving the title. Sorry John, you should have talked about the Crunchbox; we are all more than familiar with social engineering. Who among us hasn't had to pretend to reboot a Windows box while running Linux when having broadband connectivity issues?

The Case for Spam Research Infrastructures

Paul Judge, CipherTrust

Paul was well spoken and had very specific goals in mind. He wants to set a ground level for research to take place, and then build a community and baseline of tools and data for everyone to collaborate on. He was involved in starting spamarchive.org, which is aimed at collecting identifiable spam. Many people asked the question, what makes your spam my spam? Which is a valid point. Maybe this project can help set the guidelines for what rules people use to consider something spam. This is sort of a naive notion, but how can we be so quick to point out a virus or a security hole, and yet have such difficulty defining spam? Certainly worth keeping an eye on, considering people have been submitting spam. He mentioned they broke out the spam submissions into automated and manual. Maybe a rule set or questionnaire should be filled out as to why the individual felt it was spam. From there we can start to identify different levels of spam, such as the weekly newsletter from Amazon you signed up for. We might be able to pull off an open source version of Brightmail.

Better Bayesian Spam Filtering

Paul Graham, Arc Project

This is the guy that brought us together. He wrote the paper "A Plan for Spam." He is the one who suggested and made the masses aware of concepts for Bayesian spam filtering, based on work that Microsoft and another group of individuals did. Not having read "A Plan for Spam" yet, and not knowing exactly what his project entails, I leave you with the following: Bayesian seems to be the best filter technique we have at the moment. Paul, thanks for cranking it up a notch.
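
For readers who have not seen Bayesian filtering in action, here is a rough Python sketch of the general idea: estimate how "spammy" each token is from a corpus of spam and non-spam, then combine the most telling tokens into one score. It is a simplified illustration, not Paul's code. The function names, the 0.4 default for unseen tokens, the 0.01/0.99 clamps, and the top-15 cutoff are assumptions chosen for the example.

```python
from collections import Counter


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token) from per-corpus token frequencies."""
    spam_freq = min(1.0, spam_counts.get(token, 0) / max(n_spam, 1))
    ham_freq = min(1.0, ham_counts.get(token, 0) / max(n_ham, 1))
    if spam_freq + ham_freq == 0:
        return 0.4  # assumed default for a token never seen in training
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so a single token can never be treated as absolute proof.
    return min(0.99, max(0.01, p))


def message_spam_probability(tokens, spam_counts, ham_counts, n_spam, n_ham, top_n=15):
    """Combine the top_n most decisive token probabilities into one score."""
    probs = [
        token_spam_probability(t, spam_counts, ham_counts, n_spam, n_ham)
        for t in set(tokens)
    ]
    # The most decisive tokens are those farthest from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    spam_product = 1.0
    ham_product = 1.0
    for p in probs[:top_n]:
        spam_product *= p
        ham_product *= 1.0 - p
    return spam_product / (spam_product + ham_product)


if __name__ == "__main__":
    # Toy corpus: token -> number of messages containing it (made-up data).
    spam = Counter({"viagra": 9, "free": 6, "meeting": 1})
    ham = Counter({"meeting": 8, "free": 2, "lunch": 5})
    print(message_spam_probability(["free", "viagra"], spam, ham, 10, 10))
    print(message_spam_probability(["meeting", "lunch"], spam, ham, 10, 10))
```

The counts come from the user's own corpus of spam and legitimate mail, which is why maintaining a good corpus matters so much for this family of filters.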

eXpurgate: a different approach in filtering E-Mail and detecting SPAM

Robert Rothe, eleven GmbH, Germany

Robert discussed the approach his company in Germany took towards solving the spam problem. One problem for companies like Brightmail that do bulk mail filtering for commercial customers is what to do when those customers subscribe to bulk email services they wish to receive. While it wasn't clear what their solution was or how it was implemented, they broke the classifications into clean, bulk and dangerous. This gave the end user a choice and provided some level of service.

Spam Filtering at the Network Level

Matt Sergeant, MessageLabs

Matt was faced with implementing a spam filtering solution at MessageLabs. At the time SpamAssassin was the best thing running. As a result he became heavily involved with the effort and a key developer. He emphasized that SpamAssassin is a framework, and not just rules-based filtering. As a result, Bayesian filtering is one of the many components added to SpamAssassin for removing spam from a user's inbox.

Anti-Spam Techniques at Python.org

Barry Warsaw, Pythonlabs at Zope Corporation

Barry had a lot of ideas about smartening up the way we filter spam. He talked about the benefits of multiple implementations to avoid monoculture. He offered several real-world techniques and hard ACLs for removing spam via exim4. He also discussed the integration of spam filtering into Mailman, and provided some perspective on the spam issues mailing lists deal with.

Spam: Threat or Menace? An ISP's View

Barry Shein, CEO, The World

Barry is an academic turned CEO. His company was the first to offer dial-up service, and he went as far as to blame himself for the spam problem. He will be one of the first people to say that spammers are like organized criminals. He defined spam as theft of service. He struggles with the very real problem of users complaining and having his hands tied when trying to involve the government. Barry posed the question, "Should we start charging for legitimate bulk mail as a service?" This helps to define what is acceptable and puts a price on spam, which allows lawyers to take spammers to court to sue for actual losses.

Smartlook: An E-Mail Classifier Assistant for Outlook

Jean-David Ruvini, e-lab Bouygues SA, France

Jean-David discussed SmartLook. The basic premise is that users actively filtering their mail into folders could be used to build Bayesian-like filtering. While he used different algorithms, it is the same idea. Interesting research considering it applies to more than just spam. It is unfortunate it is a Microsoft-only solution. His work should be explored and applied to other mail clients.

Lessons from Bogofilter

Eric Raymond, Open Source Initiative

Eric Raymond confirmed everything I have come to believe about him in one sentence: "Us LISP programmers need to stick together." He went on for about ten minutes before he got to the point he was trying to make. While his point was valid, it wasn't all that original. The algorithms for Bayesian filtering don't much matter since they are approximately the same. The challenge is in getting the user to understand how to train the filters to work effectively. What he suggests is having two types of delete buttons. He also talked about bogofilter and how implementing it in C gained Bayesian filtering the credibility it needed.
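
The two-delete-button idea is easy to picture in code. The Python sketch below is hypothetical (the class and method names are mine, not bogofilter's): the point is only that each button doubles as a training signal, so the user trains the filter just by deleting mail.

```python
from collections import Counter


class TrainableMailbox:
    """Toy mailbox where each delete button also trains the filter."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    @staticmethod
    def _tokens(message):
        # Count each token once per message; whitespace splitting is a simplification.
        return set(message.lower().split())

    def delete_as_spam(self, message):
        """The 'delete, this was spam' button."""
        self.spam_counts.update(self._tokens(message))
        self.n_spam += 1

    def delete(self, message):
        """The ordinary delete button: the message was legitimate mail."""
        self.ham_counts.update(self._tokens(message))
        self.n_ham += 1


if __name__ == "__main__":
    box = TrainableMailbox()
    box.delete_as_spam("Cheap pills cheap")
    box.delete("Lunch at noon")
    print(box.spam_counts, box.ham_counts)
```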

Spam Filtering: From the Lab to the Real World

Joshua Goodman, Microsoft Research

I don't know if it was planned or an act of good timing, but Joshua Goodman representing Microsoft and following up Eric Raymond was highly enjoyable. As with Barry Shein, it is good to have a mix of people involved in the problem. Joshua gave a good overview of the problem from the end user's perspective and discussed the research Microsoft has been actively involved in. Not to mention that earlier attempts at deploying spam filtering led to a lawsuit. He discussed providing the end user with Junk and Not Junk buttons, which promptly spurred Raymond to demand recognition from Joshua. Joshua also discounted Raymond's earlier statements about algorithms not mattering. He mentioned open source projects being bad for spam filtering, considering they tell spammers what to do differently.

Integrating Heuristics with n-grams using Bayes and LMMSE

Michael Salib, MIT

Being a true MIT student, Michael learned of the Spam Conference and decided to apply his infinite knowledge of math and engineering to the problem. Since it was an academic exercise, he mentioned it didn't have to be exactly realistic, but it provided some interesting approaches. Essentially he took a model electrical engineers use for reading signals on a noisy line. The basic idea is that the sampled value is X and the actual value is Y. If you know the value of Y you can make some assumptions, but if you don't, you start with some median of X that you have sampled. He then broke this down into a more complex math problem that eventually turned into linear algebra. He found that for small sets of data he was better than SpamAssassin, but as the values grew he lost out. He wanted to note his work could be completely wrong and you should not listen to him. Future ideas he had were to create a text kernel for creating higher dimensions for representing email and then drawing a line between the representations to find a commonality. His work also entailed using every filtering trick in the book.
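
As a rough illustration of the noisy-line model, here is a textbook scalar LMMSE (linear minimum mean square error) estimator in Python. It is a generic sketch and not Michael's actual multi-feature system: the function name is made up, X stands for a single observed heuristic score, and Y is 1 for spam and 0 for non-spam.

```python
def fit_lmmse(xs, ys):
    """Fit a scalar LMMSE estimator of Y given X from paired samples.

    The estimate is mean(Y) + Cov(X, Y) / Var(X) * (x - mean(X)).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # If X never varies it carries no information, so the gain is zero.
    gain = cov_xy / var_x if var_x else 0.0

    def estimate(x):
        return mean_y + gain * (x - mean_x)

    return estimate


if __name__ == "__main__":
    # Made-up training data: heuristic scores and spam (1) / non-spam (0) labels.
    scores = [0.0, 1.0, 2.0, 3.0]
    labels = [0, 0, 1, 1]
    estimate = fit_lmmse(scores, labels)
    print(estimate(0.0), estimate(1.5), estimate(3.0))
```

With several heuristics at once, the single gain becomes a vector obtained by solving a system of linear equations, which is presumably where the linear algebra in the talk comes in.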

Forty Years of Machine Learning for Text Classification

David D. Lewis, Independent Consultant

David was one of the many diverse people who gave a talk. Having a wide variety of experience in classification of data, he offered an outside view of the not-so-unique problem of filtering and classifying data. If my memory serves me correctly, he predicted spam is going to become so smart and sophisticated that spam and ham will converge to the point where no filter will be able to make the distinction.

How Lawsuits Against Spammers Can Aid Spam-Filtering Technology: A Spam Litigator's View From the Front Lines

Jon L. Praed, Esq., Partner, Internet Law Group

No doubt lawyers are scary people. Especially when they use the words attack and laws in the same sentence. You best hope you are not at the business end of that sentence. Jon is one of the lawyers that gives his profession a good name. He is one of the good guys. It never ceases to amaze me how intelligent a lawyer is at understanding a set of problems and applying law to it. He basically gave a definition of spam as Unsolicited Bulk Commercial Email. He discussed each term and how it is viewed by lawyers defending the spammers. He went to bat for AOL and Verizon and won in every situation. He discussed the possible ramifications for spammers, including jail time. He also discussed how spammers cannot get out of their punishment by declaring bankruptcy because spamming is a fraudulent act. Jon agreed with Barry Shein's comments about spammers being organized criminals. He discussed the various wins he had with well-known spammers and the precedent set in Virginia. In the court known as the rocket docket, a famous spammer was held accountable for spam in the jurisdiction in which his spam landed. He said the kindergarten rule now applies to spammers. This means you should know better even if you didn't read your ISP's AUP. So there is no excuse. He discussed how every spammer would defend themselves by saying they had opt-in, but could not provide records. He believes a law should make spammers provide an owner of the information, similar to the act that makes pornographers provide records of everyone being of legal age. This will help in the battle against non-legitimate spammers. All in all, Jon is working very hard and understands the problem. Needless to say, he ran over his time by about ten minutes and no one was complaining. Jon, thanks for being one of the good guys, we need more of you.

Desperately Seeking: An Anti-Spam Consortium

David Berlind, CNET

Being a journalist, David offered a great user perspective. Unlike many of us who can set up SpamAssassin and be happy, David's only solution was to take solving the problem to the next level. He offered many anecdotes he received from users about spam and other problems he reports on. Like many of us, he thought about making SMTP work correctly for accountability. Apparently CNET wants to do it and wants to involve everyone. Like George Bush's fight against terrorism, either you are with us or you are against us. A bold move from an unlikely source, but someone needs to do it. He suggests that since vendors can agree on other emerging standards such as Bluetooth and SOAP, we should be able to agree on a better standard for SMTP which provides an end-to-end solution. Check the spamconference.org site for more detail since I don't have anything further at this time.

Fighting Spam in Real Time

Ken Schneider, Brightmail

Ken offered a very interesting insight into solving spam in a commercial setting. Brightmail also provides services protecting against email viruses. They were in a unique position of having an existing infrastructure and customer base to leverage for detecting spam attacks in real time and generating a signature based on each attack. They then apply their unique and proprietary filtering in real time to every site that subscribes to them. Since they have such a large footprint, they are able to sound the proverbial warning alarm for everyone else. They are also faced with the same problem eleven was faced with when it comes to providing a commercial service that weeds out email customers want.

Finishing Thoughts

Spam is never going away. It has turned into a social problem similar to that of the war on drugs. People who succumb to the spammers' email, either by lack of education or addiction, continue to line the pockets of the street pushers and ultimately the people selling the products who enlist them. Once again it has degraded into solving the human condition. Even if we can set up laws to stop the criminal activity in the US, we are all but powerless against the offshore spam houses, just like the drug cartels who rule in countries with a less than adequate infrastructure for controlling such problems. Bottom line: it's spam, get used to it, it is going to be here for quite some time, learn to adapt by training your Bayesian filters. Educate everyone you can to do the same. Support the people who are working to stop it.

~pearcec