Imagine getting invited to a party and looking forward to meeting a friend you haven’t seen in years. You have so much to catch up on after all that time…it’s going to be great! You plan your discussion subjects thoroughly in your head during the drive there. You pull into the parking lot and walk to the front entrance. Your friend appears across the room, and a smile spreads across your face. You reach out to shake his hand…and nine total strangers suddenly interrupt:
“Hey! How’s your mortgage? Want to refinance?”
“You look like you’re having problems – I have special herbal pills to fix that!”
“I’ve been trying to find you! A prince in Nigeria left you a ton of money!”
“Our store has SALES SALES SALES!”
“Do you want to pursue an online degree?”
All these people are preventing you from speaking with the person you want to reach. This is EXACTLY how email would work if it weren’t for a single technology: Spam Filtering.
A little trivia first: the term “spam” originated from a Monty Python skit involving a restaurant that only served combinations of the famous gelatinous canned meat. The customers “don’t like spam!” but keep getting it forced upon them.
Email as a whole would be a useless, garbled mess without spam filtering technology. According to the Messaging Anti-Abuse Working Group, spam accounted for 88–92% of all email messages in 2010. That means for every legitimate message that reached your inbox, roughly nine were junk mail. Fortunately, we’ve come up with some pretty amazing technology to filter out these unwanted messages. The task is more difficult than one would think.
Where Does All This Stuff Come From?
When the average email user receives some spam, they tend to picture some guy sitting in a basement, pasting thousands of email addresses into an Outlook message and sending it to random people. Truthfully, spam usually comes from a more local source: your family, your neighbors, your coworkers, and anyone else around you.
Botnets, or networks of virus-infected computers, account for nearly four-fifths of all spam sent on the Internet. Once a computer is infected, it joins other infected computers to blast out messages to unwary users. Addresses are often pulled from the computer’s email contacts list, but can also be gathered from websites, chatrooms, and forums. These lists are sold to spammers, who in turn flood the Internet with messages from the infected computers, hoping someone will fall for whatever scam they’re trying to peddle. It’s the equivalent of someone sending your neighbors a flyer in the mail and expecting you to pay for postage.
Fortunately, we have a lot of ways to prevent this from happening. Decent, up-to-date antivirus protection is essential: you don’t want your computer becoming a spam zombie. The other side depends on your email provider.
How Spam Filtering Works
There are several different methods of spam filtering, ranging from desktop software to server-side artificial intelligence. As a rule, spam filtering is much more effective when handled by your email provider. Most mail providers offer at least some type of server-side filtering. Here are a few of the most common methods:
Blacklisting: Several organizations collect the IP addresses of mail servers reported for sending out spam. Mail providers can use these lists to block mail from those sources.
Content Filtering: Advanced filters can look at the wording of a mail message and determine whether it is spam. For instance, a message with the subject “Cheap Pharmaceuticals – Buy Now!” would most likely be filtered because the words “cheap,” “pharmaceuticals,” and “buy,” along with the exclamation point, frequently show up in spam messages. This is why most spam has strange misspellings: the senders are trying to get around content filtering.
Scoring: An email provider will often use several different spam checks. Scoring keeps track of the number of checks a message fails, and if it fails too many, the message is filtered. For instance, if a message fails 2 out of 10 tests, it may be allowed to pass, but if it fails 5 out of 10, it may be filtered. Providers use different formulas to set the filtering rules in a scoring system, but it tends to be a very effective method.
On top of these basic methods, advanced spam filtering services are available for businesses that want more control. Services like Postini give users of Google Apps accounts a higher level of filtering (as well as archiving and encryption services), and TOAST.net has just launched a new spam filter product that gives those running their own mail server access to advanced filtering options (as I’m learning about its features I’m really impressed! I’ll have more on this later.).
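To make the three methods above a little more concrete, here is a rough Python sketch of how a scoring filter might combine a blacklist check with two content checks. Everything in it is made up for illustration: the IP address, the keyword list, the function names, and the cutoff of one failed check are all assumptions, and a real provider’s filter is far more elaborate.

```python
import re

# Hypothetical data for illustration only. Real blacklists and keyword
# lists are much larger and are maintained by dedicated organizations.
BLACKLISTED_IPS = {"203.0.113.7"}
SPAM_KEYWORDS = {"cheap", "pharmaceuticals", "buy"}
MAX_FAILED_CHECKS = 1  # assumed cutoff: more failures than this gets filtered


def sender_is_blacklisted(message):
    """Blacklisting: was the sending server reported for spam?"""
    return message["sender_ip"] in BLACKLISTED_IPS


def subject_has_spam_keywords(message):
    """Content filtering: does the subject use two or more spammy words?"""
    words = set(re.findall(r"[a-z]+", message["subject"].lower()))
    return len(words & SPAM_KEYWORDS) >= 2


def subject_has_exclamation(message):
    """Content filtering: does the subject contain an exclamation point?"""
    return "!" in message["subject"]


CHECKS = [sender_is_blacklisted, subject_has_spam_keywords, subject_has_exclamation]


def count_failed_checks(message):
    """Scoring: count how many of the checks the message fails."""
    return sum(1 for check in CHECKS if check(message))


def is_spam(message):
    """Filter the message if it fails more checks than the cutoff allows."""
    return count_failed_checks(message) > MAX_FAILED_CHECKS


spam = {"sender_ip": "203.0.113.7", "subject": "Cheap Pharmaceuticals – Buy Now!"}
greeting = {"sender_ip": "192.0.2.10", "subject": "Happy birthday!"}
print(count_failed_checks(spam), is_spam(spam))
print(count_failed_checks(greeting), is_spam(greeting))
```

With these made-up values, the sample subject line arriving from a listed address fails all three checks and is filtered, while a “Happy birthday!” from an unlisted address fails only the exclamation point check and gets through. A subject spelled “Ch3ap Pharmaceutica1s” would slip past the keyword check in this sketch, which is exactly the trick the misspellings are meant to pull off.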
Is It Really Spam?
It should be noted that there is a big difference between solicited and unsolicited email. If you find you’re getting a lot of undesired email in your inbox, take time to see what is actually coming in. Things like store sales, coupons, contests, newsletters and such are NOT spam. Do not filter these; it is better to unsubscribe from them (there’s usually an unsubscribe link at the bottom of the email). These senders probably received your email address through an online purchase, a contest entry, a store register, or anywhere else someone asked you for your email address.
Unsolicited email includes messages that come out of left field, usually prompting you to do or buy something. Scam messages asking you to reset your password on your banking website, asking you to buy drugs or invest in markets, or adult messages fall into the category of spam and SHOULD be filtered.
It is important to know the difference because filtering systems often learn from your choices, so learning the difference can lead to more successful filtering results.
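Here is a toy Python sketch of what “learning from your choices” could look like. It is not how any particular provider does it; the class name, method names, and the simple word-voting rule are all assumptions made up for this example, and real systems use much more sophisticated statistics.

```python
import re
from collections import Counter


class LearningFilter:
    """Hypothetical filter that remembers which words you mark as spam."""

    def __init__(self):
        self.spam_words = Counter()      # words seen in messages marked as spam
        self.not_spam_words = Counter()  # words seen in messages you kept

    @staticmethod
    def _words(text):
        # Lowercase the text and keep each distinct word once.
        return set(re.findall(r"[a-z]+", text.lower()))

    def mark_as_spam(self, text):
        self.spam_words.update(self._words(text))

    def mark_as_not_spam(self, text):
        self.not_spam_words.update(self._words(text))

    def looks_like_spam(self, text):
        # Each known word "votes" for whichever pile it has appeared in more.
        words = self._words(text)
        spam_votes = sum(
            1 for w in words if self.spam_words[w] > self.not_spam_words[w]
        )
        keep_votes = sum(
            1 for w in words if self.not_spam_words[w] > self.spam_words[w]
        )
        return spam_votes > keep_votes


mail_filter = LearningFilter()
mail_filter.mark_as_spam("Reset your bank password now")
mail_filter.mark_as_not_spam("Weekly store coupons and sales")
print(mail_filter.looks_like_spam("Please reset your password"))
print(mail_filter.looks_like_spam("New store coupons this week"))
```

In this sketch, if you later mark a couple of store newsletters as spam instead of unsubscribing, words like “store” and “coupons” tip over to the spam side, and the next coupon email you actually wanted gets filtered too.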
Without all of these spam filtering methods, email would be a useless form of communication — everyone would be trying to talk to you at once. The next time you hit your Mark as Spam button, think of it as yelling “SHUT UP! I’M TRYING TO TALK!” to the Internet.
I might look into installing a button like that in my car the next time I take the kids on vacation.