Spam: Problems, Solutions, and More Problems

James Grimmelmann

Yale PORTIA/ISP Reading Group

April 19, 2004

http://james.grimmelmann.net/slides/2004-04-19-spam.html

I recommend the 2005 version of this presentation. It benefited greatly from the intervening year's work and from the contributions of my collaborator, Becky Bolin. I am including annotations here only to explain references, not to detail the substance of the presentation.

prelude

this week in spam

SURBL: content-filter + RBL
Habeas wins in court (again)
ten years ago: Canter and Siegel
sentencing guidelines released
FTC: "sexually explicit"
MessageSoft: $14M+

The SURBL provides a list of web site URIs known to occur in spam. Since the primary motivation for spam is to advertise, the URIs to which spam directs readers are the least common denominator across spam.
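
The URI-based check described above can be sketched as follows. This is a minimal illustration, not SURBL's actual mechanism: real lists of this kind are typically queried over DNS, whereas here the list is a hypothetical in-memory set (`LISTED_DOMAINS`) and the function names are invented for the example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blocklist of domains seen in spam; a stand-in for a real lookup.
LISTED_DOMAINS = {"cheap-pills.example", "lottery-win.example"}

# Deliberately loose pattern: grab anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body):
    """Return the set of lowercased host names found in URLs in the body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def is_listed(host, listed=LISTED_DOMAINS):
    """True if the host or any of its parent domains is on the list."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in listed for i in range(len(labels)))


def looks_like_spam(body):
    """Flag a message if any URL in it points at a listed domain."""
    return any(is_listed(host) for host in extract_hosts(body))
```

For example, `looks_like_spam("Visit http://www.cheap-pills.example/order now")` is true because the parent domain is listed, regardless of what the rest of the message says or who appears to have sent it.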

Habeas licenses the use of a header to flag messages as not being spam. It then sues those who use the header without permission for copyright infringement.

Canter and Siegel, the infamous "green card lottery" spammers, gave many netizens their first experience with spam in a massive cross-post to over 6,000 USENET newsgroups.

The CAN-SPAM Act included criminal penalties for certain kinds of spamming. The United States Sentencing Commission analogized spamming to fraud for purposes of creating sentencing guidelines for spamming.

The FTC mandated that sexually explicit spam bear the tag "SEXUALLY EXPLICIT" in its subject header.

The $14 million paid by SurfControl to acquire MessageSoft put a price tag on anti-spam expertise.

what is spam?

unsolicited?

unsolicited by whom?
- the Plaxo problem
what is solicitation?
- the opt-in/opt-out problem

Plaxo is an address book service. If you add a contact's email, Plaxo invites that contact to join Plaxo. The email invitation is unsolicited by the recipient. Is this spam?

bulk?

the Hamidi problem
bulky non-spam
- Hamidi
the Forest Service problem
how many recipients == bulk?
- CAN-SPAM: 1

In Hamidi v. Intel, the California Supreme Court held that Ken Hamidi could not be found liable on a trespass to chattels theory for sending thousands of emails to Intel employees. He honored opt-out requests from individuals, but not from Intel itself. One way of viewing the holding is as a statement that Hamidi's emails were not spam, despite their bulk.

The Forest Service considered rejecting "duplicative" comments on its forest-management proposals. It was understood that part of its implementation would have been to install filters on its incoming email designed to reject message texts recognizable as coming from grassroots organizations with online "action centers" that let members send emails as part of the notice-and-comment procedures. Each person was only sending one email, but the Forest Service considered the totality to be spam.

Under certain circumstances, the CAN-SPAM Act holds that sending a single unsolicited email can constitute spamming.

commercial?

charitable spam
political spam
nutjob spam

email?

spam predates email . . .
- postal spam
- newsgroup spam
. . . and will outlive email
- blog comment spam
- referrer log spam
- search engine spam

Postal spam is better known as "junk mail." The famous Ground Zero of commercial spam was the Green Card Lottery message of 1994, cross-posted to over 6,000 USENET newsgroups.

Search engines are a major target of spam today; thousands of repetitive web sites or blog comments can elevate a site briefly to the top of certain searches, resulting in advertising or sales revenue for the spammer. There have even been regular reports of HTTP referrer log spam, in which a web site is bombarded with requests that claim to have been referred by a page the spammer wishes to elevate in search rankings. Search engines then see that page "linked" from the site's page of server stats.

"solutions"

major players

users
spammers
major ISPs
governments
anti-spam vendors
anti-spam vigilantes

technical responses

filtering
sender verification
pay-to-send

Filtering systems rely on identifying characteristics of particular messages to flag them as spam or non-spam.

Sender identification techniques involve adding information to email such that a client or a server in some sense "proves" that it is not a spammer.

Pay-to-send systems force a sender to provide proof that it has "paid" for a message in one of a number of currencies. In so doing, they attempt to shift some of the costs of spam back on to senders. They often use sender identification or challenge-response techniques at an implementation level.

technical: filtering

client or server?
techniques:
- headers
- token-matching
- fingerprinting
- machine learning

At the most basic level, filters can examine the headers -- who sent the email, what system relayed it, what email client produced it, and so on. The next level of filtering involves searching for particular spam-suggesting keywords, such as the names of certain prescription drugs. Some systems take a hash value of entire messages to establish a baseline of known spams (although the value of this technique has been severely degraded by advances in spam-sending programs). Other systems, such as the filters used by many major ISPs, use sophisticated AI techniques to "learn" the characteristics of spam and to predict whether a new message is likely to be spam or not.
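
Two of the techniques above, fingerprinting and machine learning, can be sketched together. This is a toy illustration under stated assumptions, not a description of any ISP's filter: the class name `TinyBayesFilter`, the tokenizer, and the choice of a naive Bayes model with add-one smoothing are all assumptions made for the example.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word-like tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


def fingerprint(text):
    """Hash of the lowercased, whitespace-normalized message body."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TinyBayesFilter:
    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}
        self.known_spam = set()  # fingerprints of spam already seen

    def train(self, text, label):
        """Record one labeled message; label is 'spam' or 'ham'."""
        self.token_counts[label].update(tokenize(text))
        self.message_counts[label] += 1
        if label == "spam":
            self.known_spam.add(fingerprint(text))

    def spam_log_odds(self, text):
        """Log of P(spam)/P(ham) under naive Bayes with add-one smoothing."""
        spam, ham = self.token_counts["spam"], self.token_counts["ham"]
        vocab_size = len(set(spam) | set(ham)) or 1
        spam_total, ham_total = sum(spam.values()), sum(ham.values())
        score = math.log(
            (self.message_counts["spam"] + 1) / (self.message_counts["ham"] + 1)
        )
        for token in tokenize(text):
            p_spam = (spam[token] + 1) / (spam_total + vocab_size)
            p_ham = (ham[token] + 1) / (ham_total + vocab_size)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text):
        """Exact fingerprint match first, then the learned score."""
        if fingerprint(text) in self.known_spam:
            return True
        return self.spam_log_odds(text) > 0
```

The fingerprint check only catches near-verbatim repeats, which is why randomized spam defeats it so easily; the token-based score generalizes to messages never seen before, at the cost of possible false positives.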

technical: IDing senders

client, server, or neither?
often, to assist filtering
techniques:
- SPF
- public-key crypto
- 3rd-party voucher
- shared whitelists

In SPF, SMTP servers verify that incoming email came from a computer authorized to send email for the domain from which it comes, a simple test that can filter out many spams with forged sender information. Public-key cryptography may be used to establish even stronger guarantees of sender identity to avoid spoofing. Third parties may also be invited to vouch for senders' bona fides. Shared whitelists are used by groups of recipients who mutually trust each other to identify senders who are known not to be spammers.
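
The core of the SPF-style test can be sketched as a lookup of the connecting computer's address against the networks a domain has authorized. This is a simplified illustration: real SPF policies are published in DNS and have a richer syntax, whereas the table `AUTHORIZED_NETWORKS` and the function `sender_is_authorized` below are hypothetical stand-ins.

```python
import ipaddress

# Hypothetical policy table: domain -> networks allowed to send its mail.
# In real SPF this information is published by the domain in DNS.
AUTHORIZED_NETWORKS = {
    "example.org": ["192.0.2.0/24", "2001:db8::/32"],
}


def sender_is_authorized(envelope_domain, connecting_ip, policies=AUTHORIZED_NETWORKS):
    """Return True or False if the domain has a policy, None if it has none."""
    networks = policies.get(envelope_domain.lower())
    if networks is None:
        return None  # no policy published; the receiver must decide on its own
    address = ipaddress.ip_address(connecting_ip)
    return any(address in ipaddress.ip_network(net) for net in networks)
```

The three-valued result matters for the transition problem: a receiver can only reject forged mail for domains that have chosen to publish a policy, and must fall back on other techniques for everyone else.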

technical: pay-to-send

client or server?
techniques:
- pay with attention
- pay with cycles
- pay with money

In payment with attention, the sender is required (at first contact) to provide proof that a human has looked at the email, typically by typing in some text obfuscated to make it non-computer-readable. Payment with cycles is a cryptographic technique that forces the sender to spend a certain amount of processing time to send a message (greater than the amount of time required for the recipient to handle it); the goal is to make bulk emailing slower, and thus less profitable for spammers. Finally, some proposals involve actual proof of payment with money, as by requiring the electronic transfer of money from sender to recipient to send a message.
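
Payment with cycles can be sketched as a hashcash-style proof of work. This is an illustrative sketch only: the stamp format, the use of SHA-256, and the names `mint_stamp` and `verify_stamp` are assumptions for the example, not the format of any deployed system.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint_stamp(recipient, difficulty_bits=12):
    """Sender's side: search for a nonce whose hash starts with enough zero bits."""
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        digest = hashlib.sha256(stamp.encode("utf-8")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return stamp


def verify_stamp(stamp, recipient, difficulty_bits=12):
    """Recipient's side: a single hash confirms the work was done for this address."""
    if not stamp.startswith(recipient + ":"):
        return False
    digest = hashlib.sha256(stamp.encode("utf-8")).digest()
    return leading_zero_bits(digest) >= difficulty_bits
```

The asymmetry is the point: minting a stamp takes on the order of 2^difficulty_bits hash attempts on average, while checking one takes a single hash. Binding the stamp to the recipient's address keeps a sender from reusing one stamp across a bulk mailing.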

legal responses

mandate a technical solution
hit spammers harder
track down spammers
recruit private-sector allies
hit the beneficiaries
do-not-email

By "recruit private-sector allies" we are thinking of systems that invite individuals and entities to sue spammers or to help in locating them. Private-attorney-general and civil-cause-of-action statutes help do the former; bounties for finding spammers do the latter. A do-not-email list would include email addresses to which spam could not be sent, by analogy to the do-not-call list for phone numbers.

legal: CAN-SPAM

bulk falsification criminalized
civil offenses:
- misleading headers
- failure to honor opt-out
- missing disclosure
third-party liability
enforcement: feds, states, ISPs
FTC to investigate:
- bounties, do-not-email

legal: conflict of laws

CAN-SPAM preempts state law
- except for falsification
standard cyberlaw problems
- jurisdiction
- clash of definitions

Because spam is a nonlocal problem, the usual legal questions inherent to nonlocal regulations arise. How far can any one jurisdiction's prescriptive authority extend? What happens when different jurisdictions pass mutually-incompatible mandates? These questions are not unique to spam.

values at stake

freedom of speech

collateral damage to speech
- mailing lists
- chilling effect
right to contact new people
recipients' speech interests
- right to receive
- right not to receive

Mailing lists regularly have problems with overly zealous spam blockers. "Chilling effect" refers to the possibility that legitimate senders will refrain from mailing out of fear of the consequences if they are mistakenly thought to be spammers.

The right to reach out to people is important, even for commercial purposes. Blanket bans on solicitation are almost always considered illegal. At least some room needs to be made for solicitation.

My rights are restricted if incoming messages to me are blocked against my wishes. There is also a speech value in being able to choose what you don't want to hear. Especially in an age of massive excess in communications, there is an autonomy interest in being able to filter out unwanted messages so that you can recognize the ones you want.

privacy

first they came for the spammers . . .
clipper reloaded
- is anonymity illegal?
- who can pierce the veil?
- do you trust the crypto?

The first line is a reference to the idea that many people seem far less concerned with privacy rights when punishing spammers is at stake than in almost any other area of life. Others fear that anti-spam zeal could start us down a slippery slope towards erosion of privacy because spammers are so unsympathetic.

Many of the contentious issues from the debates over the "Clipper Chip" recur in debates over anti-spam measures. Some proposals to overhaul email would make it impossible to email anonymously. Even if there is some form of pseudonymity built into a spam-resistant email infrastructure, it matters enormously who has the power to determine a pseudonym's real-life identity. Any proposal that makes heavy use of cryptography also raises concerns over whether the cryptography itself is secure and over who is able to inspect the cryptography in use.

architecture

do we need to rewrite all . . .
- mailers?
- mail clients?
- MTAs?
- routers?
what do we do in the meantime?

The transition costs of moving to spam-resistant architectures may be enormous. Many solutions are ineffective if either the sender or recipient has not upgraded. Some cause email to be undeliverable if one has upgraded and one has not. Some solutions even require that every sender and recipient be upgraded for any to enjoy its benefits. The transition strategy for any anti-spam solution is part of the cost-benefit analysis.

arms races

joe-jobs
man-in-the-middle
beating the filters
miscellaneous cracking

spammers are scarily clever

Spam induces classic arms race behavior, in which spammers and anti-spammers spend large quantities of time and money coming up with new technical measures and countermeasures. If the fight goes nowhere, then all the resources spent on it are pure waste.

In a joe-job, an anti-spam system is tricked into a false positive and used as a weapon against some innocent third party whom it mistakes for a spammer. Spammers have proven adept at analyzing anti-spam systems and finding ways to imitate trusted parties to sneak their messages through anti-spam blockades. They have proven especially clever at producing random-text babblers that make spams resemble "real" email increasingly closely, so as to foil textual filters. They have also proven quite willing to hack into anti-spam systems in order to disable them.