Spam: Definitions, Values, and Responses

James Grimmelmann and Becky Bolin

Yale PORTIA/ISP Reading Group

March 9, 2005

http://james.grimmelmann.net/slides/2005-03-09-spam.html

In this presentation, we survey the spam landscape. We discuss the definition of spam, the policy values it implicates, and the merits of various potential responses. Our principal theme is that there is no magic bullet that will solve the spam problem.

These annotations are not a transcription of the presentation. They are more in the way of footnotes, explaining the references made in the slides themselves.

what is spam?

The most common definition is "unsolicited bulk commercial email." All four of those features are problematic.

unsolicited?

whose consent is needed?
- Plaxo
what constitutes consent?
- opt-in vs. opt-out
how far does consent extend?
- Hotmail/XBox Live/Microsoft

Plaxo is an address book service. If you add a contact's email, Plaxo invites that contact to join Plaxo. The email invitation is unsolicited by the recipient. Is this spam?

Microsoft is an example of amorphous consent boundaries. If I opt out of receiving messages from XBox Live, have I opted out of receiving all email from Microsoft? I may or may not know that XBox Live is a Microsoft service.

bulk?

non-bulky spam
- Dekart
bulky non-spam
- Hamidi
counting problems
- Forest Service

Dekart is an Internet security company that sends LawMeme an average of one press release a week; it has ignored all our requests to be removed from its list.

In Intel v. Hamidi, the California Supreme Court held that Ken Hamidi could not be found liable on a trespass to chattels theory for sending thousands of emails to Intel employees. He honored opt-out requests from individuals, but not from Intel itself. One way of viewing the holding is as a statement that Hamidi's emails were not spam, despite their bulk.

The Forest Service considered rejecting "duplicative" comments on its forest-management proposals. The plan was understood to include filters on the agency's incoming email that would reject messages whose text was recognizable as coming from grassroots organizations with online "action centers," which let members send emails as part of the notice-and-comment procedures. Each person was sending only one email, but the Forest Service considered the totality to be spam.

commercial?

charitable spam
- tsunami relief
political spam
- MoveOn
nutjob spam
- Robby Todino, time traveler

In the aftermath of the 2004 Indian Ocean tsunami, many charities sent their members urgent action alerts. The sheer volume of these alerts, many of them using the same previously rare words such as "tsunami," led ISPs to reject a large share of them as spam. MoveOn.org reported similar problems: ISPs' spam filters rejected its emails to its own members. At least as we draw the commercial speech line in the U.S., these emails were not commercial. They would have been allowed as phone calls under the do-not-call list, for example.

Robby Todino sent a large number of emails requesting assistance assembling the parts for a time machine. Todino was apparently quite sincere in his requests; he genuinely believed that he was a time traveler from the future, stranded in 2004.

email?

spam predates email
- USENET
non-email spam today
- search engines
spam will outlive email
- SMS, IM, VoIP . . .

The famous Ground Zero of commercial spam was the Green Card Lottery message of 1994, cross-posted to over 6,000 USENET newsgroups.

Search engines are a major target of spam today; thousands of repetitive web sites or blog comments can elevate a site briefly to the top of certain searches, resulting in advertising or sales revenue for the spammer. There have even been regular reports of HTTP referrer log spam.

Clay Shirky: "Social software is stuff that gets spammed." I prefer, "Any sufficiently advanced technology is indistinguishable from a spam vector."

policy values

Next, we consider the issues at stake in the spam wars.

freedom of speech

the right to solicit
collateral damage
the right to receive
the right not to receive

Free speech is a core value in American politics and law. It ramifies in many ways in discussions of communications. The right to reach out to people is important, even for commercial purposes. Blanket bans on solicitation are almost always considered illegal. At least some room needs to be made for solicitation. Similarly, a good solution will not be overbroad; it will not cause other forms of speech to be cut off. Note that this value also implicates people's interest in hearing speech directed at them. My rights are restricted if incoming messages to me are blocked. Finally, there is a speech value in being able to choose what you don't want to hear. Especially in an age of massive excess in communications, there is an autonomy interest in being able to filter out unwanted messages so that you can recognize the ones you want.

privacy

anonymity
cryptography
database accountability

Some measure of privacy is also important to human dignity and autonomy. The most immediately obvious form this interest takes is in the ability to act anonymously. Requiring a license to use email, for example, would offend this interest. One also has a privacy interest in keeping certain communications secret -- people who use encryption to communicate with each other are exercising this interest. Finally, there is a privacy interest in avoiding the collection of large quantities of data on individuals without some form of accountability. Those who hold information about you have some responsibility to use it properly.

economics

spam as wasteful advertising
destructive self-help
costly arms races

Spam costs society money. A good solution will cost less than the problem. At root, the economic problem of spam is that it results in small gains for the spammer at huge cost to recipients. It is thus a form of advertising that destroys wealth. Many of the responses to spam (especially some of the technical ones) are also net negatives; in the process of driving spammers away, some systems cause slowdowns for third parties or openly harm other computer systems. Where these responses are incorrectly targeted (as in a successful joe-job), the damage is even worse. Spam also induces classic arms race behavior, in which spammers and anti-spammers spend large quantities of time and money coming up with new technical measures and countermeasures. Collectively, the fight goes nowhere; all the resources spent on it are pure waste.

architecture

end-to-end values
transition problems

Finally, the Internet does certain things extremely well. It would be unfortunate to sacrifice those virtues for a temporary gain in the fight against spam. Currently, the architecture of the Internet is broadly end-to-end. This structure supports innovation at the endpoints, provides for cheap and commoditized infrastructure, and hinders monopoly control of Internet resources. It would also be unfortunate, verging on impracticable, if the shift to an anti-spam architecture were too disruptive, regardless of how idyllic the end state would be. Solutions that require every email user to upgrade his or her email client, for example, are nonstarters in the near-to-medium term.

technical responses

Next, we consider anti-spam techniques that involve technological methods. These techniques try to make spamming harder or anti-spamming easier as a technical matter.

basic technical tools

clients vs. servers
challenge-response
data aggregation
{black|white}listing

These are the basic primitives from which most technical solutions are built. Technical solutions could be implemented by email clients, email servers, or both. A challenge-response system requires the sender of a message to provide further information in response to a challenge issued by the recipient; typically such systems are designed in the hope that spammers will have a harder time responding successfully than legitimate senders will. Data aggregation involves pooling information from many sources (most often, many recipients) to develop profiles of spam messages and spam senders for easier future recognition. Blacklisting is the process of preparing a list of known spam senders from whom all messages will be rejected; whitelisting, its opposite, is the process of preparing a list of senders known not to be spammers from whom all messages will be accepted.
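As a rough illustration of how these primitives combine, the sketch below checks a sender against a whitelist and a blacklist and falls back to a challenge for unknown senders. The names `Verdict`, `classify_sender`, `WHITELIST`, and `BLACKLIST`, and the example addresses, are assumptions made up for this example; real systems differ widely in how they match senders.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    CHALLENGE = "challenge"


def classify_sender(sender, whitelist, blacklist):
    """Decide what to do with a message based only on its sender address."""
    address = sender.strip().lower()
    if address in whitelist:
        return Verdict.ACCEPT     # known non-spammer: always accept
    if address in blacklist:
        return Verdict.REJECT     # known spammer: always reject
    return Verdict.CHALLENGE      # unknown sender: issue a challenge


# Hypothetical example lists; a real deployment would load these from storage.
WHITELIST = {"colleague@example.org"}
BLACKLIST = {"bulk@example.net"}
```

The whitelist is consulted first so that a trusted sender is never rejected by a stale blacklist entry; that ordering is a design choice of this sketch, not a requirement of either technique.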

sender identification

identifying clients
identifying servers

Sender identification techniques involve adding information to email such that a client or a server in some sense "proves" that it is not a spammer. A third party may be invited to vouch for its bona fides, the user may know that the sender is trustworthy from previous interactions, or the sender may take some action indicating a real-life identity subject to sanction if the message turns out to be spam.
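One way to picture server identification is a published policy that lists which networks may send mail for a domain, checked by the receiving server against the connecting address. The sketch below assumes such a policy is already available as a plain dictionary; `AUTHORIZED_NETWORKS` and `server_is_authorized` are hypothetical names, and no real lookup protocol is modeled.

```python
import ipaddress

# Hypothetical published policy: domain -> networks allowed to send its mail.
# The addresses are documentation-only ranges, not real mail servers.
AUTHORIZED_NETWORKS = {
    "example.org": ["192.0.2.0/24"],
}


def server_is_authorized(sender_domain, connecting_ip, policy=AUTHORIZED_NETWORKS):
    """Return True if the connecting server is listed for the sender's domain."""
    address = ipaddress.ip_address(connecting_ip)
    networks = policy.get(sender_domain.lower(), [])
    return any(address in ipaddress.ip_network(net) for net in networks)
```

A domain with no published policy is treated here as unauthorized; a more cautious deployment might treat it as merely "unknown" and hand the message to other checks.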

filtering

headers
keywords
fingerprints
machine learning

Filtering systems rely on identifying characteristics of particular messages to flag them as spam or non-spam. At the most basic level, filters can examine the headers -- who sent the email, what system relayed it, what email client produced it, and so on. The next level of filtering involves searching for particular spam-suggesting keywords, such as the names of certain prescription drugs. Some systems take a hash value of entire messages to establish a baseline of known spams (although the value of this technique has been severely degraded by advances in spam-sending programs). Other systems, such as the filters used by many major ISPs, use sophisticated AI techniques to "learn" the characteristics of spam and to predict whether a new message is likely to be spam or not.
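The contrast between fingerprints and learned filters can be sketched in a few lines. The code below is only an illustration under simplifying assumptions: `fingerprint` hashes the whole body with SHA-256, and `NaiveBayesFilter` is a toy word-frequency classifier with add-one smoothing, not the method any particular ISP uses. All names and training messages are invented for the example.

```python
import hashlib
import math
import re
from collections import Counter


def fingerprint(body):
    """Hash of the entire message body, for matching against known spam."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy classifier; assumes at least one training message per label."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        # Start from the prior: how common this label was in training.
        score = math.log(self.message_counts[label] / total_messages)
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("meeting agenda for the reading group", "ham")
spam_filter.train("draft slides for the reading group", "ham")
```

Appending even a few random characters to a message changes its fingerprint completely, which is one reason exact-hash matching degrades against spam-sending programs that vary each copy; the word-frequency classifier is less sensitive to such padding.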

pay-to-send

pay with attention
pay with cycles
pay with money

Pay-to-send systems force a sender to provide proof that it has "paid" for a message in one of a number of currencies. In so doing, they attempt to shift some of the costs of spam back on to senders. They often use sender identification or challenge-response techniques at an implementation level. One technique used increasingly for web authentication is payment with attention; the sender is required (at first contact) to provide proof that a human has looked at the email, typically by typing in some text obfuscated to make it non-computer-readable. Payment with cycles is a cryptographic technique that forces the sender to spend a certain amount of processing time to send a message (greater than the amount of time required for the recipient to handle it); the goal is to make bulk emailing slower, and thus less profitable for spammers. Finally, some proposals involve actual proof of payment with money, as by requiring the electronic transfer of money from sender to recipient to send a message.
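Payment with cycles can be sketched as a small proof-of-work puzzle: the sender searches for a value whose hash has a required number of leading zero bits, and the recipient checks the result with a single hash. This is a simplified illustration, not the format of any deployed system; the stamp layout, the function names, and the default difficulty are all assumptions.

```python
import hashlib
from itertools import count


def _leading_zero_bits(digest):
    """Number of zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint_stamp(recipient, message_id, difficulty_bits=16):
    """Sender side: try nonces until the hash is small enough.

    Expected cost is roughly 2**difficulty_bits hash computations.
    """
    for nonce in count():
        stamp = f"{recipient}:{message_id}:{nonce}"
        digest = hashlib.sha256(stamp.encode("utf-8")).digest()
        if _leading_zero_bits(digest) >= difficulty_bits:
            return stamp


def verify_stamp(stamp, recipient, difficulty_bits=16):
    """Recipient side: one hash is enough to check the sender's work."""
    if not stamp.startswith(recipient + ":"):
        return False  # stamp was minted for a different recipient
    digest = hashlib.sha256(stamp.encode("utf-8")).digest()
    return _leading_zero_bits(digest) >= difficulty_bits
```

Binding the stamp to the recipient's address means a spammer cannot reuse one stamp across a mailing list, so the cost scales with the number of recipients. A complete scheme would also need to reject replayed stamps, which this sketch omits.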

legal responses

Finally, we discuss the structure of possible legal responses to spam.

basic legal concerns

jurisdictional limits
conflicting definitions
deference or preemption

Because spam is a nonlocal problem, the usual legal questions inherent to nonlocal regulations arise. How far can any one jurisdiction's prescriptive authority extend? What happens when different jurisdictions pass mutually incompatible mandates? Should larger jurisdictions (e.g. the United States) defer to the laws of smaller ones (e.g. the fifty states) or preempt them? These questions are not unique to spam.

legal responses

punish spammers harder
track down spammers
punish beneficiaries of spam
technical mandates
recruit private-sector allies
do-not-email

By "recruit private-sector allies" we are thinking of systems that invite individuals and entities to sue spammers or to help in locating them. Private-attorney-general and civil-cause-of-action statutes help do the former; bounties for finding spammers do the latter.