Interesting corpus planning?

Harold F. Schiffman haroldfs at
Mon Mar 1 17:45:37 UTC 2004

From The International Herald Tribune

Adding spice to spam? Phony names pique interest

Lisa Napoli NYT
Friday, February 6, 2004

Purposes L. Xylophonist sounds like my kind of man. Unique. Creative.
Focused, with a hint of formality. But there is no way to be certain that
Mr. Xylophonist is, in fact, a mister. Actually, it is a pretty safe bet
he is not a person at all. The fact that his name appeared in the return
line of a piece of unsolicited e-mail almost ensures that he is not.

Xylophonist wrote trying to sell some pamphlet about maximizing profits on
eBay. Or maybe that was what Beiderbecke P. Sawhorse was pitching. It was
definitely not the one from Marylou Bowling; she wrote to tell about
"government Free Cash Grant Programs." Then again, that might have been
from Elfrieda Billman. As for Usefully T.  Medicaids and Boggs Darrin,
they both wrote about cheap drug sales, no prescription needed.

Alongside those missives from friends and that drudgery from the office is
a cast of e-mail characters with fantastic names promising all manner of
stuff for sale. Frequently, the promises are bogus; virtually all of the
names are, too. Though it seems impossible to imagine the unwanted e-mail
known as spam as anything but a nuisance, there is something creative
about these return addresses - even if they are being used for untoward
purposes. On Web bulletin boards, they sometimes draw admiring comments.

"I like a lot of the names I see on spam e-mails because they're
completely abstract, with little conception of culture or traditional
sounds," said a post by someone using the name Oissubke, a self-described
fiction writer. When it comes to making names up, August Kleimo, whose
name is just unusual enough that it might have been invented, knows that
the best source of material is reality.

Kleimo, a Web designer in the Venice section of Los Angeles, said he was
trolling at the Census Bureau's Web site a few years ago and found "tons
of free data," including all the last names from the census of 1990. There
was also information on which of those names were most popular. This
inspired Kleimo in a way that only a computer aficionado could be
inspired: He wrote a random-name generator that spits out pairings
($ name.cfm). Site visitors can adjust the obscurity
factor depending on how bizarre they would like the names served up to be.
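The article does not describe how Kleimo's generator works internally, so what follows is only a minimal Python sketch of how such a tool might weight names by popularity and expose an "obscurity" setting. The name tables and their counts are hypothetical placeholders, not 1990 census figures. The weighting rule, which raises each name's frequency to the power (1 - obscurity), is an assumption chosen for illustration.

```python
import random

# Hypothetical placeholder tables: (name, relative frequency).
# These counts are illustrative only, NOT real census data.
FIRST_NAMES = [("Mary", 2600), ("Linda", 1000), ("Marylou", 4), ("Elfrieda", 1)]
SURNAMES = [("Smith", 1000), ("Johnson", 800), ("Williams", 700),
            ("Billman", 3), ("Darrin", 2), ("Smithington", 1)]


def weighted_pick(rng, table, obscurity):
    """Pick one name. obscurity=0 favors common names; obscurity=1 is uniform."""
    names = [name for name, _ in table]
    # Raising each frequency to (1 - obscurity) flattens the distribution:
    # at 0 weights equal the raw frequencies, at 1 every weight becomes 1.0.
    weights = [freq ** (1.0 - obscurity) for _, freq in table]
    return rng.choices(names, weights=weights, k=1)[0]


def random_name(obscurity=0.0, seed=None):
    """Return a 'First Last' pairing drawn from the hypothetical tables."""
    if not 0.0 <= obscurity <= 1.0:
        raise ValueError("obscurity must be between 0 and 1")
    rng = random.Random(seed)  # seedable so results can be reproduced
    first = weighted_pick(rng, FIRST_NAMES, obscurity)
    last = weighted_pick(rng, SURNAMES, obscurity)
    return f"{first} {last}"


if __name__ == "__main__":
    for level in (0.0, 0.5, 1.0):
        print(level, [random_name(level) for _ in range(3)])
```

At obscurity 0 a name is drawn in proportion to how common it is, so pairings like "Mary Smith" dominate; at obscurity 1 every entry is equally likely, which is when odd combinations such as "Elfrieda Smithington" start turning up.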

Now in its third year, the site attracts about 3,000 visitors a day,
Kleimo said. And not everyone who visits uses his invention for harmless
fun. "I've always suspected that people use it for spam," he said. To be
sure, many of the common software programs for spammers include random
name generation. And Kleimo's is not the only random name generator on the
Web; dozens can be sampled there. Mike Campbell, for example, an amateur
etymologist and software developer in Victoria, British Columbia, built
Behind the Name (, which allows visitors
to generate names in various languages, from Icelandic to Lithuanian to
classical Greek.

Chris Pound, who works in the information technology department at Rice
University in Houston, has written more than 40 random generators,
including what he calls an "amazing verbal kung-fu" generator, as well as
one that merges names from the worlds of Harry Potter and Charles Dickens
( "As a kid, I was a fan of the novels of M.A.R.
Barker, who is a linguistic anthropologist," said Pound, whose Web site
offers the code he uses to create his generators. "He, like J.R.R.
Tolkien, had invented languages for all of the empires in his fantasy
novels. It becomes a hobby after a while when you notice things you can
turn into a name generator." But for spammers, name generators can be the
bones of the business.

Wildly unusual invented proper names are designed to attract your
attention. Less inventive names are chosen to lead you to think the mail
might just be real, and to open it. But aside from grabbing the
recipient's attention, random names are used by spammers because they are
more likely to trick the antispammers, including Internet service
providers. "Spammers use software to randomly generate lots of unique
names because they know it reduces the chance of their spam being filtered
by ISPs or blocked by users," said Jason Catlett, founder of Junkbusters,
a site dedicated to the elimination of unwanted solicitations.

Randomly generated names are more likely to squeeze through so-called
Bayesian filters, which keep track of words common in spam, like
Viagra, and weed out the messages that contain them. A human may detect a randomly generated name as
a fake, said Ray Everett-Church, chief privacy officer of the ePrivacy
Group, which makes a filter called SpamSquelcher, but "a filter can't
really see the irony of Tupperware J. Smithington."
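To make the filtering point concrete, here is a toy naive Bayes scorer in Python. It is a sketch, not SpamSquelcher's or any other product's algorithm. Its assumptions: tokens are lowercase whitespace-split words, each token's spam probability comes from training counts, and a token the filter has never seen gets a neutral 0.5 (a default chosen for this example; real filters set their own). Because a freshly generated name has never appeared in training mail, it adds no evidence either way.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer assumed for this sketch: lowercase, split on whitespace.
    return text.lower().split()


class NaiveBayesFilter:
    """Toy token-based spam scorer; unseen tokens count as neutral (0.5)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        s, h = self.spam[token], self.ham[token]
        if s + h == 0:
            return 0.5  # never seen in training: no evidence either way
        ps = s / max(self.n_spam, 1)
        ph = h / max(self.n_ham, 1)
        p = ps / (ps + ph)
        return min(max(p, 0.01), 0.99)  # clamp so no single token is decisive

    def score(self, text):
        # Combine per-token probabilities in log-odds space, assuming independence.
        log_odds = 0.0
        for token in set(tokenize(text)):
            p = self.token_prob(token)
            log_odds += math.log(p / (1 - p))
        return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("cheap viagra no prescription", True)
    f.train("cheap drugs no prescription needed", True)
    f.train("meeting notes from the office", False)
    f.train("lunch with friends on friday", False)
    for msg in ("cheap viagra", "meeting notes", "tupperware smithington"):
        print(msg, round(f.score(msg), 4))
```

Words like "viagra" push the score toward 1 and office vocabulary pushes it toward 0, but "tupperware smithington" lands at exactly 0.5: the invented name is invisible to the filter, which is the loophole the spammers quoted above are exploiting.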

The New York Times
