Wordnik: New online dictionary redefines ‘look it up’

Harold Schiffman haroldfs at gmail.com
Tue Mar 17 00:47:10 UTC 2009


New online dictionary redefines ‘look it up’

*Lexicographer Erin McKean’s interactive ‘Wordnik’ is projected to be the
largest online dictionary ever.*
By Jina Moore | Correspondent / March 16, 2009

Chicago

Erin McKean doesn’t look much like a revolutionary. She speaks softly. She
sews her own skirts and writes a daily blog entry about vintage patterns.
She does work out of a basement, but it’s got carpeting and good lighting
and roughly 1,500 books, many of whose titles involve the word “words.” Her
suburban Chicago home is not exactly the picture of subversion.

This week, though, she is slated to launch what may be the biggest
revolution in the printed word since, well, printed words.

Ms. McKean’s brainchild is called Wordnik <http://www.wordnik.com/>, and it
combines the best practices of the old-fashioned desk reference with
Internet innovations. Words can be tagged like a blog entry, their
pronunciation recorded and replayed like streaming radio, their related
words cataloged like a list of books customers also bought at an online book
depot. When the paper page gives way to the Web page, everything about the
way we think of words will change, McKean says. “This project,” she predicts
in a quiet voice devoid of bravado, “is going to completely revolutionize
all of dictionarymaking forever.”

Granted, a dictionary is closer to a database than a mystery thriller, its
authors nothing like, say, John Grisham. But to McKean, nothing has ever
seemed more fascinating than collecting and organizing American words.

McKean
<http://www.ted.com/index.php/talks/erin_mckean_redefines_the_dictionary.html> was
8 years old when she decided that when she grew up, she wanted to be a
lexicographer – the technical term for a writer or editor of dictionaries.
She first found it in her daily scouring of The Wall Street Journal. Her
father was a Journal devotee, and McKean liked the human interest stories
(but, she jokes, “even then, I knew enough not to read the editorial page.”)
A feature article celebrated Oxford University Press’s 1980 Word of the Year
– ayatollah – and talked about preparing the newest edition of its most
famous title, the Oxford English Dictionary.

“I think I was really attracted by the fact that it was taking 21 years to
make the second edition of the Oxford English Dictionary <http://www.oed.com/>,”
she recalls. “I was 8. Twenty-one years was forever.”

The lexicography bug stuck, in part because McKean loved language. She was a
voracious reader, plowing through her local libraries’ stacks and devouring
anything she found at home, she says. “If it was lying around, I read it. If
my parents didn’t want me to read it,” she says, “they had to hide it.”

As her classmates abandoned childhood dreams of firefighting or Broadway
stardom for teaching or nursing, McKean stuck with words. “Nobody ever tried
to talk me out of it. Nobody knew enough about it to know if it was easy or
difficult,” she recalls. “Nobody had a brother who was a lexicographer the
way they might have a brother who was a firefighter or an English teacher or
a doctor or a lawyer. Nobody had ever met one.”

For good reason, she found out as she pursued joint bachelor’s and master’s
degrees in linguistics at the University of Chicago: There aren’t a whole
lot of jobs for lexicographers. McKean estimates there may be 200 working
lexicographers in America today, and that the field sees about two full-time
openings a year.

McKean got her start through a combination of luck and ingenuity: She called
up the only dictionary publisher based in Chicago and asked for an
internship. After graduation, the internship turned into a job, which
eventually turned into a career at Oxford University Press, a move she
likens to “being called up by the Yankees.” At age 29, McKean was the chief
editor of the American dictionaries group. “If it had Oxford and American in
the title,” she says, “it was my fault.”

She could dream up bestsellers, like the Oxford American Writer’s Thesaurus
<http://www.oup.com/us/catalog/general/subject/Reference/EnglishDictionaries/?view=usa&ci=0195170768>,
but among her favorite books is the first one she acquired at her new home,
a publishing house with a reputation for erudition. “It was called Slayer
Slang
<http://books.google.com/books?id=R_6b2YyKI7oC&dq=slayer+slang&printsec=frontcover&source=bn&hl=en&ei=Yjy-SeWLFdCvtwfr57T3Cw&sa=X&oi=book_result&resnum=4&ct=result>.
… [It] is a treatment of the slang of Buffy the Vampire Slayer,” the title
character in a hit television drama from the late 1990s.

The purchase revealed as much about McKean’s sensibility as it did about her
business sense. And when it comes to dictionaries, McKean says, sensibility
is key. “People have this idea of the Platonic ideal of the dictionary.
That’s why they call it ‘the dictionary’…. They think that all dictionaries
are pretty much the same.” Not so, she says. There are five print dictionary
publishers in the US, each choosing which of the billions of words they’ve
collected will make it into print.

What gets left out depends on the personality of the publishing house. On
the other hand, how to evaluate what gets in is a task beyond most people.
“Most consumers don’t have a good metric for deciding on whether the
dictionary they want to use is a good one … so they flip the book over, then
go to the back, and it says, ‘over 250,000 entries.’ And they go, ‘Great,
this dictionary must be awesome!’ ” she says. “Because if you don’t know a
word, how do you judge the quality of the definition?”

Enter Wordnik, McKean’s newest project. In the infinite space of the
Internet, she can define as many words as she wants.

“There are hundreds of thousands of words that aren’t in any print
dictionary today … because there’s no space for all of them.”

Wordnik has space for many of them, and for their bells and whistles. Her
team of seven has analyzed what print and online dictionaries do and don’t
do well. They’ve built a user-friendly resource that should be the best –
and biggest – of both worlds. Wordnik generates its content from a database
of 4 billion words, twice as many as that of her last employer. “Four
billion words,” she says with a shrug, “is what you can pick up lying around
on the floor of the Internet.”

Want to evaluate a definition of a word you’ve never met? No problem; other
users can tell you if they favor that definition. Want to know what other
words often appear in the same sentence as what you’ve just looked up?
There’s a section called “related” for words used in the same context as
yours. Need to know what a farthingale, for instance, looks like? Images are
imported to the page from photo-depot giant Flickr. Unsure if you really
understood the definition? Every word has several example sentences, culled
at random from that Internet floor and then sorted so the best rise to the
top of your search page.

These, McKean says, are critical. They’ve been vanishing from print
dictionaries as publishers try to cram them with more words, but contextual
sentences are what make people pick up reference books in the first place.
“We think people go to a dictionary to find out what a word means,” she
says. Not so. “Most people go to the dictionary because they don’t want to
look stupid.”

They don’t want to sound stupid, either, which is why every word has an
audio file of its pronunciation. Users can record their own pronunciations,
too.

Print dictionaries do have one clear advantage, though: They show more than
one word at a time. That makes skimming the print page fun, and McKean has
tried to mimic that feeling with a “serendipity” feature, which generates
words at random.

Perhaps the most surprising element of McKean’s new dictionary is a
frequency graph, which shows how often the word you’ve looked up was used,
as a written word, in a year. That can tell you more about history than just
the etymological: Take “chad,” for instance. The word’s frequency in 2000 is
high – thanks, of course, to that year’s presidential election controversy.
But there are signs of heavy usage much earlier. (EDITOR’S NOTE: The
original version of this story incorrectly used the word “entymological”
instead of “etymological”)

“We have one text from 1870 that has the word ‘chad’ a lot, because it’s
about Jacquard [weaving] looms, which used to be run on punch cards,” McKean
explains. “They had the same chad problems as the Florida ballots.”

Ultimately, McKean’s goal is rather humble, when judged against the volume
of words that have accumulated in the 400-year history of modern English.

“Ideally my goal is, before I die, to have some information about every word
that’s ever been used in print.”

That may be the real revolution: digitizing a bit of data about every word
we English speakers have ever put on the old-fashioned page. Byte by byte,
the soft-spoken lexicographer will see her revolution through.

Forwarded from Christian Science Monitor, 3/16/09



-- 
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

Harold F. Schiffman

Professor Emeritus of
Dravidian Linguistics and Culture
Dept. of South Asia Studies
University of Pennsylvania
Philadelphia, PA 19104-6305

Phone:  (215) 898-7475
Fax:  (215) 573-2138

Email:  haroldfs at gmail.com
http://ccat.sas.upenn.edu/~haroldfs/
