[Lexicog] RE: [afrilex] Re: [DSNA] FW: Macmillan's recent announcement

amsler at CS.UTEXAS.EDU
Fri Nov 9 15:55:17 UTC 2012


I feel I should add something to this discussion...

I'm a computational lexicologist. My interest has been in the use of  
computers to study the contents of 'machine-readable dictionaries', a  
term I coined in 1980 in my dissertation, "The Structure of the  
Merriam-Webster Pocket Dictionary". (That work, in turn, led to George  
Miller producing WordNet.)

Electronic dictionaries have only partially achieved their potential  
because, despite an avalanche of new computational capabilities, they  
have expanded their means of access in only fairly minor ways.  
Fundamentally, electronic dictionaries "think" of themselves as print  
dictionaries being offered via electronic access. This is a very  
limiting vision.

The work I did on the analysis of dictionary definitions demonstrated  
that there was an imperfect, yet intriguing, taxonomy of definition  
texts, and showed that the alphabetic organization of dictionary  
entries was outmoded except under special circumstances. For example,  
you had to know how to spell a word to look it up; you had to know  
that a word existed for the meaning you were trying to express before  
you could look it up; and when you did look a word up, you were given  
a tiny view of the dictionary's contents that didn't show you the  
other words whose definitions were taxonomically related to the entry  
you were examining. Sure, some dictionaries did an excellent job of  
including information on synonyms (Merriam-Webster's "synonym  
paragraphs" come to mind, for spelling out in prose the differences  
that distinguish the synonyms), but NONE gave taxonomically or  
part/whole related headwords.

Electronic dictionaries do offer new capabilities: one-at-a-time  
retrieval of entries based on words within their definitions, and  
word-game options such as finding anagrams. Algorithmic techniques  
such as the SOUNDEX system allow finding words based on their sounds  
instead of their spellings (something that Google seems to do better  
than electronic dictionaries).
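A minimal Python sketch of two of those lookups: the widely used American Soundex variant, and a sorted-letters key that an anagram finder can index on. The function names and the tiny word list are hypothetical, chosen only for illustration, and this is not any dictionary's actual implementation. Soundex is also quite coarse (it was designed for surnames), so a serious sound-based lookup would likely want something richer.

```python
from collections import defaultdict

# American Soundex digit classes; vowels, y, h and w get no digit.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"))
    for letter in letters
}

def soundex(word: str) -> str:
    """Return a 4-character American Soundex code such as 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = [letters[0].upper()]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w are skipped and do not separate equal digits
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
        prev = digit  # a vowel or y resets prev, so a repeated digit after it counts again
    return "".join(code).ljust(4, "0")[:4]

def anagram_key(word: str) -> str:
    """Words that are anagrams of one another share the same sorted letters."""
    return "".join(sorted(word.lower()))

def build_index(words, key):
    """Group words under a key function (e.g. soundex or anagram_key)."""
    index = defaultdict(list)
    for w in words:
        index[key(w)].append(w)
    return index

# Hypothetical word list used only for illustration.
WORDS = ["Robert", "Rupert", "Rubin", "listen", "silent", "enlist"]
by_sound = build_index(WORDS, soundex)
by_letters = build_index(WORDS, anagram_key)
print(by_sound[soundex("Robert")])        # ['Robert', 'Rupert']
print(by_letters[anagram_key("listen")])  # ['listen', 'silent', 'enlist']
```

Both lookups work the same way: compute a key once per headword, then retrieve every word that shares the key of the query.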

But fundamentally, dictionaries as isolated islands of knowledge are dying.

Wikipedia offers "disambiguation pages" that extend beyond what is in  
any dictionary, print or electronic. They practice a post-modern  
lexicography in which proper nouns ('named entities' in the  
computational linguistics community's jargon) are as likely as  
lexical headwords to be what a user wants to look up. I remember my  
shock at discovering that in the Brown Corpus the word "TIME" most  
often referred to the name of a magazine rather than to any of the  
senses in a dictionary...

Web search engines have implemented "definition" as a search-box  
keyword that retrieves multiple web-site hits giving the definition of  
a term. Some (duckduckgo.com) have even taken to assuming that a  
definition is the fundamental information to retrieve for any isolated  
keyword entered into a search box. The dictionary lookup no longer has  
to be explicitly requested; it is inferred from the query string  
itself.

I doubt users will for long want to go to one publisher's web site and  
learn its specific interface just to look up one unknown word and get  
that publisher's take on its meaning... unless they are interested in  
a very specialized type of knowledge, such as definitive etymological  
information, or a very specialized form of display. The pace of new  
vocabulary has made most print dictionary publishers' web sites  
antiquated.

So, where does the future of lexicography lie? I believe it lies in  
the development of new lexical knowledge resources, in new ways to  
display existing dictionary information, and in connecting dictionary  
information to other knowledge.

For example, what would the dictionary look like if Google search  
handled dictionary lookup? You'd get the best-matching dictionary  
entry for a string of keywords. You'd have sponsored links displayed  
atop the free search hits. Sponsored links aren't all bad; it depends  
on their relevance. If, for example, sponsored links went to the  
titles of books related to the word or meaning being looked up, this  
could be a good thing. I have often wondered how many of the headwords  
in a dictionary have books with that title, or books whose content is  
about one of their senses. If entries linked to government  
publications, public service information, or news stories for words  
currently in the news, that could be a good thing as well.

Of course, the problem here is that Wikipedia and Google and Amazon  
already exist and they are all too eager to take the leap toward  
incorporating dictionary information into their search results.

What isn't yet done may well be done by web-based companies. However,  
in some ways dictionaries excel in what they do.

(1) Compaction of information. The dictionary entry may be the most  
complex bit of typography ever devised. It involves more fonts and  
formatting clues than any other type of text I've encountered, yet  
this hasn't been well exploited by dictionaries in their electronic  
interfaces. For example, if one could do an arbitrary string search  
through a dictionary's entries, one could find entries similar to an  
existing entry based purely on the syntax of highly compact strings.  
There's no need to spell out what one is looking for; just "find me  
more entries that contain": "n 1 cap:" (headwords whose first sense is  
a capitalized word) or "<professor ~" (words that appear in example  
phrases following 'professor', as 'emeritus' does) or "`path-thik\"  
(the last part of the pronunciation of "homeopathic", used as a query  
to find words that end in similar pronunciations).

Note that in all these cases the queries are fragmentary strings taken  
from actual dictionary entries and fed to a very simple string-search  
algorithm, not highly structured search queries that require weeks or  
months of interface programming to let users ask such questions about  
dictionary content. It's a "find me more entries that contain this"  
query. And it works because of the rigorous, highly complex syntax  
that print dictionaries have developed over decades of evolutionary  
advances.
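A minimal Python sketch of that "find me more entries that contain this" query, assuming the raw entry text, with its compact notation intact, is available as plain strings. The sample entries and the function name are hypothetical, invented for illustration; they only loosely imitate print-dictionary notation and are not quoted from any dictionary.

```python
# Made-up sample entries loosely imitating compact print-dictionary
# notation; illustrative only, not quoted from any real dictionary.
SAMPLE_ENTRIES = {
    "emeritus": "emer-i-tus adj : retired with an honorary title <professor ~>",
    "homeopathic": "ho-meo-path-ic \\ho-me-o-`path-thik\\ adj : of or relating to homeopathy",
    "psychopathic": "psy-cho-path-ic \\si-ko-`path-thik\\ adj : of or relating to a psychopath",
    "tenure": "ten-ure n : a status granting permanence of position <professor with ~>",
}

def more_entries_containing(fragment, entries):
    """Return, sorted, the headwords whose raw entry text contains fragment verbatim."""
    return sorted(head for head, text in entries.items() if fragment in text)

print(more_entries_containing("`path-thik\\", SAMPLE_ENTRIES))  # ['homeopathic', 'psychopathic']
print(more_entries_containing("<professor ~", SAMPLE_ENTRIES))  # ['emeritus']
```

No query language is involved: the fragment is matched exactly as it appears in the entry, so "tenure", whose example reads "<professor with ~>", is correctly left out of the second result.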

(2) Defining formulae. Dictionaries employ similar defining styles  
across entries with related content. Yet they don't give users  
convenient access to those defining formulae, which would let them  
retrieve definitions according to the formula used. In part, I  
suspect these shared styles resulted from handing separate  
lexicographers the task of defining all the entries in certain groups  
of words, such as animals, occupations, vehicles, etc. It might be  
useful to be able to see all the definitions that were written for a  
given defining formula. Defining formulae are more complex than string  
searches can retrieve, since they employ natural language that allows  
arbitrary numbers of adjectives and "and"/"or" combinations within the  
same formula. The underlying formula would have to be identified to  
link together all the definition texts that used it.
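A crude Python sketch of the idea, assuming a single hypothetical formula: "a/an/any", then any number of modifiers (optionally joined by commas, "and", or "or"), then a genus noun, then "who/that/which". A regular expression can tolerate the variable run of adjectives that defeats a plain string search, but identifying defining formulae in general would need real parsing. The pattern, the function name, and the sample definitions are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical formula: article + modifiers + genus noun + relative pronoun.
FORMULA = re.compile(
    r"^(?:a|an|any)\s+"                          # opening article
    r"(?:[\w-]+(?:,|\s+(?:and|or))?\s+)*?"       # zero or more modifiers, lazily
    r"(?P<genus>[\w-]+)\s+(?:who|that|which)\b",  # genus noun + relative pronoun
    re.IGNORECASE,
)

def group_by_formula(definitions):
    """Group headwords whose definitions fit FORMULA, keyed by genus noun."""
    groups = defaultdict(list)
    for headword, text in definitions.items():
        m = FORMULA.match(text)
        if m:
            groups[m.group("genus").lower()].append(headword)
    return dict(groups)

# Made-up sample definitions, not quoted from any dictionary.
SAMPLE_DEFINITIONS = {
    "teacher": "a person who teaches",
    "mentor": "a trusted, experienced person who advises another",
    "falcon": "a swift, long-winged bird that hunts in flight",
    "owl": "a nocturnal bird that hunts at night",
    "spoon": "an implement for eating",
}

print(group_by_formula(SAMPLE_DEFINITIONS))
# {'person': ['teacher', 'mentor'], 'bird': ['falcon', 'owl']}
```

The lazy modifier group makes the match stop at the first word that is directly followed by "who/that/which", so definitions with zero, one, or several adjectives land in the same group.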

(3) Beyond one-at-a-time retrieval of dictionary entries. The  
information science community has long used techniques such as  
keyword-in-context (KWIC) to display search results as a concordance.  
Electronic dictionaries have a very annoying habit of assuming that  
readers want to read retrieved entries one at a time, formatted as  
they would appear in the printed book. NO, not all of us do. Many of  
us can read a KWIC listing more efficiently: all the entries matching  
a query are displayed together, one result per line, aligned  
horizontally on their shared text, so you can see what is going on  
across all of them.
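A minimal KWIC sketch in Python, assuming definitions are available as plain strings keyed by headword. It prints one line per whole-word match, with the keyword always starting in the same column so the shared text lines up vertically. The function name, the column width, and the sample definitions are hypothetical choices for illustration.

```python
import re

# Made-up sample definitions, not quoted from any real dictionary.
SAMPLE_DEFINITIONS = {
    "falcon": "a swift, long-winged bird that hunts in flight",
    "owl": "a nocturnal bird that hunts at night",
    "heron": "a long-legged wading bird that feeds in shallow water",
}

def kwic(keyword, entries, width=25):
    """Return one line per whole-word match, with the keyword in a fixed column."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for headword, text in entries.items():
        for m in pattern.finditer(text):
            left = text[:m.start()].rstrip()[-width:]  # context before the match
            right = text[m.end():].lstrip()[:width]    # context after the match
            # headword (10 cols), left context right-justified, keyword, right context
            lines.append(f"{headword[:10]:<10} {left:>{width}} {m.group()} {right}")
    return lines

for line in kwic("bird", SAMPLE_DEFINITIONS):
    print(line)
```

With the default width of 25, the keyword always begins at column 37 (10 for the headword, a space, 25 for the left context, another space), so every match stacks directly above the next.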

So... what to do? Either get busy dying or get busy living, as the saying goes.
Dictionary publishers need to start figuring out how to live on the  
web as participants in that environment, or figure out how to offer  
their polished content in ways that don't currently exist. It isn't  
so much a matter of whether it's a book, an online interface, or a  
wireless interface; what matters is whether what it displays is  
useful. It's a matter of either having lexical knowledge that nobody  
else has or displaying lexical knowledge in ways so convenient that  
other means of access become less attractive.

There... now I've managed to offend as many people as possible...

Dr. Robert A. Amsler
Computational Lexicologist
Vienna, Virginia