10.1572, Qs: History of Corpora, Terminology/Y2K & Beyond

LINGUIST Network linguist at linguistlist.org
Wed Oct 20 22:55:42 UTC 1999


LINGUIST List:  Vol-10-1572. Wed Oct 20 1999. ISSN: 1068-4875.

Subject: 10.1572, Qs: History of Corpora, Terminology/Y2K & Beyond

Moderators: Anthony Rodrigues Aristar: Wayne State U. <aristar at linguistlist.org>
            Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Reviews: Andrew Carnie: U. of Arizona <carnie at linguistlist.org>

Associate Editors:  Martin Jacobsen <marty at linguistlist.org>
                    Ljuba Veselinova <ljuba at linguistlist.org>
		    Scott Fults <scott at linguistlist.org>
		    Jody Huellmantel <jody at linguistlist.org>
		    Karen Milligan <karen at linguistlist.org>

Assistant Editors:  Lydia Grebenyova <lydia at linguistlist.org>
		    Naomi Ogasawara <naomi at linguistlist.org>
		    James Yuells <james at linguistlist.org>

Software development: John H. Remmers <remmers at emunix.emich.edu>
                      Chris Brown <chris at linguistlist.org>
                      Qian Liao <qian at linguistlist.org>

Home Page:  http://linguistlist.org/


Editor for this issue: Karen Milligan <karen at linguistlist.org>
 ==========================================================================

We'd like to remind readers that responses to queries are usually
best sent to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list. This policy was
instituted to help control the huge volume of mail on LINGUIST, so we
would appreciate your cooperation with it whenever it seems appropriate.

=================================Directory=================================

1)
Date:  Tue, 19 Oct 1999 21:57:8 +0800
From:  Smiley <smiley at sh163a.sta.net.cn>
Subject:  History of Corpora

2)
Date:  Tue, 19 Oct 1999 11:21:24 -0700
From:  "James Giangola" <jamesg at Nuance.COM>
Subject:  Terminology for Y2K and beyond

-------------------------------- Message 1 -------------------------------

Date:  Tue, 19 Oct 1999 21:57:8 +0800
From:  Smiley <smiley at sh163a.sta.net.cn>
Subject:  History of Corpora

Dear all,

Does anyone have or know of sources of information on the history of
corpora, either for dictionary-making or for linguistic research?

Thanks.

Gao Yongwei
Fudan University,
Shanghai, China


-------------------------------- Message 2 -------------------------------

Date:  Tue, 19 Oct 1999 11:21:24 -0700
From:  "James Giangola" <jamesg at Nuance.COM>
Subject:  Terminology for Y2K and beyond

Has anyone out there done any research on how people will say (more exactly,
will expect to hear) years such as 2001, 2005, 2010, 2015, 2020, 2037, etc.?

I am working on a speech recognition/synthesis application that needs to
"speak back" to the user certain years beyond 2000.  Here are the top
candidates:

(1)	"two thousand" not followed by "and", e.g. two thousand five, two
	thousand thirty-seven
(2)	"two thousand and...", e.g. two thousand AND five, two thousand AND
	thirty-seven
(3)	"twenty...", e.g. twenty oh five, twenty thirty-seven
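
The three candidates above can be sketched in a few lines of Python. This
is only a minimal illustration, not part of any existing application: the
names say_year and two_digits are hypothetical, the sketch covers only the
years 2000 to 2099, and rendering 2000 itself as "two thousand" in every
style is an assumption, since the candidates above do not address it.

```python
# Hypothetical sketch: render a year from 2000 to 2099 in one of the three
# candidate styles listed above. All names here are illustrative.

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n):
    """Spell out an integer 0 <= n <= 99; returns "" for 0."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def say_year(year, style):
    """Return the spoken form of `year` in candidate style 1, 2, or 3."""
    if not 2000 <= year <= 2099:
        raise ValueError("this sketch only handles 2000 through 2099")
    if style not in (1, 2, 3):
        raise ValueError("style must be 1, 2, or 3")
    rest = year - 2000
    if rest == 0:
        # Assumption: the round year is "two thousand" in every style.
        return "two thousand"
    if style == 1:
        # (1) "two thousand" not followed by "and"
        return "two thousand " + two_digits(rest)
    if style == 2:
        # (2) "two thousand and ..."
        return "two thousand and " + two_digits(rest)
    # (3) "twenty ...", with "oh" before a single digit
    if rest < 10:
        return "twenty oh " + ONES[rest]
    return "twenty " + two_digits(rest)


if __name__ == "__main__":
    for year in (2001, 2005, 2010, 2015, 2020, 2037):
        print(year, [say_year(year, style) for style in (1, 2, 3)])
```

Running the script prints all three renderings for each of the example
years in the query, which makes it easy to compare them side by side.
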

To my ear, it seems that the further into the future the year is, the better
way (3) sounds, e.g. "twenty thirty-seven", instead of "two thousand (and)
thirty-seven".

What about 2001?  Will people want to say this date as in the movie title?
Should it be "two thousand one" or "two thousand AND one"?  Although I'm a
native speaker of English, I can't make up my mind, and folks here at work
don't agree on this issue.

My own hunch is that people will resort to the shortest way possible, way
(3), but this isn't based on any serious study.

If anyone has done any sort of survey on this topic, your help would be much
appreciated!

Thanks,

James Giangola
Software Engineer, Dialog Research & Design
jamesg at nuance.com
Nuance Communications
1380 Willow Rd.
Menlo Park, CA 94025
www.nuance.com


---------------------------------------------------------------------------
LINGUIST List: Vol-10-1572


