6.755, Qs: Terminology system, Dictionary, Parallel corpora

The Linguist List linguist at tam2000.tamu.edu
Thu Jun 1 16:24:55 UTC 1995


----------------------------------------------------------------------
LINGUIST List:  Vol-6-755. Thu 01 Jun 1995. ISSN: 1068-4875. Lines:
 
Subject: 6.755, Qs: Terminology system, Dictionary, Parallel corpora
 
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at tam2000.tamu.edu>
            Helen Dry: Eastern Michigan U. <hdry at emunix.emich.edu>
 
Asst. Editors: Ron Reck <rreck at emunix.emich.edu>
               Ann Dizdar <dizdar at tam2000.tamu.edu>
               Ljuba Veselinova <lveselin at emunix.emich.edu>
               Annemarie Valdez <avaldez at emunix.emich.edu>
 
                           REMINDER
[We'd like to remind readers that the responses to queries are usually
best posted to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list.  This policy was
instituted to help control the huge volume of mail on LINGUIST; so we
would appreciate your cooperating with it whenever it seems appropriate.]
 
-------------------------Directory-------------------------------------
 
1)
Date:          Mon, 8 May 1995 10:28:41 +0000
From: "Milde Jordaan-Weiss 012 314-6165" (VT13 at acts2.pwv.gov.za)
Subject:       REQUEST FOR INFORMATION
 
2)
Date:   Sun, 28 May 1995 21:22:18 -1000
From: Phil Bralich (bralich at uhunix.uhcc.Hawaii.Edu)
Subject: Machine Usable Dictionary
 
3)
Date: Tue, 30 May 1995 14:23:14 +0200
From: Maria Gavrilidou ILSP (M3653 at eurokom.ie)
Subject: Parallel corpora
 
 
-------------------------Messages--------------------------------------
1)
Date:          Mon, 8 May 1995 10:28:41 +0000
From: "Milde Jordaan-Weiss 012 314-6165" (VT13 at acts2.pwv.gov.za)
Subject:       REQUEST FOR INFORMATION
 
Dear colleague
 
The National Terminology Services (NTS) of South Africa is looking
for a Terminology Management System which will be able to accommodate
all 11 official languages. Some of the African Languages have special
diacritics not yet available in commercial software.
 
Attached you will find the RFI from the NTS. Please pass it on to anyone who
might be interested. The rather bulky USER REQUIREMENT SPECIFICATION
will be e-mailed to interested parties as soon as they request it.
Please take note of the closing date of 29 May.
 
We appreciate your help in this matter.
 
Yours sincerely
 
Ms Milde Jordaan-Weiss
National Terminology Services
Department of Arts, Culture, Science and Technology
Private Bag X894
0001 PRETORIA
REPUBLIC OF SOUTH AFRICA
 
Tel +27 12 314-6165
Fax +27 12 325-4943
 
--------------------------------------------------------------------------
2)
Date:   Sun, 28 May 1995 21:22:18 -1000
From: Phil Bralich (bralich at uhunix.uhcc.Hawaii.Edu)
Subject: Machine Usable Dictionary
 
As you may know from postings I have made to this list over the last
couple of months, Derek Bickerton and I are developing a parser
based on a theory of syntax that he and I have been developing over
the last four years.  We are about to purchase a machine usable
dictionary with approximately 70,000 entries for $2500.  If anyone
could advise us whether or not that is our best bet, or where we might
find other dictionaries, we would appreciate hearing from you.
 
We are currently working with a dictionary of under 1000 words, so it
is imperative that we obtain a larger one, so we may begin working
with larger corpora.  Toward that end we would also like to find out
which texts were used in past parsing competitions and where the
results of these competitions are published.  We believe that with a
few weeks of work we should be able to modify a dictionary
sufficiently to allow us to begin experimenting with texts that were
used in past parsing competitions.
 
Here are the specs of the parser.  It is based on a series of algorithms that
have been four years in the making, but the programming required to
create this parser has only taken 300 hours using C++.  There
are approximately 3000 lines of code that take up 150k as an executable on
disk.  About 100k of RAM is required to run the parser.  30k on disk is
required for a 300 word dictionary.   An average sentence takes under
4 seconds to process on a 486 IBM compatible.  Since this is only a
development version, we expect these numbers to change.  To date, no
optimizations have occurred, and we expect to significantly shrink the
dictionary disk usage and the execution time.
 
Phil Bralich
bralich at uhccux.uhcc.Hawaii.edu
 
--------------------------------------------------------------------------
3)
Date: Tue, 30 May 1995 14:23:14 +0200
From: Maria Gavrilidou ILSP (M3653 at eurokom.ie)
Subject: Parallel corpora
 
Dear linguists,
 
A short time ago, I posted to the list a query on parallel corpora.
Since answers are still coming in, I will not give a summary of the
answers at this point. (However, a summary will be given as soon as
I have gathered all answers.)
 
Due to e-mail problems, I believe some e-mail messages must have been
lost. So, I give here below the list of the people whose messages
I have received. If you have written me and your name is not included
here, please re-send your answer to my personal address! I also repeat
here the original query for those who have not already seen it.
 
List of addresses of people who have answered:
kemmer at ruf.rice.edu
barlow at ruf.rice.edu
Bert.Peeters at modlang.utas.edu.au
estival at divsun.unige.ch
R.M.Salkie at bton.ac.uk
BERNARD at ccnet.up.ac.za
macrakis at asf.org
ingria at bbn.com
 
The original message is the following:
 
) Dear linguists,
)
) I am involved in a project concerning parallel text-corpora, and
) I would like to know if anybody has already had any experience on
) the matter. Specifically, I would like to know if there already
) are any efforts ongoing (or completed!) about specs for parallel
) corpora, for representation issues, text typology etc.
)
) If anybody has the time to answer my query I would greatly appreciate
) it! Please reply to my personal address.
 
Apologies to those who are seeing this message a second time!
 
Thank you all,
Maria Gavrilidou
Institute for Language and Speech Processing
Athens, Greece
 
--------------------------------------------------------------------------
LINGUIST List: Vol-6-755.


