8.1670, FYI: SEMCOM: Rav-Milim Project, NorFa Summer School
The LINGUIST List
linguist at linguistlist.org
Sat Nov 22 00:02:36 UTC 1997
LINGUIST List: Vol-8-1670. Sat Nov 22 1997. ISSN: 1068-4875.
Subject: 8.1670, FYI: SEMCOM: Rav-Milim Project, NorFa Summer School
Moderators: Anthony Rodrigues Aristar: Texas A&M U. <aristar at linguistlist.org>
Helen Dry: Eastern Michigan U. <hdry at linguistlist.org>
T. Daniel Seely: Eastern Michigan U. <seely at linguistlist.org>
Review Editor: Andrew Carnie <carnie at linguistlist.org>
Associate Editor: Ljuba Veselinova <ljuba at linguistlist.org>
Assistant Editors: Martin Jacobsen <marty at linguistlist.org>
Brett Churchill <brett at linguistlist.org>
Anita Huang <anita at linguistlist.org>
Julie Wilson <julie at linguistlist.org>
Elaine Halleck <elaine at linguistlist.org>
Software development: John H. Remmers <remmers at emunix.emich.edu>
Zhiping Zheng <zzheng at online.emich.edu>
Home Page: http://linguistlist.org/
Editor for this issue: Martin Jacobsen <marty at linguistlist.org>
=================================Directory=================================
1)
Date: Fri, 14 Nov 1997 10:04:20 -0800 (PST)
From: alan harris <vcspc005 at email.csun.edu>
Subject: SEMCOM: Rav-Milim Project (fwd)
2)
Date: Thu, 20 Nov 1997 15:30:52
From: Juhani Jarvikivi <Juhani.Jarvikivi at joensuu.fi>
Subject: NorFa Summer School: Languages, Minds and Brains
-------------------------------- Message 1 -------------------------------
Date: Fri, 14 Nov 1997 10:04:20 -0800 (PST)
From: alan harris <vcspc005 at email.csun.edu>
Subject: SEMCOM: Rav-Milim Project (fwd)
SEMCOM
Online bulletin of the Commission on Semiotics and Communication,
National Communication Association. [If you would like to be included
in the SEMCOM list, please reply or send a note to
alan.harris at csun.edu with the command, "add SEMCOM", in the body.]
==============================================================
Alan C. Harris, Ph. D. TELNOS: main off: 818-677-2853
Professor, Communication/Linguistics direct off: 818-677-2874
Speech Communication Department
California State University, Northridge home: 818-366-3165
SPCH CSUN FAX: 818-677-2663
Northridge, CA 91330-8257 INTERNET email: ALAN.HARRIS at CSUN.EDU
WWW homepage: http://www.csun.edu/~vcspc005
===============================================================
From: Humanist Discussion Group <humanist at kcl.ac.uk>
"Rav-Milim" (Multi-Words)
A Computerized Infrastructure for
Intelligent Processing of Modern Hebrew
Principal Investigator: Yaacov Choueka
(Highlights)
"Rav-Milim" is a broad, comprehensive, robust and integrated
computerized infrastructure for the intelligent processing of modern
Hebrew, developed in the years 1989-1996 at the Center for Educational
Technology in Tel-Aviv. Large teams of programmers, linguists,
computational linguists, lexicographers and editors were involved in
this project, which was initiated, directed and supervised by
Prof. Y. Choueka from Bar-Ilan University. Yoni Ne'eman was in charge
of the linguistic algorithms as well as chief programmer of the
project. The names of some of the other major team members are given
at the end. A few papers on the system and its various components are
now under preparation.
The basic modules of the system, from which scores of products and
applications (both computerized and printed) have been derived, are as
follows:
- "Milim": A complete, accurate, comprehensive and portable
morphological analyzer and lemmatizer for modern Hebrew (there are an
estimated 70 million word-forms in Hebrew). The program takes as
input any word (string of characters) in Hebrew and outputs the
set of all its (linguistically correct) grammatical analyses,
including: root, dictionary entry, part-of-speech, gender-number for
nouns and adjectives, mode-tense-person-gender-number for verbs,
attached prepositions, attached pronouns (including
person-gender-number of the pronoun), and more.
Milim recognizes all common modes of Hebrew spelling
(defective - "hasser" and plene - "male") and also
some extra-linguistic units such as acronyms (abundant in Hebrew),
abbreviations, and frequent proper nouns (of persons, places, products).
The program, a library of subroutines in C, occupies a few hundred kilobytes,
and can analyze about 1,000 words per second on a Pentium PC.
- "Katvan" (spelling checker): Unlike English, an adequate
spelling checker for Hebrew cannot consist of long lists of
words with some rudimentary suffix stripping; it has to be based on a
morphological analyzer. Katvan is an accurate and
comprehensive Hebrew spelling-checker based on Milim, that recognizes
both the "defective" and "plene" spellings, and can correctly convert
from one mode to the other (it also suggests corrections to flawed
strings). Katvan was chosen by Microsoft and Word Perfect to be
the standard spelling-checker for their Hebrew word-processors.
- "Nakdan" (Vocalizer): A program that, given a word-form and its
grammatical analysis, will output its (unique) vocalization
(including long and short vowels, stresses, etc.) according to
the rules of grammatical Hebrew vocalization. Given any word in Hebrew
(without context), the program will activate "Milim" to get all its
possible morphological analyses, and will attach to each of
them the appropriate vocalization, thus producing as output the set of
all (linguistically correct, context-free) possible vocalizations of
that word.
- "Nakdan-Text" (Text Vocalizer): Given a sentence in Hebrew,
this program will vocalize it, by first activating "Nakdan" to find
all possible morphological analyses and attached vocalizations of
every word in the sentence, then choosing, for every such word,
the "correct" context-dependent one, using short-context syntactical
rules as well as some probabilistic and statistical modules.
The program achieves 95% accuracy, and is available, e.g., as an
off-the-shelf add-on to Microsoft Hebrew Word.
After installation, any Word document (or even book) can be
vocalized by just marking it and clicking on the pertinent icon;
the vocalization is done online, and the document can be printed
with the diacritic vocalization points on any (Word-supported)
printer. Proofreading and correcting the erroneous vocalizations
are very easy and do not require a professional linguist (as is the
case generally with manual vocalization).
Nakdan-Text is an essential step for Text-to-Speech applications in
Hebrew; without such vocalization, computerized "reading" is
obviously impossible.
- "Hamilon" (The Dictionary): A new dictionary of Hebrew,
built a-priori on modern lexicographical principles and with an
architecture that is easy to use and embed in computerized processing
contexts. Radically different in philosophy and approach from the
available classical dictionaries of Hebrew, the Rav-Milim dictionary is
synchronic (rather than historical), descriptive (rather than
normative, although bad usage is clearly tagged as such),
comprehensive - covering all registers of the language (from the
literary to the slang and vulgar) and all strata (from the biblical
to the modern) - but not exhaustive (omitting historical curiosities,
discarded inventions, etc.), and user-oriented. Following the new
sensitivity to meaning-in-context acquired by the extensive
processing of large corpora, the full and rich spectrum of the
different meanings of an entry is deployed, and usage examples
for every (non-encyclopedic) entry, carefully designed to highlight
its appropriate sociolinguistic context, are given. For each entry,
the family of its related terms (words with the same root and the
same semantic field) is detailed. Special attention is given to
collocations (a generic term used here loosely for compound nouns,
verbal attachments, fixed phrases, idioms, etc., that deserve a
special dictionary heading and explanation): every collocation
appears under each of its pertinent entries, and some 8,000
new collocations (out of a total of 20,000), never recorded before,
are explained.
The printed version of the dictionary was published in April 1997
(by C.E.T., Steimatzky and Miskal) as a 6-volume set, and the
computerized version appeared at about the same time, as part of
"The Hebrew Language CD", described below.
- "The Hebrew Language CD": All of the grammatical and lexicographic
modules described above, and more, are integrated in this CD-ROM,
which is in fact a complete "laboratory" of Hebrew processing (on the
word level). Keying any word, the user can spell-check it or ask for
its (correct) spelling in the different modes, see its vocalization(s)
and its decomposition into meaningful components, look at its complete
morphological analysis (or analyses), see the full family (in the
sense defined above) of related terms, review all collocations that
contain it (there may be hundreds of them) and, for each one
marked, read its explanation, ask for the full conjugation table of
the corresponding base-form (in both vocalized and non-vocalized forms
and spellings), ask for all entries that have the same vocalization
pattern, and, of course, ask to see the full dictionary record of the
appropriate entry. It should be noted here that looking for a word in
a printed Hebrew dictionary can be a frustrating experience even for
experienced users, since one has first to reduce the word, in the form
encountered, to its base-form (or its root), a task that is not needed
here. The user enters the word in any variant encountered, and the
program will automatically display the pertinent entry (or, sometimes,
entries). This feature also allows the user to mark any string in an
explanation or a usage-example, and the appropriate entry and
explanations will be displayed, ad infinitum.
- "Young Rav-Milim - The Dictionary": A dictionary of modern Hebrew (2
vols, 1,000 pgs, same publishers as above) for the young (ages 7-16),
with 1,000 color illustrations (the first of its kind ever in
Hebrew). All of the dictionary contents (entries and subentries,
collocations, explanations, usage examples, etc) reflect the young
world of knowledge and associations. A unique feature of the
dictionary is the thousands of annotations scattered in it, giving the
reader a wealth of additional interesting information on
morphological, grammatical, semantical, historical and cultural
aspects of the entry. The page layout is reminiscent of a Talmudic
page: a rectangular box of basic text, surrounded by related
glossaries, commentaries and notes. The dictionary thus functions as
an attractive book to read and browse, in addition to its basic
function as a reference book.
- "Young Rav-Milim - The Multimedia CD-ROM": A multimedia version of
the dictionary that reflects the whole contents of the printed one
and, in addition, offers pre-taped pronunciation of the entries, typical
sounds for appropriate entries (animals, musical instruments, special
verbs, etc), linguistic and "dictionary" games, etc.
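As a rough illustration of how the analysis modules described above
might chain together (all analyses of a word, one vocalization per
analysis, then a context-based choice), here is a minimal Python
sketch. It is not the Rav-Milim code, which the announcement describes
as a C library. The toy lexicon, the rough Latin transliterations, the
function names and the single context rule are all hypothetical
stand-ins chosen only to make the data flow concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str      # dictionary entry
    pos: str        # part of speech
    vocalized: str  # vocalized form, in rough transliteration


# Hypothetical toy lexicon: unvocalized (transliterated) string -> all
# of its context-free analyses. A real analyzer derives these by rule.
TOY_LEXICON = {
    "spr": [
        Analysis("sefer", "noun", "sefer"),  # 'book'
        Analysis("sapar", "noun", "sapar"),  # 'barber'
        Analysis("siper", "verb", "siper"),  # 'told'
    ],
    "hwa": [Analysis("hu", "pronoun", "hu")],  # 'he'
}


def analyze(word):
    """Stand-in for the analyzer step: every analysis of one word."""
    return list(TOY_LEXICON.get(word, []))


def vocalize(word):
    """Stand-in for the word vocalizer: one vocalization per analysis."""
    return [a.vocalized for a in analyze(word)]


def vocalize_text(words):
    """Stand-in for the text vocalizer: pick one analysis per word.

    The only context rule here (an assumption for illustration) is
    "prefer a verb right after a pronoun"; otherwise take the first
    analysis. Unknown words are passed through unchanged.
    """
    output = []
    previous_pos = None
    for word in words:
        candidates = analyze(word)
        if not candidates:
            output.append(word)
            previous_pos = None
            continue
        chosen = candidates[0]
        if previous_pos == "pronoun":
            verbs = [a for a in candidates if a.pos == "verb"]
            if verbs:
                chosen = verbs[0]
        output.append(chosen.vocalized)
        previous_pos = chosen.pos
    return output
```

In this toy setup, vocalize("spr") lists all three candidate readings,
while vocalize_text(["hwa", "spr"]) selects the verbal reading because
of the preceding pronoun. The announcement mentions probabilistic and
statistical modules in addition to short-context rules; the sketch
makes no attempt to model those.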
Rav-Milim Team (major participants):
----------------------------------
Yaacov Choueka, PI and Director
Yoni Ne'eman, Chief programmer and in charge of linguistic
algorithms
Programmers: Avi Danon, Yosi Sarousi
Linguistics: Rahel Finkel, Hagit Avioz
The Dictionary:
Steering Committee:
Prof. Yaacov Choueka, Prof. M.Z. Kaddari (Vice-President, Academy of
Hebrew Language), Prof. R. Nir (Hebrew University), Prof. R. Mirkin
(Academy of Hebrew Language), Prof. O.Schwarzwald (Bar-Ilan
University), M. Zinger.
Editor-in-Chief: Uzzi Freidkin
Senior Editors: Dr Haym Cohen, Yael Zachi-Yannai
Science and Technology Editor: Yakhin Unna
Assistant Editors: Rahel Finkel, Hagit Avioz, Sara Choueka
Dictionary for the Young:
Steering Committee:
Prof. R. Berman (Tel-Aviv University), Dr. Zvia Walden (Berl College),
Prof R. Nir, Dr. Dorit Ravid, Prof. Maya Fruchtman,
Prof. O. Schwarzwald
Editor: Yael Zachi-Yannai
Assistant Editors: Hagit Avioz, Sara Choueka
Consultants: Uzzi Freidkin (lexicography), Dr. Haym Cohen
(linguistics), Dr Zvia Walden (Educational approach and design).
Multimedia version:
Design and supervision: Ofra Razel
----------------------------------------------------------------
Humanist Discussion Group
Information at <http://www.kcl.ac.uk/humanities/cch/humanist/>
<http://www.princeton.edu/~mccarty/humanist/>
-------------------------------- Message 2 -------------------------------
Date: Thu, 20 Nov 1997 15:30:52
From: Juhani Jarvikivi <Juhani.Jarvikivi at joensuu.fi>
Subject: NorFa Summer School: Languages, Minds and Brains
First Circular
November 1997
The Department of Linguistics of the University of Joensuu and the
Nordic Neurolinguistic Network are pleased to announce that a Nordic
Research Course, sponsored by the Nordic Academy for Advanced Study
(NorFA), called
Languages, Minds, and Brains
will be held at the Mekrijarvi Research Station, University of
Joensuu, Ilomantsi, Finland, June 22-29, 1998.
The Course will consist of the following three components. The
components are planned to be joint sessions involving all the
participants, students as well as teachers, of the Research
Course. This policy has been adopted in order to maximize the
multidisciplinary flow of ideas between the participants.
(a) Four-hour survey lectures by internationally well-known experts
Dr. Harald Baayen (Max Planck Institute for Psycholinguistics,
Nijmegen): Morphological and Lexical Processes and Representations
Prof. Kenneth Hugdahl (Biological and Medical Psychology,
Bergen): Neuroimaging and the Brain
Prof. Lise Menn (Linguistics, Boulder): Methodological Issues
in the Case Study Approach
Prof. Michel Paradis (Linguistics, McGill): Grammar,
Pragmatics, and the Brain
(b) Seminars with 30-minute individual presentations by the students
and 30-minute post-paper discussions. The seminars will be attended by
all the teachers.
(c) Discussion sessions towards the end of each topic area, highlighting
the methodological and theoretical issues shared by the papers
presented.
The criteria for student selection, in addition to those defined by
NorFA (with regard to country of origin, etc.), are:
(a) The participants should have a strong background in one or several
of the following disciplines or related areas: linguistics,
psychology, neurology, cognitive science, phonetics, logopaedics and
special education.
(b) The topic of the Course (language, mind, and brain) should occupy
a significant position in the PhD or post-doctorate studies or study
plans of the participants.
The number of student participants will be restricted to 25.
Pre-course Requirements in Addition to the General NorFA Requirements:
(a) The applicants should send, together with their
application, a 3-5 page abstract of their work (planned or
ongoing) in the topic area(s) of the Course. The texts of the accepted
students will eventually be mailed well in advance to the teachers as
well as to the other student participants.
(b) It is expected that the invited teachers or the organizers
will require a set of pre-course readings. A list of the required
pre-reading material will be sent to the participants well in advance.
NorFA will pay for the tuition as well as for board and lodging during
the course and for travel as follows. For students originating from
Denmark, Iceland and Norway, NorFA will cover the (APEX-type) return
flight tickets from the port of exit (e.g. Copenhagen) to
Helsinki. The Swedish students will receive the boat fares between
Stockholm and Helsinki/Turku from the organizers. Within Finland, only
general-public surface travel (i.e., train, bus) tickets will be paid
for by the organizers.
Accommodation at the Research Station will be in double rooms.
Our web site (Linguistics under http://www.joensuu.fi/fld) will
contain, among other things, the program. Please visit us there for more information
or contact the responsible organizer directly.
Application procedure: Send a free-form application to Jussi Niemi
(below) by March 1, 1998. Please enclose a brief CV and a 3 to 5 page
summary of your research interests.
Those accepted will be notified by April 1.
Responsible Organizer: Jussi Niemi, Associate Prof.,
Linguistics, University of Joensuu, FIN-80101 Joensuu, Finland,
jussi.niemi at joensuu.fi, fax +358-13-251 4211, phone +358-13-251 4306
---------------------------------------------------------------------------
LINGUIST List: Vol-8-1670