Corpora: Advances in Minority Language NLP / MT Applications

MIT2USA at aol.com
Sat May 13 22:58:49 UTC 2000


MIT2 software solutions for preparing Creole languages for porting
to popular off-the-shelf computer applications and embedded MT
systems demonstrated in Seattle

Marilyn Mason, CEO of Mason Integrated Technologies Ltd (MIT2),
demonstrated a research prototype of its proprietary orthography
conversion software for sparse data languages at both the 3rd
International Controlled Language Applications Workshop
(CLAW2000) and the Language Technology Joint Conference for
Applied Natural Language Processing and the North American
Chapter of the Association for Computational Linguistics
(ANLP-NAACL2000), held in Seattle, WA, April 29 to May 4, 2000.

The Creole version of this conversion software is being prepared for
market as CreoleConvert(tm). Together with CreoleScan(tm), MIT2's
proprietary optical character recognition (OCR) solution, it serves
as a prototype workflow for electronic corpus entry and corpus
cleansing for languages with a high incidence of lexical and
orthographic variation.
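
As a rough illustration of what an orthography conversion step in such
a workflow might look like, the sketch below maps spelling variants to
a single standard form and counts each replacement. It is not MIT2's
proprietary method, which is not described here. The VARIANTS table,
the nfc helper, and the normalize function are hypothetical names, and
the table entries are illustrative only, not taken from a verified
lexicon.

```python
import re
import unicodedata
from collections import Counter


def nfc(text):
    """Return text in Unicode NFC form so accented letters compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical variant -> standard-form table (illustrative entries only,
# not drawn from any MIT2 resource or verified lexicon).
VARIANTS = {nfc(k): nfc(v) for k, v in {
    "creole": "kreyòl",
    "kreyol": "kreyòl",
    "kréyòl": "kreyòl",
}.items()}

TOKEN = re.compile(r"\w+")


def normalize(text, table=VARIANTS):
    """Replace known variant tokens with their standard form.

    Returns the converted text and a Counter of the variants replaced.
    """
    counts = Counter()

    def replace(match):
        word = match.group(0)
        standard = table.get(word.lower())
        if standard is None:
            return word  # unknown token: leave untouched
        counts[word.lower()] += 1
        # Preserve an initial capital from the source token.
        if word[0].isupper():
            standard = standard[0].upper() + standard[1:]
        return standard

    return TOKEN.sub(replace, nfc(text)), counts


if __name__ == "__main__":
    converted, replaced = normalize("Kreyol ak creole")
    print(converted)
    print(dict(replaced))
```

A table lookup of this kind only covers whole-word variants that have
already been catalogued. Systematic conversion between orthographies
would need rules at the grapheme level, and the replacement counts
give a simple measure of how much variation a corpus contains.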

These processes constitute an essential "middleware" task for
preparing sparse data / minority languages for porting to other
language technology tools, such as spell checkers, machine
translators, speech-to-text and text-to-speech applications, etc.

MIT2 intends to act as a coordinating agent to enable linguists,
end user native speakers, language development leaders, and
corpora builders to coordinate their activities and to conform to
established protocols, formats and conventions for data tagging,
so that these precious electronic materials can not only meet
the short-term goal of providing for a standardized literature
base, but be re-used to serve as the very building blocks for
development of future language technology tools for these
languages.

As orthographic and lexical standardization are the base elements
for spell checking, authoring, and translation tools, this technology
is now being further developed in-house by MIT2 in order to provide
minority languages with consistent and coherent standardization
strategies for the optimization of authoring and translation tasks.

This novel "middleware" approach to porting languages which have
thus far "missed out on most of the benefits of the Electronic Age"
stirred considerable interest among representatives of some of the
biggest players in NLP and MT systems development, who were
also in attendance at CLAW2000 and ANLP-NAACL2000.

These processes will be further described and demonstrated at the
2nd International Language Resources and Evaluation Conference
(LREC2000) and the LREC2000 Workshop on "Developing language
resources for minority languages: re-useability and strategic
priorities" to take place 29 May - 2 June 2000 in Athens, Greece.
Ms. Mason will deliver the papers "Issues from corpus analysis that
have influenced the on-going development of various Haitian Creole
text- and speech-based NLP systems" and "The State of the Art of
French Creole Language Resource Engineering".

Located in Boston, Massachusetts (USA), MIT2 fosters research
and development activity on behalf of French-, Portuguese-, and
English-related Creoles, as well as other minority and vernacular
languages, and is actively seeking corporate investment capital
and corporate strategic partnering relationships.

For more information, please contact:
Mason Integrated Technologies Ltd (MIT2)
P.O. Box 181015, Boston, Massachusetts 02118 USA
Tel: (+1) 617-247-8885, Fax: (+1) 617-262-8923
E-mail: mit2usa at aol.com
MIT2 Web Page: http://hometown.aol.com/mit2usa/Index2.html

*******
Mason Integrated Technologies Ltd
P.O. Box 181015
Boston, MA  02118  USA
(617) 247-8885 (office & answering machine)
(617) 262-8923 (FAX)
MIT2USA at aol.com (e-mail)
Mason Integrated Technologies Ltd Home Page:
   http://hometown.aol.com/mit2usa/Index2.html
MIT2 President's Update:
   http://hometown.aol.com/mit2usa/Update3-2000.htm
Introducing CreoleScan(tm) and CreoleConvert(tm):
   http://hometown.aol.com/mit2usa/IntroCrScCrConv.htm
Orthographically Converted HC Texts Download Site:
   http://hometown.aol.com/mit2haiti/Index4.html
Meet Marilyn Mason:
   http://hometown.aol.com/marilinc/Index3.html
MIT2 Job Opportunities:
   http://hometown.aol.com/mit2usa/JobOpps.html


