15.3305, FYI: Software Localization; New BNC-related Corpus

LINGUIST List linguist at linguistlist.org
Thu Nov 25 19:15:38 UTC 2004


LINGUIST List: Vol-15-3305. Thu Nov 25 2004. ISSN: 1068-4875.

Subject: 15.3305, FYI: Software Localization; New BNC-related Corpus

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Collberg, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Ann Sawyer <sawyer at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================

1)
Date: 25-Nov-2004
From: Donald Osborn < dzo at bisharat.net >
Subject: Issues in Software Localization

2)
Date: 22-Nov-2004
From: Mark Davies < mark_davies at byu.edu >
Subject: New BNC-related Corpus: Register-based Queries



-------------------------Message 1 ----------------------------------
Date: Thu, 25 Nov 2004 14:10:18
From: Donald Osborn < dzo at bisharat.net >
Subject: Issues in Software Localization


The localization of internet content and computer software into many
languages is an undeniable trend. In software localization, the
translation of commands, interfaces, glossaries and documentation
into diverse languages raises interesting linguistic questions and
highlights the need for collaboration between linguists and software
localizers.

At a recent event devoted to outlining and planning for localization
needs, the Localisation Development Sprint (Warsaw, 20-22 November 2004),
language and linguistics as a specialty were more often implicit than
explicit subjects. Nevertheless, the event may be of interest, both for
the ground it covered and for its links to other activities.

See http://localisationdev.org/

Don Osborn
Bisharat.net



Linguistic Field(s): Computational Linguistics; General Linguistics




-------------------------Message 2 ----------------------------------
Date: Thu, 25 Nov 2004 14:10:19
From: Mark Davies < mark_davies at byu.edu >
Subject: New BNC-related Corpus: Register-based Queries



I have placed on the web a freely-accessible resource that may be
of interest to some of you:

http://view.byu.edu
([V]ariation [I]n [E]nglish [W]ords and Phrases)

As with some other interfaces, this website allows you to quickly
and easily search the 100-million-word British National Corpus (BNC).
Users can search by exact word or phrase, by wildcard or part of speech,
or by combinations of these (e.g. all nouns ending in -ness, or all
cases of 'white' + [noun]).

Unlike some interfaces that are strictly 'slot-oriented' (matching words
only at fixed positions), this interface also allows you to use 'anchors'
and 'targets' for fuzzy matches (e.g. all nouns somewhere near 'break' (v),
adjectives near 'woman', verbs near 'way', and nouns near 'small'), and
the size of the window can be easily customized.
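The anchor/target idea can be illustrated with a small sketch. It
assumes a hypothetical, simplified representation: the corpus as a
Python list of (word, tag) pairs with made-up tags such as "VERB" and
"NOUN" (the BNC itself uses CLAWS tags), and a hypothetical function
`collocates`. It is not how VIEW is implemented; it simply counts every
target-tagged word within `window` tokens on either side of each
occurrence of the anchor.

```python
from collections import Counter


def collocates(tokens, anchor, target_tag, window=4):
    """Count words tagged `target_tag` within `window` tokens of `anchor`.

    tokens: list of (word, tag) pairs (hypothetical simplified tagging).
    anchor: a (word, tag) pair, e.g. ("break", "VERB").
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != anchor:
            continue
        # Window of `window` tokens on each side, clipped to the text edges.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            word, tag = tokens[j]
            if j != i and tag == target_tag:
                counts[word] += 1
    return counts


# Toy, hand-tagged sentence (illustrative only, not BNC data).
sentence = [("they", "PRON"), ("break", "VERB"), ("the", "DET"),
            ("record", "NOUN"), ("and", "CONJ"), ("the", "DET"),
            ("rules", "NOUN")]

print(collocates(sentence, ("break", "VERB"), "NOUN", window=4))
print(collocates(sentence, ("break", "VERB"), "NOUN", window=5))
```

Widening the window from 4 to 5 tokens brings 'rules' into range
alongside 'record', which is the kind of adjustment a customizable
window allows.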

Perhaps the most distinctive feature of the interface is the ability to
find the frequency of words and phrases in any combination of registers
that you define (spoken, academic, poetry, medical, etc.). In addition,
you can compare registers: for example, you can find verbs that are
more common in legal or medical texts, phrases like [I * that] that
are more common in conversation than in non-fiction texts, or nouns
near 'break' (v) that are found primarily in academic writing.
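One common way to compare registers, assumed here for illustration
rather than taken from the description above, is to normalize raw
counts to frequency per million words in each register and then take
the ratio. The function names `per_million` and `compare_registers`,
and the toy counts and register sizes, are all hypothetical.

```python
def per_million(count, register_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / register_size


def compare_registers(counts_a, size_a, counts_b, size_b):
    """Rank words found in both registers by their per-million ratio A/B.

    Words missing from either register are skipped to avoid division by zero.
    """
    ratios = {}
    for word in counts_a.keys() & counts_b.keys():
        freq_a = per_million(counts_a[word], size_a)
        freq_b = per_million(counts_b[word], size_b)
        ratios[word] = freq_a / freq_b
    # Highest ratio first: words most characteristic of register A.
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts and register sizes, not BNC figures.
legal = {"hereby": 20, "said": 10, "shall": 40}
conversation = {"hereby": 2, "said": 50}
print(compare_registers(legal, 1_000_000, conversation, 2_000_000))
```

Normalizing first matters because registers differ in size: here
"said" has more raw hits in legal texts than one might expect, but per
million words it is still more frequent in conversation (ratio 0.4).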

Finally, the database architecture behind this interface improves on
some previous interfaces in that it allows users to find *all* of the
matching strings in the BNC, rather than just those n-grams that occur
three or more times in the corpus (a cutoff that effectively removes
about 75% of all 2-gram and 3-gram strings). It is also fast: nearly
all searches, including queries with detailed register information,
take a couple of seconds or less.
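The effect of a minimum-frequency cutoff can be checked with a short
sketch that counts all n-grams in a token list and reports the share of
distinct n-grams falling below the cutoff. The helper names are
hypothetical, and the toy text is far too small to reproduce the
roughly 75% figure cited above; the point is only to show how such a
share is computed.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in the token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def share_below(counts, threshold=3):
    """Fraction of distinct n-grams whose frequency is below `threshold`."""
    if not counts:
        return 0.0
    rare = sum(1 for c in counts.values() if c < threshold)
    return rare / len(counts)


tokens = "a b a b a b c".split()
bigrams = ngram_counts(tokens, 2)
print(bigrams)
# Two of the three distinct bigrams occur fewer than 3 times.
print(share_below(bigrams))
```

A database that stores every n-gram, rather than only those at or above
the cutoff, keeps the rare entries that `share_below` counts here.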

If you have any questions, please feel free to email me.

Mark Davies
Assoc. Prof., Linguistics
Brigham Young University
http://davies-linguistics.byu.edu

** Corpus design and use // Web-database scripting **
** Historical linguistics // Functional-typological grammar **
** Variation in Spanish, Portuguese, and English syntax **




Linguistic Field(s): Applied Linguistics; Computational Linguistics; Discourse
Analysis; Lexicography; Text/Corpus Linguistics



-----------------------------------------------------------
LINGUIST List: Vol-15-3305
