[Corpora-List] foreign words in German

chris brew cbrew at acm.org
Wed Sep 28 13:10:55 UTC 2011


This is all ridiculously hard. There are not many words in any
language which are solidly native.

The following extract from Beatrice Santorini's guidelines on how to tag
the Penn Treebank is an example of a sane way of
dealing with the issue:

========================================================================================

Foreign word FW

Use your judgment as to what is a foreign word. For me, yoga is an NN,
while bete noire and persona non
grata should be tagged bete/FW noire/FW and persona/FW non/FW
grata/FW, respectively.

========================================================================================

This is a sane approach, because it explicitly tells annotators what
to do and makes no claim to principled wisdom.
But it is not a recipe for good inter-annotator agreement or for
easy resolution of differences of opinion.

I bet that I can generate disagreements:

- Is "yogi" a foreign word in English? How about "sanyassin",
  "bhikkhuni", or "sangha"? (The last three are definitely from Sanskrit
  or Pali; the last two are very technical, but widely used among
  English-speaking Buddhists.)
- How about "census", "agenda", and "aliquot"? (All are from Latin; the
  last one has a funny-looking spelling, for sure.)

This notion of "foreign word" is absolutely not sustainable.
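Cohen's kappa is one standard way to quantify the inter-annotator agreement problem. It corrects raw agreement for the agreement you would expect by chance. The Python sketch below is illustrative only. The FW/NN tags assigned by the two "annotators" to the words listed above are invented for the example and are not real annotation data. The function name `cohens_kappa` is hypothetical.

```python
from collections import Counter


def cohens_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators tagging the same items."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag sequences")
    n = len(tags_a)
    # Observed agreement: fraction of items that received the same tag.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement, from each annotator's own distribution of tags.
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical tag throughout; kappa is
        # undefined here, and we treat it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


words = ["yogi", "sanyassin", "bhikkhuni", "sangha", "census", "agenda", "aliquot"]
# Hypothetical tags, made up for illustration (not from any real annotation).
annotator_1 = ["NN", "FW", "FW", "FW", "NN", "NN", "NN"]
annotator_2 = ["NN", "NN", "FW", "NN", "NN", "NN", "FW"]

print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.087
```

With these invented tags, the annotators agree on 4 of the 7 words. Kappa still comes out below 0.09, because most of that agreement is what chance alone would produce given how often each annotator uses NN.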



On Tue, Sep 27, 2011 at 9:52 AM, John F. Sowa <sowa at bestweb.net> wrote:
> For borrowings from English, a question comes up about the date
> of the borrowing.  The T in 'Boot' indicates a borrowing from
> a northern dialect -- probably from the Anglo-Saxons who used
> boats to go to England.
>
> John
>



-- 
Chris Brew, Educational Testing Service

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


