Re: Corpora: Collaborative venture

Tadeusz Piotrowski tadpiotr at ii.uni.wroc.pl
Thu Jun 15 10:58:57 UTC 2000


Well, linguistics is certainly loads of fun, that is why I am doing it!!
Lexicography is even better...

That discussion has been one of the most informative and, err, amusing. The
river is meandering a lot... Thanks a lot!
I think the idea is great, and I am trying to persuade people in Poland to
do the same, because corpora are thinly distributed here, and nobody is
willing to share their precious collections. This way we can have a nice
corpus.
I am afraid, though, that the conclusion will be that linguistics is such
fun...

By the way, people who are or were at Cobuild and who worked hard on
developing its unique format: you might be interested to know that there
is now a dictionary of Polish that tries to do exactly what you did, and it
uses a VERY similar format of description:
Inny slownik jezyka polskiego, Warszawa 2000, PWN, editor-in-chief Miroslaw
Banko, in two volumes.
I say "tries" because the headword list is a compilation from previous
dictionaries rather than derived from frequency lists.
Quite interesting...
Apologies for this aside.

Regards

Tadeusz Piotrowski
***************************************************************
                                              mailing address
Department of English
Opole University                    Zielinskiego 47/11
Oleska 48                              PL-53-533 Wroclaw
Opole
POLAND
              phone/fax (+48)71-3382664


----- Original Message -----
From: Jem Clear <jem at cobuild.collins.co.uk>
To: <corpora at hd.uib.no>
Sent: Tuesday, June 13, 2000 1:23 PM
Subject: Corpora: Collaborative venture


> Re: the points raised by Eric Atwell (et al.) (see snippet below).
>
>
> > >I agree if the sense tags have completely different meanings. However,
> > >the differences in meaning between tags may be shades of meaning
> > >rather than a crisp decision that they are or are not the same....
>
> > ... I don't believe there is a clear, "self-evident" set of semantic
> > tags. Semantic tagging could instead aim to annotate each word with
> > a SET of semantic features, and "disambiguation" could aim to
> > eliminate semantic features incompatible with context; this would
> > allow for overlap and indeterminate sense-tagging. The set of
> > semantic features for a word could be a bundle of semantic
> > information, for example the lemma/root, subject-category code,
> > selection restrictions, and meaning definition from LDOCE; instead
> > of sense-tagging, if the aim was to eliminate features which were
> > incompatible with context, you should get more inter-annotator
> > agreement.
>
>
> Oh dear! No, no, no. OK. Maybe I was being a little naive in
> thinking that a large group of corpus linguists could even begin
> to agree on a simple, but potentially useful, collaborative
> scheme. A project in "semantic tagging" seems to my way of
> thinking precisely what we do *not* need -- or rather we have
> plenty of such projects going on at the moment anyway so there's
> no widespread benefit to the linguistic community in having
> a few more people sitting round discussing what exactly *are*
> the set of primitive semantic components or how a semantic "entry"
> should be structured or whatever.
>
> I was feeling reckless last Friday afternoon so thought I'd float
> an extremely simple idea based on the assumption that speakers
> of English (native or non-native) have some ability to pick from
> a number of offered citations those which in their opinion match
> a given dictionary definition. I am not so foolish as to believe
>
> a) that all respondents would select the same citations if offered the
> same source set (this is the Consensus Issue)
>
> b) that the dictionary definition is "true" or "correct" or clearly
> defines the boundaries of a word sense (this is the Which Tagset? Issue)
>
> c) that all citations selected by respondents would be "correct" (this
> is the Quality Control Issue: aka the Noise Problem)
>
> Suppose in primitive times, when the only routes connecting towns and
> villages were rough, muddy tracks, that someone proposes that the
> community build a road by bringing bucketloads of rubble, stones, ash,
> whatever and pack it down to make a hard flat surface. As soon as this
> idea is proposed, one group of villagers get very excited because
> no-one has told them how wide the proposed road should be (just wide
> enough for one cart -- or wide enough for two carts to pass?). A wise
> man from another town questions whether straw should be added to the
> stones being thrown down -- straw may disintegrate and not last
> through winter rains. Others get into fierce arguments about whether
> the road should go straight from one village to another or should wind
> around avoiding hills, deep valleys, marshland, etc.
>
> You get the idea! Just a few people bring along a few bucketloads of
> stones and rubble and the road extends for no more than 5 metres,
> despite the fact that almost everyone agrees that a road of some sort
> would be much better than the rutted, filthy, muddy track along which
> they have to walk, ride, or drive their livestock.
>
> Linguistics is such fun, isn't it?
>
> Jem Clear
>
> Electronic Development Director     phone:  +44 (0)121-414-3926
> Collins Dictionaries                  fax:  +44 (0)121-414-6203
> Westmere, 50 Edgbaston Park Road    email: jem at cobuild.collins.co.uk
> Birmingham, B15 2RX, UK               WWW: www.cobuild.collins.co.uk
>
>


