[Corpora-List] Anything resembling TPC benchmarks for corpora?
David L. Hoover
david.hoover at nyu.edu
Thu Jul 12 12:55:11 UTC 2012
As a member of the International Advisory Panel of CLARIN, I can confirm
that CLARIN is very vigorously pursuing standards in all areas. All
funded projects are required to address standards issues and either to
conform to existing standards or propose extensions.
You might have a look at
http://www.clarin.eu/external/index.php?page=activities&sub=1
and various reports here:
http://www.clarin.eu/external/index.php?page=publications&sub=5
I don't know enough about the precise issues you're discussing to say
whether the standards already in place at CLARIN are adequate for your
purposes.
David
On 7/12/2012 5:11 AM, Alon Lischinsky wrote:
> Adam Kilgarriff <adam at lexmasterclass.com> wrote:
>
>> I really can't swallow the analogy with DBMS. That's technology, (corpus)
>> linguistics is science. There, the task is getting your house in order and
>> singing from the same hymnsheet. Here, the big picture is that we are
>> trying to find out how language works.
> Wholeheartedly agreed. But this doesn't mean that some of the
> technological problems involved in storing, annotating and querying
> corpora cannot be solved in a theory-agnostic manner.
>
> We may not agree on what exactly our markup will contain, but at least
> we should be able to settle on a common markup scheme, and so avoid
> problems like, say, someone's concordancer choking on someone else's
> POS tags. XML is unfortunately not good enough, as it requires proper
> nesting. And at present things are a mess, with a multitude of
> standards making it very hard to explore the same corpus with a
> variety of tools.
>
> At a recent workshop, Martin Wynne mentioned the CLARIN project
> (http://www.clarin.eu/external/index.php?page=about-clarin), which
> seems a great step in the right direction, but I'm not sure how complete
> their set of standards is at present.
>
> A.
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora