[Corpora-List] Anything resembling TPC benchmarks for corpora?

David L. Hoover david.hoover at nyu.edu
Thu Jul 12 12:55:11 UTC 2012


As a member of the International Advisory Panel of CLARIN, I can confirm 
that CLARIN is very vigorously pursuing standards in all areas. All 
funded projects are required to address standards issues and either to 
conform to existing standards or propose extensions.

You might have a look at
http://www.clarin.eu/external/index.php?page=activities&sub=1

and various reports here:
http://www.clarin.eu/external/index.php?page=publications&sub=5

I don't know enough about the precise issues you're discussing to say 
whether the standards already in place at CLARIN are adequate for your 
purposes.

David

On 7/12/2012 5:11 AM, Alon Lischinsky wrote:
> Adam Kilgarriff <adam at lexmasterclass.com> wrote:
>
>> I really can't swallow the analogy with DBMS.  That's technology, (corpus)
>> linguistics is science.  There, the task is getting your house in order and
>> singing from the same hymnsheet.  Here, the big picture is that we are
>> trying to find out how language works.
> Wholeheartedly agreed. But this doesn't mean that some of the
> technological problems involved in storing, annotating and querying
> corpora cannot be solved in a theory-agnostic manner.
>
> We may not agree on what exactly our markup will contain, but at least
> we should be able to settle on a common markup scheme, and so avoid
> problems like, say, someone's concordancer choking on someone else's
> POS tags. XML is unfortunately not good enough, as it requires proper
> nesting. And at present things are a mess, with a multitude of
> standards making it very hard to explore the same corpus with a
> variety of tools.
>
> At a recent workshop, Martin Wynne mentioned the CLARIN project
> (http://www.clarin.eu/external/index.php?page=about-clarin), which
> seems a great step in the right direction, but I'm not sure how complete
> their set of standards is at present.
>
> A.
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


