[Corpora-List] announcing pukwac and wackypedia

Linas Vepstas linasvepstas at gmail.com
Mon Jan 4 14:46:04 UTC 2010


2010/1/4 Eric Atwell <csc6ea at leeds.ac.uk>:
> Marco, Linas,
>
> thanks for making available these dependency-parsed English corpora.

You're welcome.

> What do you see these being used for? What are the useful applications of
> dependency-parsed treebanks?

I don't quite understand the emphasis on the word "dependency":
is this a question about the need for parsed treebanks in general,
or for dependency-parsed treebanks specifically?

Parsing still takes a significant amount of CPU time, so having
pre-processed text saves a lot of repeated work.  Personally, I've
used this data for the following tasks:

-- studying correlations between word-sense assignments and
   grammatical structure (paper in preparation)
-- building up a statistical database to guide natural-language (NL) generation
-- using pattern matching to perform question answering
-- using the parse as input to a knowledge-extraction task
   (identifying entities and their properties/attributes)
-- using the parsed text to provide a substrate of
   "common-sense" knowledge for use in automated
   reasoning systems.

Note that all but the first of these are essentially "AI" tasks, rather
than "linguistics" tasks.  Certainly, the last few tasks also attract
commercial interest of various kinds, for specialized search engines
and assistants of all sorts.

--linas

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
