[Corpora-List] Querying Dependency-Annotated Corpora

Siva Reddy siva at sivareddy.in
Mon Aug 6 13:56:45 UTC 2012


Hi Niels,

Sketch Engine (http://sketchengine.co.uk) now supports querying dependency
trees represented in CoNLL format (MaltParser output is in CoNLL format).
Word sketches (collocational profiles) and a thesaurus can also be extracted
from the parsed data.
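For readers new to the format: MaltParser's CoNLL output (the CoNLL-X layout) has one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line between sentences. HEAD holds the ID of the governing token, or 0 for the root. The minimal Python reader below is a sketch that assumes well-formed CoNLL-X input. The `Token` class, the `read_conll` function and the hand-written two-sentence sample (with Penn-style tags and labels such as SBJ, OBJ, NMOD) are illustrative only and are not part of Sketch Engine or MaltParser.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    id: int       # column 1: position in the sentence, starting at 1
    form: str     # column 2
    lemma: str    # column 3
    pos: str      # column 5 (fine-grained POSTAG)
    head: int     # column 7: ID of the head token, 0 for the root
    deprel: str   # column 8: dependency label, e.g. OBJ


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence (a list of Tokens) per blank-line-separated block."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():          # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append(Token(id=int(cols[0]), form=cols[1], lemma=cols[2],
                              pos=cols[4], head=int(cols[6]), deprel=cols[7]))
    if sentence:                      # the file may not end with a blank line
        yield sentence


# Hypothetical two-sentence sample. Real CoNLL files are tab-separated;
# spaces are used here only for readability and converted to tabs below.
_ROWS = """
1 John john NNP NNP _ 2 SBJ _ _
2 ate eat VBD VBD _ 0 ROOT _ _
3 an an DT DT _ 4 NMOD _ _
4 apple apple NN NN _ 2 OBJ _ _
5 . . . . _ 2 P _ _

1 She she PRP PRP _ 2 SBJ _ _
2 smiled smile VBD VBD _ 0 ROOT _ _
3 . . . . _ 2 P _ _
"""
SAMPLE = _ROWS.replace(" ", "\t")

objects = [t.form for s in read_conll(SAMPLE.splitlines()) for t in s if t.deprel == "OBJ"]
print(objects)  # ['apple']
```

Filtering tokens on `deprel == "OBJ"`, as in the last line, is a plain-Python analogue of the first sample query.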

Paper on handling the CoNLL format in Sketch Engine:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/585_Paper.pdf

I have uploaded a portion of the Penn Treebank (dependency version) that you
can play with at
http://corpdev.sketchengine.co.uk/run.cgi/first_form?corpname=23399c07

Sample CQL (corpus query language) queries:

1. All examples of the dependency relation OBJ: [deprel="OBJ"]
http://bit.ly/Q2Pv6F

2. All keywords in context of the OBJ relation:
1:[] []{0,5} 2:[deprel="OBJ"] & 2.head=1.id
http://bit.ly/QEWGIR

3. Tag patterns of the OBJ relation:
1:[] []{0,5} 2:[deprel="OBJ"] & 2.head=1.id
http://bit.ly/QEWZTO

4. Word Sketch of a word, e.g. give-v, extracted from dependency corpus:
http://bit.ly/QEXptr
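To make the semantics of queries 2 and 3 concrete, here is a rough Python equivalent over sentences already parsed into tuples. It is a sketch of what the query matches, not of how Sketch Engine evaluates CQL: position 1 is any token, zero to five arbitrary tokens follow, position 2 is a token labelled OBJ, and the global condition 2.head=1.id requires token 1 to be the head of token 2. Counting the POS tags of matched pairs approximates the tag patterns of query 3, and grouping the dependents of a lemma by relation gives a crude, frequency-only stand-in for a word sketch (Sketch Engine additionally ranks collocates with an association score). The `Tok` tuple, the function names and the sample sentence are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Tok(NamedTuple):
    id: int       # CoNLL ID, 1-based within the sentence
    form: str
    lemma: str
    pos: str
    head: int     # ID of the governing token, 0 = root
    deprel: str


def obj_matches(sent: List[Tok], max_gap: int = 5) -> List[Tuple[Tok, Tok]]:
    """Mimic 1:[] []{0,max_gap} 2:[deprel="OBJ"] & 2.head=1.id within one sentence."""
    pairs = []
    for i, t1 in enumerate(sent):
        # []{0,max_gap} puts token 2 between 1 and max_gap+1 positions after token 1
        for j in range(i + 1, min(i + max_gap + 2, len(sent))):
            t2 = sent[j]
            if t2.deprel == "OBJ" and t2.head == t1.id:
                pairs.append((t1, t2))
    return pairs


def tag_patterns(sents: Iterable[List[Tok]]) -> Counter:
    """Count (head POS, object POS) pairs over all matches, roughly like query 3."""
    return Counter((h.pos, d.pos) for s in sents for h, d in obj_matches(s))


def crude_sketch(sents: Iterable[List[Tok]], lemma: str) -> Dict[str, Counter]:
    """Group the dependents of `lemma` by relation, using raw counts only."""
    rels: Dict[str, Counter] = defaultdict(Counter)
    for s in sents:
        by_id = {t.id: t for t in s}
        for t in s:
            head = by_id.get(t.head)  # None for the root (head 0)
            if head is not None and head.lemma == lemma:
                rels[t.deprel][t.lemma] += 1
    return rels


# Hypothetical hand-written sentence: "John ate an apple ."
SENT = [
    Tok(1, "John", "john", "NNP", 2, "SBJ"),
    Tok(2, "ate", "eat", "VBD", 0, "ROOT"),
    Tok(3, "an", "an", "DT", 4, "NMOD"),
    Tok(4, "apple", "apple", "NN", 2, "OBJ"),
    Tok(5, ".", ".", ".", 2, "P"),
]

print([(h.form, d.form) for h, d in obj_matches(SENT)])  # [('ate', 'apple')]
print(tag_patterns([SENT]))  # Counter({('VBD', 'NN'): 1})
print(dict(crude_sketch([SENT], "eat")["OBJ"]))  # {'apple': 1}
```

Restricting matches to a single sentence is a choice of this sketch. The window of one to six positions, and the requirement that the head precede its object, follow directly from the query as written, so objects placed before their verb would not be found by queries 2 and 3.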

For more details, please contact me personally.

Siva


On Mon, Jul 30, 2012 at 2:28 PM, Niels Ott <nott at sfs.uni-tuebingen.de> wrote:

> Dear Corpora People,
>
> I spent some time googling for a tool that allows one to explore and query
> huge dependency-annotated corpora. By huge I'm referring to something
> as large as sDeWaC (~44M sentences), annotated the way MaltParser would
> do it automagically. I found no such tool.
>
> How do people search for things in dependency treebanks?
>
> Thanks for your time and help.
>
> Best
>
>    Niels Ott
>
>
> --
> Niels Ott (M.A.), Computational Linguist
> SFB 833 "Bedeutungskonstitution", Projekt A4, Universität Tübingen
> http://www.sfs.uni-tuebingen.de/~nott
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

--
http://sivareddy.in