[Corpora-List] software for co-occurrence/collocation analysis in German texts

Milos Jakubicek jak at fi.muni.cz
Fri Aug 22 19:47:02 UTC 2014


Dear Abdoulaye,

you can simply upload all the texts into Sketch Engine (www.sketchengine.co.uk),
have them automatically tagged by TreeTagger and lemmatised by RFTagger,
extract collocations using your favourite association score (T-score,
MI-score, logDice, ...) and, on top of that, look at the word sketches
(collocations by grammatical relation). All of this works for corpora much
larger than your 10M sentences (1,000,000 texts of about 10 sentences each).
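
For readers who want to see what the association scores mentioned above
actually compute, here is a minimal Python sketch. It is not Sketch Engine's
implementation. It assumes the texts are already tokenised and lemmatised,
and it treats a co-occurrence as a collocate appearing within a fixed window
to the right of the node word in the same sentence. It uses the commonly
cited forms T = (f_AB - f_A*f_B/N) / sqrt(f_AB), MI = log2(f_AB*N / (f_A*f_B))
and logDice = 14 + log2(2*f_AB / (f_A + f_B)), where N is the corpus size in
tokens. The function names, the window of 3 and the frequency threshold of 2
are hypothetical choices, and the expected frequency deliberately ignores the
window size as a simplification.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int, float, float, float]


def count_cooccurrences(sentences: Iterable[List[str]], window: int = 3):
    """Count lemma frequencies, ordered (node, collocate) pairs and corpus size.

    A pair is counted when the collocate occurs at most `window` tokens to the
    right of the node inside the same sentence (a hypothetical definition).
    """
    freq: Counter = Counter()
    pairs: Counter = Counter()
    n_tokens = 0
    for tokens in sentences:  # only one sentence is held in memory at a time
        freq.update(tokens)
        n_tokens += len(tokens)
        for i, node in enumerate(tokens):
            for collocate in tokens[i + 1 : i + 1 + window]:
                pairs[(node, collocate)] += 1
    return freq, pairs, n_tokens


def t_score(f_ab: int, f_a: int, f_b: int, n: int) -> float:
    # (observed - expected) / sqrt(observed), with expected = f_a * f_b / n
    return (f_ab - f_a * f_b / n) / math.sqrt(f_ab)


def mi_score(f_ab: int, f_a: int, f_b: int, n: int) -> float:
    # log2(observed / expected)
    return math.log2(f_ab * n / (f_a * f_b))


def log_dice(f_ab: int, f_a: int, f_b: int) -> float:
    # 14 + log2(Dice coefficient); does not depend on the corpus size n
    return 14 + math.log2(2 * f_ab / (f_a + f_b))


def rank_collocations(sentences, window: int = 3, min_freq: int = 2) -> List[Row]:
    """Return (node, collocate, f_ab, T, MI, logDice) rows sorted by logDice."""
    freq, pairs, n = count_cooccurrences(sentences, window)
    rows: List[Row] = []
    for (a, b), f_ab in pairs.items():
        if f_ab < min_freq:  # very rare pairs make MI unreliable
            continue
        fa, fb = freq[a], freq[b]
        rows.append((a, b, f_ab,
                     round(t_score(f_ab, fa, fb, n), 3),
                     round(mi_score(f_ab, fa, fb, n), 3),
                     round(log_dice(f_ab, fa, fb), 3)))
    rows.sort(key=lambda r: r[5], reverse=True)
    return rows


if __name__ == "__main__":
    # Toy corpus of already lemmatised German sentences, made up for illustration.
    corpus = [
        ["ich", "trinken", "stark", "Kaffee"],
        ["er", "trinken", "stark", "Kaffee", "am", "Morgen"],
        ["stark", "Regen", "fallen"],
    ]
    for row in rank_collocations(corpus):
        print(row)
```

On the toy corpus the top row is ('trinken', 'Kaffee', 2, 1.197, 2.7, 14.0).
logDice is used for sorting here because, unlike MI, it does not inflate
rare pairs and does not depend on corpus size. For a million texts the pair
counter can grow large, so a real run would probably need pruning or
disk-based counting.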

Full disclosure: I'm part of the development team of Sketch Engine ;)

Best,
Milos


2014-08-15 12:16 GMT+02:00 Abdoulaye Dramé <abou at drame.de>:

>   Hello,
>
>  I would like to find co-occurring words in German texts. The number of
> texts I have is about 1 000 000 (one million), with each text having about
> 10 sentences.
>
>  Does anybody know where I can find software to do the analysis on such
> a large number of texts?
>
>  I would prefer Java software, but other tools are also OK provided they
> run on Ubuntu.
>
>  Any help would be appreciated.
>
>  Regards,
>
>  Drame A.
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora