[Corpora-List] Re : Longest common subsequences algorithms in corpora
Gael Lejeune
gael.lejeune at unicaen.fr
Thu Feb 28 13:42:29 UTC 2013
Another example of an application of LCS algorithms to corpora:
We used the detection of repeated strings in press articles to identify
texts relevant to epidemic surveillance and to detect which disease is
spreading where.
It proved particularly useful for articles written in morphologically
rich languages (Greek, Polish, Russian...) or languages with different
writing systems (Arabic, Chinese).
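
To give a rough idea of why character-level repetition helps with inflected
languages, here is a minimal sketch in Python. It is not the method from the
paper linked below, only an illustration under my own assumptions: the
function name longest_common_substring is hypothetical, and it uses the
textbook dynamic-programming approach to find the longest contiguous string
shared by two texts.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous string that occurs in both a and b.

    Illustrative only: a plain O(len(a) * len(b)) dynamic programme,
    not the approach used in the paper.
    """
    best_len = 0  # length of the best match found so far
    best_end = 0  # end index (exclusive) of that match in a
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# Two inflected forms of the Polish word "grypa" (flu)
print(longest_common_substring("grypy", "grypą"))
```

The two forms differ only in their ending, so the shared string is the stem
"gryp", found without any lemmatiser or word segmentation.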
Some examples are shown here:
https://daniel.greyc.fr/
More details can be found in this paper:
http://www.cs.helsinki.fi/u/doucet/papers/JapTAL2012.pdf
Gaël
--
----------------------------------------
PhD Student, HUman Language TECHnologies (HULTECH)
Caen Campus 2, Bureau S3-365,
Boulevard du Maréchal Juin
14000 Caen
Tél: 02 31 56 73 98
http://lejeuneg.users.greyc.fr/
----------------------------------------