[Corpora-List] Re : Longest common subsequences algorithms in corpora

Gael Lejeune gael.lejeune at unicaen.fr
Thu Feb 28 13:42:29 UTC 2013


Another example of applying LCS algorithms to corpora:
  We used the detection of repeated strings in press articles to identify
texts relevant to epidemic surveillance and to detect which disease was
spreading where.
  This proved particularly useful for articles written in morphologically
rich languages (Greek, Polish, Russian, etc.) or in languages with other
writing systems (Arabic, Chinese).
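As an illustration only, here is a minimal Python sketch that finds the longest character-level string shared by two texts, such as a headline and an article body. Note that it computes the contiguous longest common substring, not a subsequence. It is an assumption on my part and not the method described in the paper cited in this message. The function name `longest_common_substring` is hypothetical. Working on characters rather than tokens is one reason this kind of matching tolerates inflection. For example, Polish `grypy` and `grypie` still share `gryp`.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous string shared by a and b.

    Hypothetical helper, for illustration: classic O(len(a) * len(b))
    dynamic programming at the character level, so no tokenizer or
    stemmer is needed. On ties, the first maximal match found while
    scanning a from left to right is returned.
    """
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match ending at a[i-2], b[j-2] by one character
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Two inflected forms of the same word still share their stem
print(repr(longest_common_substring("epidemia grypy", "przypadki grypie")))
```

This prints `' gryp'`. For repeated-string detection over whole articles, suffix arrays or suffix trees scale better than this pairwise quadratic table.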

Some examples are shown here:
https://daniel.greyc.fr/

More details can be found in this paper:
http://www.cs.helsinki.fi/u/doucet/papers/JapTAL2012.pdf

Gaël


-- 
----------------------------------------
PhD Student, HUman Language TECHnologies (HULTECH)
Caen Campus 2, Bureau S3-365,
Boulevard du Maréchal Juin
14000 Caen
Tel: 02 31 56 73 98
http://lejeuneg.users.greyc.fr/
----------------------------------------

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

