[Corpora-List] Re : Longest common subsequences algorithms in corpora
    Gael Lejeune 
    gael.lejeune at unicaen.fr
       
    Thu Feb 28 13:42:29 UTC 2013
    
    
  
Another example of an application of LCS algorithms to corpora:
  We used the detection of repeated strings in press articles to identify 
texts relevant to epidemic surveillance and to detect which disease is 
spreading where.
  It proved particularly useful for articles written in morphologically 
rich languages (Greek, Polish, Russian...) or in languages with different 
writing systems (Arabic, Chinese).
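
To give a rough idea of why character-level string matching helps with inflected forms, here is a minimal sketch. It is not the system described in the paper: the function names, the 0.75 threshold and the Polish example are assumptions made only for illustration. It finds the longest contiguous string shared by a lexicon term and a text, and accepts the term when that shared string covers most of it.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous string that occurs in both a and b.

    Classic dynamic programming over character positions; on ties the
    first match found while scanning `a` from left to right is kept.
    """
    best_len = 0
    best_end = 0  # exclusive end index of the best match inside `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


def term_is_covered(term, text, min_ratio=0.75):
    """Hypothetical relevance test: does `text` share most of `term`?

    `min_ratio` is an illustrative threshold, not a value from the paper.
    """
    if not term:
        return False
    shared = longest_common_substring(term.lower(), text.lower())
    return len(shared) / len(term) >= min_ratio


# "grypa" (flu) appears in the text only in an inflected form, "grypy".
shared = longest_common_substring("grypa", "epidemia grypy w polsce")
found = term_is_covered("grypa", "Epidemia grypy w Polsce")
```

Because the comparison works on characters rather than on tokens, it needs neither a lemmatizer nor word segmentation, which is what makes this kind of approach attractive for the languages listed above.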
Some examples are shown here:
https://daniel.greyc.fr/
More details can be found in this paper:
http://www.cs.helsinki.fi/u/doucet/papers/JapTAL2012.pdf
Gaël
-- 
----------------------------------------
PhD Student, HUman Language TECHnologies (HULTECH)
Caen Campus 2, Bureau S3-365,
Boulevard du Maréchal Juin
14000 Caen
Tél: 02 31 56 73 98
http://lejeuneg.users.greyc.fr/
----------------------------------------