[Corpora-List] language sort
Trond Trosterud
trond.trosterud at hum.uit.no
Thu Jan 11 17:09:57 UTC 2007
Maria Esteva wrote on 10 Jan 2007 at 22:02:
> Dear all,
>
> I am wondering if somebody knows of a program that will recognize
> and sort large sets of files according to language.
In my experience, a single file may well contain several
languages. In our work, we identify the language down to the paragraph
level, although we would often like to be as fine-grained as the
sentence level.
We use text_cat, cf.
http://www.let.rug.nl/~vannoord/TextCat/
and have had very good experiences with it.
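
To illustrate what paragraph-level identification can look like, here is a
minimal sketch in Python. It is not text_cat itself and does not use its
code or its language models. It only follows the general idea of ranked
character n-gram profiles compared with an "out-of-place" distance (the
Cavnar and Trenkle approach, which as far as I know is what text_cat
implements). The function names, the "_" word padding, the profile size
of 300 and the two toy training samples are all assumptions made for the
example; real profiles would be built from far larger training texts.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top=300):
    """Return the `top` most frequent character n-grams (n = 1..max_n),
    most frequent first. Words are padded with "_" (an assumption of this sketch)."""
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = "_" + word + "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language
    profile get the maximum penalty (the profile length)."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else penalty
        for r, gram in enumerate(doc_profile)
    )


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def identify_paragraphs(text, lang_profiles):
    """Return a list of (language, paragraph) pairs for one file's text."""
    results = []
    for para in split_paragraphs(text):
        profile = ngram_profile(para)
        best = min(
            lang_profiles,
            key=lambda lang: out_of_place(profile, lang_profiles[lang]),
        )
        results.append((best, para))
    return results


# Toy training samples (hypothetical, far too small for real use).
SAMPLES = {
    "english": (
        "This is a short text in English that we use to build a language "
        "profile. We would like to recognize the language of every "
        "paragraph in a file."
    ),
    "norwegian": (
        "Dette er ein kort tekst på norsk som vi brukar for å lage ein "
        "språkprofil. Vi vil gjerne kjenne att språket i kvart avsnitt "
        "av ei fil."
    ),
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    mixed = SAMPLES["english"] + "\n\n" + SAMPLES["norwegian"]
    for lang, para in identify_paragraphs(mixed, PROFILES):
        print(lang, "|", para[:40])
```

Sorting whole files then comes down to a policy on top of the per-paragraph
labels, for example filing each file under its majority language and
flagging files in which the paragraphs disagree.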
Trond.
----------------------------------------------------------------------
Trond Trosterud t +47 7764 4763
Institutt for språkvitskap, Det humanistiske fakultet m +47 950 70140
N-9037 Universitetet i Tromsø, Noreg f +47 7764 5216
Trond.Trosterud (a) hum.uit.no http://www.hum.uit.no/a/trond/
----------------------------------------------------------------------