[Corpora-List] language sort

Daniel Zeman zeman at ufal.mff.cuni.cz
Wed Jan 10 21:07:50 UTC 2007


Maria,

why does the file-by-file approach not work for you? Does that mean that you 
potentially have more than one language within one file?

Dan

Maria Esteva wrote:
> Dear all,
>
> I am wondering if somebody knows of a program that will recognize and 
> sort large sets of files according to language. For my text mining 
> project, I need to sort sets of files that contain electronic texts 
> mostly in Spanish and English (although there is some French and some 
> Portuguese as well). There are many free language recognition 
> programmes out there but they work on a file-by-file basis. Let me 
> know if you have some advice.
>
> Thanks,
>
> Maria Esteva
> PhD Candidate
> School of Information
> University of Texas at Austin


