[Corpora-List] Free text corpora?

K Pavan kosuru.pavan at gmail.com
Wed Mar 3 06:50:28 UTC 2010


Hi,
    Can anyone suggest any available romanized/latinized corpora of any
language? By romanized I mean transliterated text of that particular language.

Regards,
Pavan

On Wed, Mar 3, 2010 at 3:01 AM, Raphael Mudge <raffi at automattic.com> wrote:

> Hi Xin,
> A collection of plain text files of public domain books is available from
> Project Gutenberg:
>
> http://www.gutenberg.org/wiki/Main_Page
>
> You can also download Wikipedia and convert the data into plain text.
>
>
> http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/
>
> If you need to mark up the corpus with a POS tagger, Stanford's POS tagger
> may work for you.
>
> http://nlp.stanford.edu/software/tagger.shtml
>
> -- Raphael
>
> Raphael Mudge
> Code Wrangler, Automattic
> http://www.afterthedeadline.com
>
>
> On Mar 2, 2010, at 6:38 AM, Xin Yan wrote:
>
>> Hello,
>>
>> Can anyone tell me whether there are any free text corpora available for
>> commercial purposes?
>> Thank you in advance!
>>
>> Best,
>> Xin Yan
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>