[Corpora-List] FW: Farsi corpora

Heshaam Faili hfaili at ut.ac.ir
Sat Sep 3 05:32:09 UTC 2011


You can also use TMC (Tehran Monolingual Corpus, also released by the Univ. of Tehran), which contains about 250M words, just tokenized …

http://ece.ut.ac.ir/nlp/resources.html

 

Heshaam

 

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Khalid CHOUKRI
Sent: Thursday, September 01, 2011 9:28 PM
To: Yorick Wilks
Cc: corpora at uib.no
Subject: Re: [Corpora-List] Farsi corpora

 

Hi Yorick

Some Farsi resources are available from the ELRA catalogue (including an English-Persian parallel corpus).
Just search for "Farsi" at http://catalog.elra.info/search.php

best regards
Khalid


Yorick Wilks wrote, On 31/08/2011 22:23: 

Thanks to everyone for very useful pointers.
YW
 
 
On 31 Aug 2011, at 16:20, Jon Dehdari wrote:
 

Hello,
There are a couple different public-domain/Free news corpora here:
http://ling.ohio-state.edu/~jonsafari/corpora
 
The Hamshahri newspaper corpus is available here:
http://ece.ut.ac.ir/dbrg/Hamshahri
 
The POS-tagged Bijankhan newspaper corpus is available here:
http://ece.ut.ac.ir/dbrg/Bijankhan
 
And more information here:
http://www.iranianlinguistics.org/wiki/index.php?title=Persian#Corpora
 
 
Cheers,
-Jon Dehdari
 
 
On Wed, Aug 31, 2011 at 03:54:31PM -0400, Yorick Wilks wrote:

 
Is anyone aware of easily obtained Farsi corpora---domain not important?
I'd be grateful for pointers.
Yorick Wilks

 
 
 
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
 