[Corpora-List] Iranian (Persian) texts in Latin letters? (Yuri Tambovtsev)

Sat Jul 30 10:49:50 UTC 2011

Dear Yuri,

I'm not sure what you mean by "Latin letters", but if you mean texts
written in Farsi (Persian) using English (Latin) characters, you could
have a look at these:

http://mohammad-mohanna.persianblog.ir/
http://web652.en.netlog.com/azadeh_nakhostin/guestbook

If you have any more questions, don't hesitate to ask.

Regards,
Masood Ghasemzadeh

On Sat, Jul 30, 2011 at 1:30 PM, <corpora-request at uib.no> wrote:

> Today's Topics:
>
>   1.  Iranian (Persian) texts in Latin letters? (Yuri Tambovtsev)
>   2.  The CLARIN-ES-LAB demonstrator is now available (Marta Villegas)
>   3.  Proceedings of the 2nd Louhi Workshop published in the
>      Journal of Biomedical Semantics (Sumithra Velupillai)
>   4.  LTC'11 Deadline Extension until August 9 (info at elda.org)
>   5. Re:  Speeding up the constitution of corpora from LexisNexis
>      (Mike Scott)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 29 Jul 2011 17:04:54 +0700
> From: "Yuri Tambovtsev" <yutamb at mail.ru>
> Subject: [Corpora-List] Iranian (Persian) texts in Latin letters?
> To: <corpora at uib.no>
>
> Dear Corpora colleagues, do you know of any websites with Iranian (Persian)
> texts in Latin letters? I cannot read Persian script, but I would like to
> compare Persian sound chains with those I have for about 300 of the world's
> languages. Looking forward to hearing from you soon at yutamb at mail.ru.
> Yours sincerely, Yuri Tambovtsev, Novosibirsk, Russia
>
> ------------------------------
>
> Message: 2
> Date: Thu, 28 Jul 2011 12:15:01 +0200
> From: Marta Villegas <marta.villegas at upf.edu>
> Subject: [Corpora-List] The CLARIN-ES-LAB demonstrator is now
>        available
> To: undisclosed-recipients:;
>
> We apologise if you receive this information more than once.
>
> (HTML version at
> http://clarin-es.iula.upf.edu/es/newsletter/newsletter-11/)
>
> ===========================================
> The CLARIN-ES-LAB demonstrator is now available
> ===========================================
>
> CLARIN-ES-LAB has been conceived as a virtual laboratory for researchers
> who want to become familiar with the tools already available as web
> services and chain them into workflows to carry out complex tasks. It is
> an environment for sharing tools and a collaborative space in the service
> of research and innovation.
>
> The virtual laboratory can be accessed at:
> http://clarin-es-lab.org/.
>
> CLARIN-ES-LAB is intended for all researchers who deal with large
> amounts of textual data and need to carry out some kind of analysis,
> such as:
>
>    * Computing the most frequent words in a text
>    * Identifying which adjectives tend to be associated with a given noun
>    * Finding out which verbs (and in which forms) are most frequent in a text
>    * Identifying the proper nouns in a text
>    * Computing the most probable word associations in a document
>    * Measuring the lexical richness of a corpus
>    * Searching for the contexts in which a given word or expression appears
>    * Identifying the most common structures in a text
>    * Computing the distance between two texts
>    * Gender studies, style studies, ...
>    * Studies of usage, distribution, entropy, ... in the language
>    * Discourse analysis, trend studies, authorship, ...
>    * Lexicometric and statistical studies
>    * Language monitoring
>    * etc.
>
> In the Documentation/Powered by Clarin! section
> (http://gilmere.upf.edu/mvillegas/clarin-es-lab/documentacion/PoweredByClarin.htm#wkf1)
> you can see several real cases that illustrate the potential of
> Clarin-es-lab:
>
>    - 2011 elections: analysis of the political blogosphere during the
> election campaign
>    - Androcentrism in the Spanish press: who does the news talk about?
>    - Sentiment analysis (opinion studies)
>
> In the "Paso a paso" (Step by step) section you will find videos that
> give a quick guide to the laboratory's capabilities.
>
>
>
> --
> Marta Villegas
> marta.villegas at upf.edu
>
> ------------------------------
>
> Message: 3
> Date: Fri, 29 Jul 2011 15:45:55 +0200
> From: Sumithra Velupillai <sumithra at dsv.su.se>
> Subject: [Corpora-List] Proceedings of the 2nd Louhi Workshop
>        published in the Journal of Biomedical Semantics
> To: Corpora List <corpora at uib.no>
>
> *** Apologies for multiple postings ***
>
> Dear colleagues,
>
> The Proceedings of the Second Louhi Workshop on Text and Data Mining of
> Health Documents have now been published in the Journal of Biomedical
> Semantics and are available here:
>
> http://www.jbiomedsem.com/supplements/2/S3
>
> Best regards,
>
> Hercules Dalianis, Martin Hassel and Sumithra Velupillai
> --
> Sumithra Velupillai
> PhD Student
> Department of Computer and Systems Sciences, DSV
> Stockholm University
> Forum 100
> 164 40 Kista
>
> Tel: +46 8 161174
>
> WWW: http://people.dsv.su.se/~sumithra/
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 29 Jul 2011 16:07:00 +0200
> From: info at elda.org
> Subject: [Corpora-List] LTC'11 Deadline Extension until August 9
> To: undisclosed-recipients:;
>
> [Apologies for cross-postings]
>
> Dear Colleague,
>
> In response to numerous requests to extend the paper submission
> deadline, we have set the NEW SUBMISSION DEADLINE to Tuesday, August 9,
> 2011. As any further extension could compromise the proper processing
> of submissions, we may not be able to consider papers submitted after
> this date.
>
> Best regards,
> LTC Organizers
> www.ltc.amu.edu.pl
>
>
> ------------------------------
>
> Message: 5
> Date: Sat, 30 Jul 2011 09:00:20 +0100
> From: Mike Scott <mike at lexically.net>
> Subject: Re: [Corpora-List] Speeding up the constitution of corpora
>        from LexisNexis
> To: corpora at uib.no
>
> Mahe, hi
>
> We have been working on building corpora from this source at Aston
> University for research into the language of climate change. There are
> lots of problems with the newspaper database, but most of them can be
> resolved fairly well:
> * duplicated articles (often exact duplicates with different dates or
> publications, but also slightly varied duplicates)
> * imprecise or varied headers, depending on the news source
> * uneven coverage: many sources are extremely well represented (e.g. US
> newspapers) but others are patchy (e.g. Brazilian)
> * download restrictions (but these are generous, so you can get lots of
> texts in one file)
> * these large files need splitting up, which is not difficult to automate
> Then you need to decide which publications or authors you do or don't
> wish to include in your corpus.
> I am considering making the software I have prepared for this purpose
> available to the wider community, though it would first need a better
> help system. It attempts to parse the multi-text download into separate
> articles, filters out duplicates, and then lets the user filter the set
> by publication and author, exporting cleaned-up texts to single-article
> or monthly text files.
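A minimal Python sketch of the kind of pipeline described above (split the multi-article download, keep only the wanted publications, drop duplicates). This is not Mike's software. The separator pattern (a line like "12 of 500 DOCUMENTS") and the assumption that the first line of each article names the publication are hypothetical and would need adjusting to the real export format; the names `split_download`, `parse_article`, `fingerprint` and `build_corpus` are illustrative only.

```python
import hashlib
import re
from dataclasses import dataclass

# ASSUMPTION: each article in the multi-text download begins with a line
# such as "12 of 500 DOCUMENTS"; adjust this pattern to the real export format.
SEPARATOR = re.compile(r"^\s*\d+\s+of\s+\d+\s+DOCUMENTS\s*$", re.MULTILINE)


@dataclass
class Article:
    publication: str
    body: str


def split_download(text: str) -> list[str]:
    """Split one multi-article download into the raw text of each article."""
    return [part.strip() for part in SEPARATOR.split(text) if part.strip()]


def parse_article(raw: str) -> Article:
    """Hypothetical header handling: the first line names the publication,
    everything after it is treated as the article body."""
    first, _, rest = raw.partition("\n")
    return Article(publication=first.strip(), body=rest.strip())


def fingerprint(body: str) -> str:
    """Hash of the lower-cased, whitespace-normalised body, so copies that
    differ only in case or spacing map to the same key (the publication
    line is not part of the body)."""
    normalised = " ".join(body.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def build_corpus(text: str, keep_publications: set[str]) -> list[Article]:
    """Split, keep only wanted publications, then drop duplicate bodies."""
    articles = [parse_article(raw) for raw in split_download(text)]
    wanted = [a for a in articles if a.publication in keep_publications]
    seen: set[str] = set()
    unique: list[Article] = []
    for article in wanted:
        key = fingerprint(article.body)
        if key not in seen:  # keep only the first copy of each body
            seen.add(key)
            unique.append(article)
    return unique
```

Filtering by publication before deduplicating means a copy from a wanted source is never discarded in favour of an identical copy from an excluded one. The hash only catches copies that differ in the publication line, case or spacing; the "slightly varied duplicates" mentioned above would need a near-duplicate method such as shingling, and the export to single-article or monthly files is left out.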
>
> Cheers -- Mike
>
> On 28/07/2011 14:55, Mahé BEN HAMED wrote:
> > Dear all,
> >
> > Is there a way to speed up the building of corpora from the Lexis
> > Nexis newspaper database (given a set of search parameters)? To what
> > extent can the whole process be automated?
> >
> > Thanks,
> >
> > Mahe BEN HAMED
> >
> >
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
>
> --
> Mike Scott
>
> ***
> If you publish research which uses WordSmith, do let me know so I can
> include it at
>
> http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
> ***
> University of Aston and Lexical Analysis Software Ltd.
> mike.scott at aston.ac.uk
> www.lexically.net
>
>
> ----------------------------------------------------------------------
> Send Corpora mailing list submissions to
>        corpora at uib.no
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://mailman.uib.no/listinfo/corpora
> or, via email, send a message with subject or body 'help' to
>        corpora-request at uib.no
>
> You can reach the person managing the list at
>        corpora-owner at uib.no
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Corpora digest..."
>
>
>
> End of Corpora Digest, Vol 49, Issue 33
> ***************************************
>

-- 
Yours Faithfully,
Masood Ghasemzadeh
http://people.dsv.su.se/~masoodg/