Dear Yuri,<div><br></div><div>I am not sure what you mean by "Latin letters", but if you are looking for Farsi texts written in Latin (English) characters, you can have a look at these:</div>

<div><br></div><div><a href="http://mohammad-mohanna.persianblog.ir/">http://mohammad-mohanna.persianblog.ir/</a></div><div><a href="http://web652.en.netlog.com/azadeh_nakhostin/guestbook">http://web652.en.netlog.com/azadeh_nakhostin/guestbook</a></div>

<div><br></div><div>If you have any more questions, don't hesitate to ask.</div><div><br></div><div>Regards,</div><div>Masood Ghasemzadeh<br><br><div class="gmail_quote">On Sat, Jul 30, 2011 at 1:30 PM,  <span dir="ltr"><<a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Today's Topics:<br>

<br>

   1.  Iranian (Persian) texts in Latin letters? (Yuri Tambovtsev)<br>

   2.  The CLARIN-ES-LAB demonstrator is now available (Marta Villegas)<br>

   3.  Proceedings of the 2nd Louhi Workshop published in the<br>

      Journal of Biomedical Semantics (Sumithra Velupillai)<br>

   4.  LTC'11 Deadline Extension until August 9 (<a href="mailto:info@elda.org">info@elda.org</a>)<br>

   5. Re:  Speeding up the constitution of corpora from LexisNexis<br>

      (Mike Scott)<br>

<br>

<br>

----------------------------------------------------------------------<br>

<br>

Message: 1<br>

Date: Fri, 29 Jul 2011 17:04:54 +0700<br>

From: "Yuri Tambovtsev" <<a href="mailto:yutamb@mail.ru">yutamb@mail.ru</a>><br>

Subject: [Corpora-List] Iranian (Persian) texts in Latin letters?<br>

To: <<a href="mailto:corpora@uib.no">corpora@uib.no</a>><br>

<br>

Dear Corpora colleagues, do you know of any websites with Iranian (Persian) texts in Latin letters? I cannot read Persian letters. However, I'd like to compare Persian sound chains with those I have for about 300 world languages. Looking forward to hearing from you soon at <a href="mailto:yutamb@mail.ru">yutamb@mail.ru</a>  Yours sincerely, Yuri Tambovtsev, Novosibirsk, Russia<br>


-------------- next part --------------<br>

A non-text attachment was scrubbed...<br>

Name: not available<br>

Type: text/html<br>

Size: 735 bytes<br>

Desc: not available<br>

URL: <<a href="http://www.uib.no/mailman/public/corpora/attachments/20110729/4b3b5647/attachment.txt" target="_blank">http://www.uib.no/mailman/public/corpora/attachments/20110729/4b3b5647/attachment.txt</a>><br>

<br>

------------------------------<br>

<br>

Message: 2<br>

Date: Thu, 28 Jul 2011 12:15:01 +0200<br>

From: Marta Villegas <<a href="mailto:marta.villegas@upf.edu">marta.villegas@upf.edu</a>><br>

Subject: [Corpora-List] The CLARIN-ES-LAB demonstrator is now<br>

        available<br>

To: undisclosed-recipients:;<br>

<br>

We apologize if you receive this information more than once<br>

<br>

(HTML version at<br>

<a href="http://clarin-es.iula.upf.edu/es/newsletter/newsletter-11/" target="_blank">http://clarin-es.iula.upf.edu/es/newsletter/newsletter-11/</a>)<br>

<br>

===========================================<br>

The CLARIN-ES-LAB demonstrator is now available<br>

===========================================<br>

<br>

CLARIN-ES-LAB has been conceived as a virtual laboratory for<br>

researchers who want to become familiar<br>

with the tools already available as web services and<br>

chain them into workflows to build complex tasks.<br>

It is an environment for sharing tools and a collaborative space at the service<br>

of research and innovation.<br>

<br>

The virtual laboratory can be accessed at:<br>

<a href="http://clarin-es-lab.org/" target="_blank">http://clarin-es-lab.org/</a>.<br>

<br>

CLARIN-ES-LAB is intended for all researchers who work with<br>

large amounts of textual data<br>

and need to carry out some kind of analysis, such as:<br>

<br>

    * Computing the most frequent words in a text<br>

    * Identifying which adjectives a given noun usually occurs with<br>

    * Finding out which verbs (and in which forms) are most prominent in a text<br>

    * Identifying the proper nouns in a text<br>

    * Computing the most probable word associations in a document<br>

    * Measuring the lexical richness of a corpus<br>

    * Searching for the contexts in which a given word or<br>

expression appears<br>

    * Identifying the most common structures in a text<br>

    * Computing the distance between two texts<br>

    * Studies of genre, style, ...<br>

    * Studies of usage, distribution, entropy ... in the language<br>

    * Discourse analysis, trend studies, authorship ...<br>

    * Lexicometric and statistical studies<br>

    * Language monitoring ...<br>

    * etc.<br>

<br>

The Documentation/Powered by Clarin! section<br>

<<a href="http://gilmere.upf.edu/mvillegas/clarin-es-lab/documentacion/PoweredByClarin.htm#wkf1" target="_blank">http://gilmere.upf.edu/mvillegas/clarin-es-lab/documentacion/PoweredByClarin.htm#wkf1</a>><br>


shows several real cases that illustrate the potential of<br>

Clarin-es-lab:<br>

<br>

    - Elections 2011:    analysis of the political blogosphere during the election<br>

campaign<br>

    - Androcentrism in the Spanish press:   who is the news about?<br>

    - Sentiment Analysis:    (opinion studies)<br>

<br>

The "Step by step" ("Paso a paso") section contains videos that give a quick<br>

guide to the laboratory's capabilities.<br>

<br>

<br>

<br>

--<br>

Marta Villegas<br>

<a href="mailto:marta.villegas@upf.edu">marta.villegas@upf.edu</a><br>


<br>

------------------------------<br>

<br>

Message: 3<br>

Date: Fri, 29 Jul 2011 15:45:55 +0200<br>

From: Sumithra Velupillai <<a href="mailto:sumithra@dsv.su.se">sumithra@dsv.su.se</a>><br>

Subject: [Corpora-List] Proceedings of the 2nd Louhi Workshop<br>

        published in the Journal of Biomedical Semantics<br>

To: Corpora List <<a href="mailto:corpora@uib.no">corpora@uib.no</a>><br>

<br>

*** Apologies for multiple postings ***<br>

<br>

Dear colleagues,<br>

<br>

The Proceedings of the Second Louhi Workshop on Text and Data Mining of<br>

Health Documents are now published in the Journal of Biomedical Semantics<br>

and available here:<br>

<br>

<a href="http://www.jbiomedsem.com/supplements/2/S3" target="_blank">http://www.jbiomedsem.com/supplements/2/S3</a><br>

<br>

Best regards,<br>

<br>

Hercules Dalianis, Martin Hassel and Sumithra Velupillai<br>

--<br>

Sumithra Velupillai<br>

PhD Student<br>

Department of Computer and Systems Sciences, DSV<br>

Stockholm University<br>

Forum 100<br>

164 40 Kista<br>

<br>

Tel: +46 8 161174<br>

<br>

WWW: <a href="http://people.dsv.su.se/~sumithra/" target="_blank">http://people.dsv.su.se/~sumithra/</a><br>

<br>

<br>

<br>

------------------------------<br>

<br>

Message: 4<br>

Date: Fri, 29 Jul 2011 16:07:00 +0200<br>

From: <a href="mailto:info@elda.org">info@elda.org</a><br>

Subject: [Corpora-List] LTC'11 Deadline Extension until August 9<br>

To: undisclosed-recipients:;<br>

<br>

[Apologies for cross-postings]<br>

<br>

Dear Colleague,<br>

<br>

Responding to numerous suggestions to extend the paper submission<br>

deadline, we have set the NEW SUBMISSION DEADLINE to August 9, 2011<br>

(Tuesday). As any further extension could compromise the correct<br>

processing of the submissions, we may not be able to take into<br>

consideration papers submitted after this date.<br>

<br>

Best regards,<br>

LTC Organizers<br>

<a href="http://www.ltc.amu.edu.pl" target="_blank">www.ltc.amu.edu.pl</a><br>

<br>


<br>

------------------------------<br>

<br>

Message: 5<br>

Date: Sat, 30 Jul 2011 09:00:20 +0100<br>

From: Mike Scott <<a href="mailto:mike@lexically.net">mike@lexically.net</a>><br>

Subject: Re: [Corpora-List] Speeding up the constitution of corpora<br>

        from    LexisNexis<br>

To: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>

<br>

Mahe, hi<br>

<br>

We have been working on building corpora from this source at Aston<br>

University for research into the language of climate change. There are<br>

lots of problems with the newspaper database but most of these can be<br>

resolved fairly well:<br>

* duplicated articles (often exact duplicates with different dates or<br>

publications, but also slightly varied duplicates)<br>

* imprecise/varied headers depending on the news source<br>

* many sources extremely well represented (e.g. US newspapers) but other<br>

coverage patchy (e.g. Brazilian)<br>

* download restrictions (but these are generous so you can get lots of<br>

texts in one file)<br>

* these large files need splitting up, not difficult to automate<br>

Then you need to decide which publications or authors you do/don't wish<br>

to include in your corpus.<br>

I am considering making the software I have prepared for this purpose<br>

available to the wider community; it would need some enhancing regarding<br>

a help system first. It attempts to parse the multi-text download into<br>

separate articles, filters out duplicates, and then lets the user filter<br>

the set by publications & authors, exporting cleaned-up texts to<br>

single-article or monthly text files.<br>

<br>

Cheers -- Mike<br>

<br>

On 28/07/2011 14:55, Mahé BEN HAMED wrote:<br>

> Dear all,<br>

><br>

> Is there a way to speed up the building of corpora from the Lexis<br>

> Nexis newspaper database (given a set of search parameters) ? To which<br>

> extent can the whole process be automated?<br>

><br>

> Thanks,<br>

><br>

> Mahe BEN HAMED<br>

><br>

><br>

> _______________________________________________<br>

> UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>

> Corpora mailing list<br>

> <a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>

> <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

<br>

--<br>

Mike Scott<br>

<br>

***<br>

If you publish research which uses WordSmith, do let me know so I can include it at<br>

<a href="http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm" target="_blank">http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm</a><br>

***<br>

University of Aston and Lexical Analysis Software Ltd.<br>

<a href="mailto:mike.scott@aston.ac.uk">mike.scott@aston.ac.uk</a><br>

<a href="http://www.lexically.net" target="_blank">www.lexically.net</a><br>

<br>


<br>

----------------------------------------------------------------------<br>

Send Corpora mailing list submissions to<br>

        <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>

<br>

To subscribe or unsubscribe via the World Wide Web, visit<br>

        <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

or, via email, send a message with subject or body 'help' to<br>

        <a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a><br>

<br>

You can reach the person managing the list at<br>

        <a href="mailto:corpora-owner@uib.no">corpora-owner@uib.no</a><br>

<br>

When replying, please edit your Subject line so it is more specific<br>

than "Re: Contents of Corpora digest..."<br>

<br>

<br>


<br>

<br>

End of Corpora Digest, Vol 49, Issue 33<br>

***************************************<br>

</blockquote></div><br><br clear="all"><br>-- <br>Yours Faithfully,<br>Masood Ghasemzadeh<div><a href="http://people.dsv.su.se/~masoodg/" target="_blank">http://people.dsv.su.se/~masoodg/</a></div><br>

</div>