Dear Yuri,<div><br></div><div>I'm not sure what you mean by "Latin letters", but if you are looking for texts written in Farsi using English (Latin) characters, you can have a look at these:</div>
<div><br></div><div><a href="http://mohammad-mohanna.persianblog.ir/">http://mohammad-mohanna.persianblog.ir/</a></div><div><a href="http://web652.en.netlog.com/azadeh_nakhostin/guestbook">http://web652.en.netlog.com/azadeh_nakhostin/guestbook</a></div>
<div><br></div><div>If you have any more questions, don't hesitate to ask.</div><div><br></div><div>Regards,</div><div>Masood Ghasemzadeh<br><br><div class="gmail_quote">On Sat, Jul 30, 2011 at 1:30 PM, <span dir="ltr"><<a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Today's Topics:<br>
<br>
1. Iranian (Persian) texts in Latin letters? (Yuri Tambovtsev)<br>
2. The CLARIN-ES-LAB demonstrator is now available (Marta Villegas)<br>
3. Proceedings of the 2nd Louhi Workshop published in the<br>
Journal of Biomedical Semantics (Sumithra Velupillai)<br>
4. LTC'11 Deadline Extension until August 9 (<a href="mailto:info@elda.org">info@elda.org</a>)<br>
5. Re: Speeding up the constitution of corpora from LexisNexis<br>
(Mike Scott)<br>
<br>
<br>
----------------------------------------------------------------------<br>
<br>
Message: 1<br>
Date: Fri, 29 Jul 2011 17:04:54 +0700<br>
From: "Yuri Tambovtsev" <<a href="mailto:yutamb@mail.ru">yutamb@mail.ru</a>><br>
Subject: [Corpora-List] Iranian (Persian) texts in Latin letters?<br>
To: <<a href="mailto:corpora@uib.no">corpora@uib.no</a>><br>
<br>
Dear Corpora colleagues,<br>
Do you know of any websites with Iranian (Persian) texts in Latin letters? I cannot read the Persian script. However, I'd like to compare Persian sound chains with those I have for about 300 of the world's languages. Looking forward to hearing from you soon at <a href="mailto:yutamb@mail.ru">yutamb@mail.ru</a>.<br>
Yours sincerely,<br>
Yuri Tambovtsev, Novosibirsk, Russia<br>
<br>
------------------------------<br>
<br>
Message: 2<br>
Date: Thu, 28 Jul 2011 12:15:01 +0200<br>
From: Marta Villegas <<a href="mailto:marta.villegas@upf.edu">marta.villegas@upf.edu</a>><br>
Subject: [Corpora-List] The CLARIN-ES-LAB demonstrator is now available<br>
To: undisclosed-recipients:;<br>
<br>
We apologize if you receive this information more than once<br>
<br>
(HTML version at<br>
<a href="http://clarin-es.iula.upf.edu/es/newsletter/newsletter-11/" target="_blank">http://clarin-es.iula.upf.edu/es/newsletter/newsletter-11/</a>)<br>
<br>
===========================================<br>
The CLARIN-ES-LAB demonstrator is now available<br>
===========================================<br>
<br>
CLARIN-ES-LAB has been conceived as a virtual laboratory for<br>
researchers who want to become familiar with tools already available as<br>
web services and to chain them into workflows that carry out complex tasks.<br>
It is an environment for sharing tools and a collaborative space serving<br>
research and innovation.<br>
<br>
The virtual laboratory can be accessed at:<br>
<a href="http://clarin-es-lab.org/" target="_blank">http://clarin-es-lab.org/</a>.<br>
<br>
CLARIN-ES-LAB is designed for researchers who work with<br>
large amounts of textual data<br>
and need to carry out analyses such as:<br>
<br>
* Calculate the most frequent words in a text<br>
* Identify which adjectives a given noun tends to be associated with<br>
* Find out which verbs (and in which forms) occur most in a text<br>
* Identify the proper nouns in a text<br>
* Compute the most probable word associations in a document<br>
* Measure the lexical richness of a corpus<br>
* Find the contexts in which a given word or expression appears<br>
* Identify the most common structures in a text<br>
* Calculate the distance between two texts<br>
* Gender and style studies, ...<br>
* Studies of usage, distribution, entropy, ... in the language<br>
* Discourse analysis, trend studies, authorship, ...<br>
* Lexicometric and statistical studies<br>
* Language monitoring<br>
* etc.<br>
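<br>
As a rough illustration of two of the analyses listed above (most frequent words and lexical richness), here is a minimal Python sketch. It is only an illustration and does not use CLARIN-ES-LAB or its web services; the naive regex tokenizer and the use of the type/token ratio as the measure of lexical richness are assumptions.<br>
<pre>
```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of word characters (a naive tokenizer; an assumption)."""
    return re.findall(r"\w+", text.lower())


def most_frequent_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


def type_token_ratio(text: str) -> float:
    """Lexical richness as distinct tokens (types) divided by total tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    sample = "el perro y el gato y el ratón"
    print(most_frequent_words(sample, 2))
    print(type_token_ratio(sample))
```
</pre>
For the sample sentence, most_frequent_words returns [('el', 3), ('y', 2)] and type_token_ratio returns 0.625 (5 types over 8 tokens). The type/token ratio drops as texts get longer, so it is only comparable across texts of similar length.<br>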
<br>
In the Documentation/Powered by Clarin! section<br>
(<a href="http://gilmere.upf.edu/mvillegas/clarin-es-lab/documentacion/PoweredByClarin.htm#wkf1" target="_blank">http://gilmere.upf.edu/mvillegas/clarin-es-lab/documentacion/PoweredByClarin.htm#wkf1</a>)<br>
you can find several real-world case studies that illustrate the potential of<br>
Clarin-es-lab:<br>
<br>
- Elections 2011: analysis of the political blogosphere during the election campaign<br>
- Androcentrism in the Spanish press: who do the news stories talk about?<br>
- Sentiment analysis (opinion studies)<br>
<br>
In the "Paso a paso" (Step by step) section you will find videos that<br>
give a quick guide to the laboratory's capabilities.<br>
<br>
<br>
<br>
--<br>
Marta Villegas<br>
<a href="mailto:marta.villegas@upf.edu">marta.villegas@upf.edu</a><br>
<br>
------------------------------<br>
<br>
Message: 3<br>
Date: Fri, 29 Jul 2011 15:45:55 +0200<br>
From: Sumithra Velupillai <<a href="mailto:sumithra@dsv.su.se">sumithra@dsv.su.se</a>><br>
Subject: [Corpora-List] Proceedings of the 2nd Louhi Workshop<br>
published in the Journal of Biomedical Semantics<br>
To: Corpora List <<a href="mailto:corpora@uib.no">corpora@uib.no</a>><br>
<br>
*** Apologies for multiple postings ***<br>
<br>
Dear colleagues,<br>
<br>
The Proceedings of the Second Louhi Workshop on Text and Data Mining of<br>
Health Documents have now been published in the Journal of Biomedical Semantics<br>
and available here:<br>
<br>
<a href="http://www.jbiomedsem.com/supplements/2/S3" target="_blank">http://www.jbiomedsem.com/supplements/2/S3</a><br>
<br>
Best regards,<br>
<br>
Hercules Dalianis, Martin Hassel and Sumithra Velupillai<br>
--<br>
Sumithra Velupillai<br>
PhD Student<br>
Department of Computer and Systems Sciences, DSV<br>
Stockholm University<br>
Forum 100<br>
164 40 Kista<br>
<br>
Tel: +46 8 161174<br>
<br>
WWW: <a href="http://people.dsv.su.se/~sumithra/" target="_blank">http://people.dsv.su.se/~sumithra/</a><br>
<br>
<br>
<br>
------------------------------<br>
<br>
Message: 4<br>
Date: Fri, 29 Jul 2011 16:07:00 +0200<br>
From: <a href="mailto:info@elda.org">info@elda.org</a><br>
Subject: [Corpora-List] LTC'11 Deadline Extension until August 9<br>
To: undisclosed recipients:;<br>
<br>
[Apologies for cross-postings]<br>
<br>
Dear Colleague,<br>
<br>
In response to numerous requests to extend the paper submission<br>
deadline, we have set the NEW SUBMISSION DEADLINE to August 9, 2011<br>
(Tuesday). As any further extension could compromise the proper<br>
processing of submissions, we may not be able to consider<br>
papers submitted after this date.<br>
<br>
Best regards,<br>
LTC Organizers<br>
<a href="http://www.ltc.amu.edu.pl" target="_blank">www.ltc.amu.edu.pl</a><br>
<br>
<br>
------------------------------<br>
<br>
Message: 5<br>
Date: Sat, 30 Jul 2011 09:00:20 +0100<br>
From: Mike Scott <<a href="mailto:mike@lexically.net">mike@lexically.net</a>><br>
Subject: Re: [Corpora-List] Speeding up the constitution of corpora<br>
from LexisNexis<br>
To: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
<br>
Mahe, hi<br>
<br>
We have been working on building corpora from this source at Aston<br>
University for research into the language of climate change. There are<br>
lots of problems with the newspaper database, but most of these can be<br>
resolved fairly well:<br>
* duplicated articles (often exact duplicates with different dates or<br>
publications, but also slightly varied duplicates)<br>
* imprecise/varied headers depending on the news source<br>
* many sources extremely well represented (e.g. US newspapers) but other<br>
coverage patchy (e.g. Brazilian)<br>
* download restrictions (but these are generous, so you can get lots of<br>
texts in one file)<br>
* these large files need splitting up, which is not difficult to automate<br>
Then you need to decide which publications or authors you do/don't wish<br>
to include in your corpus.<br>
I am considering making the software I have prepared for this purpose<br>
available to the wider community; it would first need some enhancement<br>
in the form of a help system. It attempts to parse the multi-text download into<br>
separate articles, filters out duplicates, and then lets the user filter<br>
the set by publications and authors, exporting cleaned-up texts to<br>
single-article or monthly text files.<br>
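<br>
The workflow described above (split the multi-text download into articles, drop exact and near duplicates, then filter by publication) can be sketched roughly as follows in Python. This is not Mike Scott's software. The article delimiter (a line such as "12 of 500 DOCUMENTS"), the assumption that each article's first line names the publication, and the 3-word shingles with a 0.9 Jaccard threshold for near duplicates are all hypothetical choices to adapt to the real export format.<br>
<pre>
```python
import re

# HYPOTHETICAL delimiter: assumes each article in the download begins with a
# line such as "12 of 500 DOCUMENTS". Check the real export and adjust.
DELIMITER = re.compile(r"^\s*\d+ of \d+ DOCUMENTS\s*$", re.MULTILINE)


def split_articles(download: str) -> list[str]:
    """Split one multi-article download into stripped, non-empty articles."""
    return [part.strip() for part in DELIMITER.split(download) if part.strip()]


def publication(article: str) -> str:
    """ASSUMPTION: the first line of each article names the publication."""
    return article.splitlines()[0].strip()


def body(article: str) -> str:
    """Everything after the (assumed) publication line."""
    return "\n".join(article.splitlines()[1:])


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Overlapping k-word sequences of the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


def deduplicate(articles: list[str], threshold: float = 0.9) -> list[str]:
    """Keep the first article of each group whose bodies are identical or nearly so.

    Only bodies are compared, so a copy filed under a different publication
    line, or one differing by a few words, counts as a duplicate.
    """
    kept: list[str] = []
    kept_shingles: list[set] = []
    for article in articles:
        s = shingles(body(article))
        if any(jaccard(s, seen) >= threshold for seen in kept_shingles):
            continue  # near-duplicate of an article already kept
        kept.append(article)
        kept_shingles.append(s)
    return kept


def filter_by_publication(articles: list[str], allowed: set[str]) -> list[str]:
    """Keep only articles whose (assumed) publication line is in `allowed`."""
    return [a for a in articles if publication(a) in allowed]
```
</pre>
Comparing each article with every article already kept is quadratic, which is workable for a few thousand articles; for much larger downloads, MinHash with locality-sensitive hashing is a common way to scale near-duplicate detection. Filtering by author would follow the same pattern as filter_by_publication once the byline can be located.<br>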
<br>
Cheers -- Mike<br>
<br>
On 28/07/2011 14:55, Mahé BEN HAMED wrote:<br>
> Dear all,<br>
><br>
> Is there a way to speed up the building of corpora from the Lexis<br>
> Nexis newspaper database (given a set of search parameters)? To what<br>
> extent can the whole process be automated?<br>
><br>
> Thanks,<br>
><br>
> Mahe BEN HAMED<br>
><br>
><br>
> _______________________________________________<br>
> UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
> Corpora mailing list<br>
> <a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
> <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
<br>
--<br>
Mike Scott<br>
<br>
***<br>
If you publish research which uses WordSmith, do let me know so I can include it at<br>
<a href="http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm" target="_blank">http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm</a><br>
***<br>
University of Aston and Lexical Analysis Software Ltd.<br>
<a href="mailto:mike.scott@aston.ac.uk">mike.scott@aston.ac.uk</a><br>
<a href="http://www.lexically.net" target="_blank">www.lexically.net</a><br>
<br>
<br>
----------------------------------------------------------------------<br>
Send Corpora mailing list submissions to<br>
<a href="mailto:corpora@uib.no">corpora@uib.no</a><br>
<br>
To subscribe or unsubscribe via the World Wide Web, visit<br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
or, via email, send a message with subject or body 'help' to<br>
<a href="mailto:corpora-request@uib.no">corpora-request@uib.no</a><br>
<br>
You can reach the person managing the list at<br>
<a href="mailto:corpora-owner@uib.no">corpora-owner@uib.no</a><br>
<br>
When replying, please edit your Subject line so it is more specific<br>
than "Re: Contents of Corpora digest..."<br>
<br>
<br>
<br>
<br>
End of Corpora Digest, Vol 49, Issue 33<br>
***************************************<br>
</blockquote></div><br><br clear="all"><br>-- <br>Yours Faithfully,<br>Masood Ghasemzadeh<div><a href="http://people.dsv.su.se/~masoodg/" target="_blank">http://people.dsv.su.se/~masoodg/</a></div><br>
</div>