[Corpora-List] Iranian (Persian) texts in Latin letters? (Yuri Tambovtsev)
masood ghasemzadeh
masood.ghz at gmail.com
Sat Jul 30 10:49:50 UTC 2011
Dear Yuri,
I am not sure what you mean by "Latin letters", but if you are looking for
texts written in Farsi using English characters, you can have a look at
these:
http://mohammad-mohanna.persianblog.ir/
http://web652.en.netlog.com/azadeh_nakhostin/guestbook
If you have any more questions, don't hesitate to ask.
Regards,
Masood Ghasemzadeh
On Sat, Jul 30, 2011 at 1:30 PM, <corpora-request at uib.no> wrote:
> Today's Topics:
>
> 1. Iranian (Persian) texts in Latin letters? (Yuri Tambovtsev)
> 2. The CLARIN-ES-LAB demonstrator is now available (Marta Villegas)
> 3. Proceedings of the 2nd Louhi Workshop published in the
> Journal of Biomedical Semantics (Sumithra Velupillai)
> 4. LTC'11 Deadline Extension until August 9 (info at elda.org)
> 5. Re: Speeding up the constitution of corpora from LexisNexis
> (Mike Scott)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 29 Jul 2011 17:04:54 +0700
> From: "Yuri Tambovtsev" <yutamb at mail.ru>
> Subject: [Corpora-List] Iranian (Persian) texts in Latin letters?
> To: <corpora at uib.no>
>
> Dear Corpora colleagues, do you know of any websites with Iranian (Persian)
> texts in Latin letters? I cannot read Persian letters. However, I'd like to
> compare Persian sound chains with those I have in about 300 world languages.
> Looking forward to hearing from you soon at yutamb at mail.ru. Yours
> sincerely, Yuri Tambovtsev, Novosibirsk, Russia
>
> ------------------------------
>
> Message: 2
> Date: Thu, 28 Jul 2011 12:15:01 +0200
> From: Marta Villegas <marta.villegas at upf.edu>
> Subject: [Corpora-List] The CLARIN-ES-LAB demonstrator is now available
> To: undisclosed-recipients:;
>
> We apologize if you receive this information more than once
>
> (HTML version at
> http://clarin-es.iula.upf.edu/es/newsletter/newsletter-11/)
>
> ===========================================
> The CLARIN-ES-LAB demonstrator is now available
> ===========================================
>
> CLARIN-ES-LAB is conceived as a virtual laboratory for researchers who
> want to become familiar with the tools already available as web services
> and to chain them into workflows to build complex tasks. It is an
> environment for sharing tools and a collaborative space in the service of
> research and innovation.
>
> The virtual laboratory can be accessed at:
> http://clarin-es-lab.org/.
>
> CLARIN-ES-LAB is intended for all researchers who deal with large amounts
> of textual data and need to carry out some kind of analysis, such as:
>
> * Computing the most frequent words in a text
> * Identifying which adjectives a given noun tends to occur with
> * Finding out which verbs (and in which forms) are most present in a text
> * Identifying the proper nouns in a text
> * Computing the most probable word associations in a document
> * Measuring the lexical richness of a corpus
> * Finding the contexts in which a given word or expression appears
> * Identifying the most common structures in a text
> * Computing the distance between two texts
> * Studies of genre, style, ...
> * Studies of usage, distribution, entropy ... in the language
> * Discourse analysis, trend studies, authorship ...
> * Lexicometric and statistical studies
> * Language monitoring ...
> * etc.
>
> In the Documentation/Powered by Clarin! section
> <http://gilmere.upf.edu/mvillegas/clarin-es-lab/documentacion/PoweredByClarin.htm#wkf1>
> we can see several real cases that illustrate the potential of
> Clarin-es-lab:
>
> - Elections 2011: analysis of the political blogosphere during the
> election campaign
> - Androcentrism in the Spanish press: who is the news about?
> - Sentiment Analysis (opinion studies)
>
> In the "Paso a paso" (Step by step) section we will find videos that
> quickly offer a guide to the laboratory's capabilities.
>
>
>
> --
> Marta Villegas
> marta.villegas at upf.edu
>
> ------------------------------
>
> Message: 3
> Date: Fri, 29 Jul 2011 15:45:55 +0200
> From: Sumithra Velupillai <sumithra at dsv.su.se>
> Subject: [Corpora-List] Proceedings of the 2nd Louhi Workshop
> published in the Journal of Biomedical Semantics
> To: Corpora List <corpora at uib.no>
>
> *** Apologies for multiple postings ***
>
> Dear colleagues,
>
> The Proceedings of the Second Louhi Workshop on Text and Data Mining of
> Health Documents have now been published in the Journal of Biomedical
> Semantics and are available here:
>
> http://www.jbiomedsem.com/supplements/2/S3
>
> Best regards,
>
> Hercules Dalianis, Martin Hassel and Sumithra Velupillai
> --
> Sumithra Velupillai
> PhD Student
> Department of Computer and Systems Sciences, DSV
> Stockholm University
> Forum 100
> 164 40 Kista
>
> Tel: +46 8 161174
>
> WWW: http://people.dsv.su.se/~sumithra/
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 29 Jul 2011 16:07:00 +0200
> From: info at elda.org
> Subject: [Corpora-List] LTC'11 Deadline Extension until August 9
> To: undisclosed-recipients:;
>
> [Apologies for cross-postings]
>
> Dear Colleague,
>
> Responding to numerous suggestions to extend the paper submission
> deadline we have fixed the NEW SUBMISSION DEADLINE to August 9, 2011
> (Tuesday). As any further extension could compromise the correct
> processing of the submissions, we may not be able to take into
> consideration papers submitted after this date.
>
> Best regards,
> LTC Organizers
> www.ltc.amu.edu.pl
>
>
> ------------------------------
>
> Message: 5
> Date: Sat, 30 Jul 2011 09:00:20 +0100
> From: Mike Scott <mike at lexically.net>
> Subject: Re: [Corpora-List] Speeding up the constitution of corpora
> from LexisNexis
> To: corpora at uib.no
>
> Mahe, hi
>
> We have been working on building corpora from this source at Aston
> University for research into the language of climate change. There are
> lots of problems with the newspaper database but most of these can be
> resolved fairly well:
> * duplicated articles (often exact duplicates but with different dates or
> publications, but also slightly varied duplicates)
> * imprecise/varied headers depending on the news source
> * many sources extremely well represented (e.g. US newspapers) but other
> coverage patchy (e.g. Brazilian)
> * download restrictions (but these are generous, so you can get lots of
> texts in one file)
> * these large files need splitting up, which is not difficult to automate
> Then you need to decide which publications or authors you do/don't wish
> to include in your corpus.
> I am considering making the software I have prepared for this purpose
> available to the wider community; it would first need some enhancing
> regarding a help system. It attempts to parse the multi-text download into
> separate articles, filters out duplicates, and then lets the user filter
> the set by publications & authors, exporting cleaned-up texts to
> single-article or monthly-based text files.
>
> Cheers -- Mike
>
> On 28/07/2011 14:55, Mahé BEN HAMED wrote:
> > Dear all,
> >
> > Is there a way to speed up the building of corpora from the Lexis
> > Nexis newspaper database (given a set of search parameters)? To what
> > extent can the whole process be automated?
> >
> > Thanks,
> >
> > Mahe BEN HAMED
> >
> >
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
>
> --
> Mike Scott
>
> ***
> If you publish research which uses WordSmith, do let me know so I can
> include it at
>
> http://www.lexically.net/wordsmith/corpus_linguistics_links/papers_using_wordsmith.htm
> ***
> University of Aston and Lexical Analysis Software Ltd.
> mike.scott at aston.ac.uk
> www.lexically.net
>
>
>
> End of Corpora Digest, Vol 49, Issue 33
> ***************************************
>
--
Yours Faithfully,
Masood Ghasemzadeh
http://people.dsv.su.se/~masoodg/
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora