[Corpora-List] Corpora Digest, Vol 56, Issue 34

Wajdi Zaghouani wajdiuqam at yahoo.com
Sun Feb 26 15:17:16 UTC 2012


Hi,


To answer the request in topic 2, "Looking for Igbo, Hausa, and Yoruba Corpora" (Fink, Clayton R.):

There is a Yoruba lexical corpus available from the LDC at the following link:


http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2008L03



You can also check the LDC catalog for more lexical resources for other African languages.

Hope this helps.

Wajdi Zaghouani
PhD Candidate

University of Quebec at Montreal,
Linguistics Department





________________________________
 From: "corpora-request at uib.no" <corpora-request at uib.no>
To: corpora at uib.no 
Sent: Sunday, February 26, 2012 6:00:01 AM
Subject: Corpora Digest, Vol 56, Issue 34
 
Today's Topics:

   1. Job opening: Post-Doc in English Corpus and/or Computational
      Linguistics, TU Darmstadt, Germany (Stefan Evert)
   2. Looking for Igbo, Hausa, and Yoruba Corpora (Fink, Clayton R.)
   3. Re: Looking for Igbo, Hausa, and Yoruba Corpora (Jimmy O'Regan)


----------------------------------------------------------------------

Message: 1
Date: Sat, 25 Feb 2012 15:53:19 +0100
From: Stefan Evert <stefanML at collocations.de>
Subject: [Corpora-List] Job opening: Post-Doc in English Corpus and/or
    Computational Linguistics, TU Darmstadt, Germany
To: Corpora Mailing List <corpora at uib.no>

The English Computational Corpus Linguistics group at Technische Universität
Darmstadt is seeking to hire a post-doctoral research assistant.  The person
we're looking for has a background in English linguistics, experience with
corpus-based approaches and/or natural language processing, and is interested
in carrying out quantitative corpus studies with state-of-the-art methods and
tools.

We offer a tight-knit and cooperative research group, highly motivated students
and a vibrant work environment. The main research interests of our group are:
- methodological foundations of corpus linguistics
- collocations
- distributional lexical semantics
- register studies and linguistic variation
- digital humanities 

Further information and details on the application procedure can be found in
the full job announcement below.  For informal enquiries, please contact
Prof. Dr. Stefan Evert <evert at linglit.tu-darmstadt.de>.

The deadline for applications is Friday, 9 March 2012.

------------------------------------------------------------------------------

English:  http://www.intern.tu-darmstadt.de/dez_vii/stellen/stellen_details_62912.en.jsp
German:  http://www.intern.tu-darmstadt.de/dez_vii/stellen/stellen_details_62912.de.jsp

------------------------------------------------------------------------------

The Institute of Linguistics and Literary Studies at the Faculty 02 Social and
Historical Sciences at Technische Universität Darmstadt invites applications
for a vacant position of a

  Research Assistant (Post-Doc) in English Linguistics
  (Code No. 74)

The position is initially for three years with a potential extension subject
to performance and funding.

The prospective postholder should have research interests in two or more of
the following areas:

- linguistic models of the English language
- collocations, register studies, etc.
- corpus and computational linguistics
- statistical approaches in linguistics

The prospective postholder is expected to contribute to research and teaching
in English linguistics. Courses taught must contribute to the teaching
portfolio in English linguistics at undergraduate and postgraduate level
(Joint Bachelor of Arts Anglistik, Master of Education Englisch, Master of
Arts Linguistic and Literary Computing). All courses are taught in
English. The postholder is furthermore expected to collaborate closely with
the team in English linguistics, assisting in research projects and the
writing of research proposals, as well as taking on a share of the
administrative duties, such as monitoring student progress and general academic
management.

Candidates are expected to pursue independent research towards a further
qualification at post-doctoral level (Habilitation or an equivalent, such as a
second book) as part of the fulfillment of their professional duties.

Candidates wishing to apply should fit the following profile:

- completed course of studies in English linguistics or teacher-training
   degree in English
- PhD in English linguistics, corpus linguistics or computational linguistics
- experience in corpus and/or computational linguistic approaches in
   linguistics
- excellent command of the English language (native or native-like written
   and spoken English)
- some teaching experience in English linguistics

The Technische Universität Darmstadt intends to increase the number of female
faculty members and encourages female candidates to apply. In case of equal
qualifications, applicants with a degree of disability of at least 50, or with
equivalent status, will be given preference. Salaries are paid according to the
collective agreements on salary scales that apply to the Technische Universität
Darmstadt (TV-TU Darmstadt). Part-time employment is generally possible.

Informal inquiries may be addressed to: 

  Prof. Dr. Stefan Evert, Technische Universität Darmstadt
  Institut für Sprach- und Literaturwissenschaft, Hochschulstr. 1, 64289 Darmstadt
  E-Mail: evert at linglit.tu-darmstadt.de

Applications should quote the post's identification number and include a CV, a
list of publications, copies of relevant diplomas, and a record of teaching
and research activities. They should be sent to:

  The Dean of the Faculty of History and Social Science
  Prof. Dr. Michèle Knodt
  Residenzschloss
  64293 Darmstadt
  Germany

Applicants are asked to additionally send an electronic copy of their
application to the following e-mail address: sprachli at linglit.tu-darmstadt.de

Please note that applications will not be returned after the completion of the
recruitment process; applicants are therefore discouraged from submitting
originals of certificates as well as applications in folders.

Application deadline: 9 March 2012

------------------------------------------------------------------------------




------------------------------

Message: 2
Date: Sat, 25 Feb 2012 14:31:37 -0500
From: "Fink, Clayton R." <finkcr1 at jhuapl.edu>
Subject: [Corpora-List] Looking for Igbo, Hausa, and Yoruba Corpora
To: "corpora at hd.uib.no" <corpora at hd.uib.no>

There's a BBC Hausa service and a Yoruba-language Wikipedia, so there 
are some possibilities for those languages. Igbo seems to be a real 
problem, though, in terms of finding text corpora.

I'm interested, mostly, in training up language id models that I can use 
on names. I have some small corpora of first names and surnames scraped 
off of the Web, but it might be interesting to have some larger corpora 
to work from.

Thanks,

Clay

-- 
Clay Fink
Senior Software Engineer
The Johns Hopkins University Applied Physics Laboratory

240-228-4220




------------------------------

Message: 3
Date: Sat, 25 Feb 2012 20:23:06 +0000
From: "Jimmy O'Regan" <joregan at gmail.com>
Subject: Re: [Corpora-List] Looking for Igbo, Hausa, and Yoruba
    Corpora
To: "Fink, Clayton R." <finkcr1 at jhuapl.edu>
Cc: "corpora at hd.uib.no" <corpora at hd.uib.no>

On 25 February 2012 19:31, Fink, Clayton R. <finkcr1 at jhuapl.edu> wrote:
> There's a BBC Hausa service and a Yoruba-language Wikipedia, so there are
> some possibilities for those languages. Igbo seems to be a real problem,
> though, in terms of finding text corpora.
>

There's an Igbo Wikipedia: http://ig.wikipedia.org/wiki/Ih%C3%BC_Mbu

> I'm interested, mostly, in training up language id models that I can use on
> names. I have some small corpora of first names and surnames scraped off of
> the Web, but it might be interesting to have some larger corpora to work
> from.

Kevin Scannell's language id model set
(http://nltk.googlecode.com/svn/trunk/nltk_data/packages/corpora/langid.zip)
includes a trigram model for Igbo.
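For anyone who wants to try the names use case directly, here is a minimal sketch of a character-trigram language identifier of the general kind mentioned above. It assumes add-one smoothing and trains from plain strings; it does not read the file format of Scannell's langid.zip. The names `trigrams` and `TrigramLangID` are hypothetical, and the handful of training names are a toy illustration rather than a real corpus.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of a lowercased string, padded with one space per side."""
    padded = " " + text.lower() + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramLangID:
    """Hypothetical toy identifier: one trigram count table per language,
    scored by add-one-smoothed log probability (an assumption, not Scannell's method)."""

    def __init__(self):
        self.counts = {}  # language label -> Counter of trigram frequencies

    def train(self, lang, texts):
        table = self.counts.setdefault(lang, Counter())
        for text in texts:
            table.update(trigrams(text))

    def score(self, lang, text):
        table = self.counts[lang]
        total = sum(table.values())
        # Shared vocabulary size across all languages, plus one slot for unseen trigrams.
        vocab = len(set().union(*self.counts.values())) + 1
        return sum(math.log((table[g] + 1) / (total + vocab)) for g in trigrams(text))

    def classify(self, text):
        # Pick the language whose model gives the text the highest log probability.
        return max(self.counts, key=lambda lang: self.score(lang, text))


if __name__ == "__main__":
    model = TrigramLangID()
    model.train("igbo", ["chukwuemeka", "ngozi", "chinedu"])      # toy data only
    model.train("yoruba", ["oluwaseun", "adebayo", "olufemi"])    # toy data only
    print(model.classify("chukwudi"))
```

With only three names per language this is illustrative only; larger corpora such as the Wikipedias mentioned above would be needed for usable accuracy.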


-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you



----------------------------------------------------------------------
Send Corpora mailing list submissions to
    corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit
    http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
    corpora-request at uib.no

You can reach the person managing the list at
    corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


End of Corpora Digest, Vol 56, Issue 34
***************************************