[Corpora-List] Corpora Digest, Vol 62, Issue 26 - open source tools for German language
Anne Schumann
anne.schumann at Tilde.lv
Wed Aug 29 10:15:07 UTC 2012
Sree,
For morphological analysis, take a look at RFTagger (http://www.ims.uni-stuttgart.de/projekte/corplex/RFTagger/). I also know of two other parsers: BitPar (http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/BitPar.html) and ParZu (https://github.com/rsennrich/ParZu).
Best,
Anne
Anne-Kathrin Schumann
Phd student
University of Vienna
Tilde
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of corpora-request at uib.no
Sent: Wednesday, August 29, 2012 1:00 PM
To: corpora at uib.no
Subject: Corpora Digest, Vol 62, Issue 26
Today's Topics:
1. 2 research associate positions (final reminder - deadline
2nd Sept) (Orasan, Constantin)
2. open source tools for German Language (sree ganesh)
3. Re: open source tools for German Language (Ivelina Nikolova)
4. Re: open source tools for German Language (Torsten Zesch)
5. Re: open source tools for German Language (manaal faruqui)
6. NLP Research Associate post at UCREL, Lancaster University
(Rayson, Paul)
7. Re: open source tools for German Language (Thomas Proisl)
8. Corpus Linguistics in the South 4: Hands-on workshop
(Charlotte Taylor)
----------------------------------------------------------------------
Message: 1
Date: Tue, 28 Aug 2012 14:06:45 +0000
From: "Orasan, Constantin" <C.Orasan at wlv.ac.uk>
Subject: [Corpora-List] 2 research associate positions (final reminder
- deadline 2nd Sept)
To: "corpora at lists.uib.no" <corpora at lists.uib.no>,
"cluk at dcs.shef.ac.uk" <cluk at dcs.shef.ac.uk>
[Apologies for multiple postings]
The Research Group in Computational Linguistics (http://clg.wlv.ac.uk) at the University of Wolverhampton invites applications for two research associate posts in the DVC project (http://clg.wlv.ac.uk/projects/DVC/)
Salary: £28,401 - £31,020 pa (level of appointment dependent on qualifications and experience)
Duration: These are temporary, fixed-term appointments for a maximum of three years (dependent on start date of contract).
Application deadline: 2nd September 2012
RA1: Research Associate in Computational Linguistics (REF: A5910)
To work as part of a team to research computational linguistics approaches to investigate the relationship between the meaning and the use of English verbs. This is a project funded by AHRC and the successful applicant may be required to attend meetings in the Czech Republic. Applicants should have a PhD in Information Science, Computer Science or Natural Language Processing (or equivalent experience) and proven research experience in these fields. Applicants must be familiar with corpus linguistics and should have experience of at least some of the following fields: textual entailment, semantic role labelling and word sense disambiguation. Knowledge of a programming language is also essential. Experience of web application development, corpus annotation and using NLP tools is desirable. Applicants should feel comfortable with understanding linguistic theories such as Sinclair's and/or Hanks's principles of corpus pattern analysis.
RA2: Research Associate in Lexicography (REF: A5911)
To work as part of a team to research computational linguistics approaches to investigate the relationship between the meaning and the use of English verbs. This is a project funded by AHRC and the successful applicant may be required to attend meetings in the Czech Republic. Applicants should have a PhD in Corpus Linguistics and/or equivalent practical experience in Lexical Analysis for publication.
They must have experience of contextual analysis of meaning, collocational preferences, and lexical semantics, along with knowledge of corpus linguistics and dictionary building. Experience of working with ontologies and/or semantic types is desirable, as is knowledge of Sinclair's and/or Hanks's principles of corpus pattern analysis, familiarity with corpus annotation and the use of annotation tools, and exposure to computational linguistics.
For informal enquiries please contact Alison Carminke, alison.carminke at wlv.ac.uk quoting the reference number.
For detailed further particulars and an application form visit our
website: http://www.wlv.ac.uk
Alternatively please contact the Personnel Services Department, University of Wolverhampton, Molineux Street, Wolverhampton WV1 1SB.
Tel: 01902 321049 (ansaphone). For hearing impaired candidates our minicom number is 01902 321249. Email address: per at wlv.ac.uk Visit our website at http://www.wlv.ac.uk/
The University is eager to attract larger numbers of applications from groups of people currently under-represented in the staff population, especially from women and people from ethnic minority groups.
Established by Prof Mitkov in 1998, the Research Group in Computational Linguistics delivers cutting-edge research in a number of NLP areas such as anaphora resolution, automatic summarisation, question answering, multilingual text processing, multiple-choice question generation and text simplification. The results from the latest Research Assessment Exercise announced on 17 December 2008 confirm the Research Group in Computational Linguistics as one of the top performers in UK research.
The research group was ranked joint 3rd with two other universities in the Unit of Assessment 'Linguistics'. According to the league tables of the Guardian, The Times and Research Fortnight, research in Linguistics at the University of Wolverhampton is one of the top 6 in the UK.
--
Dr. Constantin Orasan <C.Orasan at wlv.ac.uk>
Senior Lecturer in Computational Linguistics
Deputy Head of the Research Group in Computational Linguistics
Research Group in Computational Linguistics
http://www.wlv.ac.uk/~in6093/
University of Wolverhampton
------------------------------
Message: 2
Date: Tue, 28 Aug 2012 17:05:55 +0200
From: sree ganesh <sganeshhcu at gmail.com>
Subject: [Corpora-List] open source tools for German Language
To: corpora at uib.no
Dear Members,
I would like some suggestions on the following:
1. Are there any open-source morphological analysers and parsers for
the German language?
2. I would like to extract noun phrases from a German corpus. Any suggestions
on this?
Regards
Sri
------------------------------
Message: 3
Date: Tue, 28 Aug 2012 18:28:47 +0300
From: Ivelina Nikolova <iva at lml.bas.bg>
Subject: Re: [Corpora-List] open source tools for German Language
To: corpora at uib.no
Hi Sri,
I had the same problem and used this chunker:
http://www.semanticsoftware.info/munpex
<http://www.semanticsoftware.info/munpex#Installation>
You may find it useful too.
Best,
Ivelina
------------------------------
Message: 4
Date: Tue, 28 Aug 2012 18:35:38 +0000
From: Torsten Zesch <zesch at ukp.informatik.tu-darmstadt.de>
Subject: Re: [Corpora-List] open source tools for German Language
To: "'corpora at uib.no' (corpora at uib.no)" <corpora at uib.no>
Dear Sri,
1. StanfordParser (http://nlp.stanford.edu/software/lex-parser.shtml) and
mate-tools (http://code.google.com/p/mate-tools/)
come with pre-packaged models for German.
2. Try TreeTaggerChunker (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/).
If you are working with Java, the DKPro Core framework (http://code.google.com/p/dkpro-core-asl/) comes with easy to use wrappers for TreeTagger and StanfordParser. An integration of the mate-tools is in preparation.
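To illustrate the noun phrase extraction asked about in point 2, here is a minimal Python sketch. It does not call TreeTagger or any of the tools named above. It assumes the text has already been tokenised and tagged with STTS part-of-speech tags (ART, ADJA, NN, NE), and the function name noun_phrases and the simple determiner-adjective-noun pattern are hypothetical simplifications, not the behaviour of any real chunker.

```python
def noun_phrases(tagged):
    """Group (token, STTS tag) pairs into simple noun phrases.

    Assumed pattern: optional determiner (ART), any number of
    attributive adjectives (ADJA), then one or more nouns (NN or NE).
    """
    phrases = []
    i = 0
    n = len(tagged)
    while i < n:
        j = i
        # Optional determiner.
        if tagged[j][1] == "ART":
            j += 1
        # Zero or more attributive adjectives.
        while j < n and tagged[j][1] == "ADJA":
            j += 1
        # One or more common or proper nouns.
        k = j
        while k < n and tagged[k][1] in ("NN", "NE"):
            k += 1
        if k > j:
            # At least one noun was found: emit the whole span.
            phrases.append(" ".join(token for token, _ in tagged[i:k]))
            i = k
        else:
            # No noun phrase starts here: move on by one token.
            i += 1
    return phrases


sample = [
    ("Der", "ART"), ("alte", "ADJA"), ("Mann", "NN"),
    ("liest", "VVFIN"),
    ("ein", "ART"), ("Buch", "NN"),
]
print(noun_phrases(sample))  # ['Der alte Mann', 'ein Buch']
```

A real chunker handles far more (prepositional phrases, coordination, embedded genitives), so this is only meant to show the shape of the task.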
-Torsten
------------------------------
Message: 5
Date: Tue, 28 Aug 2012 20:17:10 -0400
From: manaal faruqui <manaalfar at gmail.com>
Subject: Re: [Corpora-List] open source tools for German Language
To: sree ganesh <sganeshhcu at gmail.com>
Cc: corpora at uib.no
Hi Sree,
You can also find a German NER tool here, in case you want to further divide
the noun phrases into categories:
http://www.nlpado.de/~sebastian/software/ner_german.shtml
Best,
Manaal
------------------------------
Message: 6
Date: Wed, 29 Aug 2012 07:26:41 +0000
From: "Rayson, Paul" <p.rayson at lancaster.ac.uk>
Subject: [Corpora-List] NLP Research Associate post at UCREL,
Lancaster University
To: "corpora at uib.no" <corpora at uib.no>
Research Associate in Natural Language Processing of Corporate Financial Communications
School of Computing and Communications
Salary: £25,251 to £29,249
Closing Date: Friday 21 September 2012
Interview Date: To be confirmed
Reference: A489
Applications are invited for a Research Associate position in natural language processing as part of an interdisciplinary team working on Corporate Financial Communications within the Department of Accounting and Finance and the School of Computing and Communications (SCC) at Lancaster University.
You should have a first degree or Master's degree in Computer Science, Computational Linguistics, Text Mining, or a related field; and a PhD in the area of corpus-based analysis, natural language processing or a closely related subject. You should also possess suitable software development skills and demonstrate the ability to work as part of a team as well as the capability to integrate diverse multi-disciplinary requirements into the design of the natural language processing tools to be developed.
Candidates are encouraged to make informal enquiries to the project investigators Dr Paul Rayson (p.rayson at lancaster.ac.uk) in SCC or Prof Steven Young (s.young at lancaster.ac.uk) in Accounting and Finance.
For more details, see http://hr-jobs.lancs.ac.uk/Vacancy.aspx?ref=A489
Dr. Paul Rayson
Director of UCREL and Senior Lecturer in Computer Science
Faculty of Science and Technology Director of International Teaching Partnerships
School of Computing and Communications, Infolab21, Lancaster University, Lancaster, LA1 4WA, UK.
Web: http://www.comp.lancs.ac.uk/~paul/
Tel: +44 1524 510357 Fax: +44 1524 510492
------------------------------
Message: 7
Date: Wed, 29 Aug 2012 09:42:17 +0200
From: Thomas Proisl <tsproisl at linguistik.uni-erlangen.de>
Subject: Re: [Corpora-List] open source tools for German Language
To: sree ganesh <sganeshhcu at gmail.com>
Cc: corpora at uib.no
Dear Sri,
> 1. Are there any open source Morphological analysers and parsers for
> German language?
Morphisto (https://code.google.com/p/morphisto/) might fit your
needs. Here is a quote from its website:
> Morphisto is a morphological analyzer and generator for German
> wordforms. The basis of Morphisto is the open-source SMOR morphology
> for the German language developed by the University of Stuttgart (GPL
> v2) for which a free lexicon is provided under the Creative Commons
> 3.0 BY-SA Non-Commercial license.
Best regards,
Thomas
--
Department Germanistik und Komparatistik
Professur für Computerlinguistik
Bismarckstr. 6, 91054 Erlangen
Institut für Anglistik und Amerikanistik
Lehrstuhl für Anglistik, insbesondere Linguistik
Bismarckstr. 1, 91054 Erlangen
Fon: +49 9131 85-25908; Fax: +49 9131 85-29251
http://www.linguistik.uni-erlangen.de/~tsproisl/
------------------------------
Message: 8
Date: Wed, 29 Aug 2012 10:20:30 +0100
From: "Charlotte Taylor" <Charlotte.Taylor at port.ac.uk>
Subject: [Corpora-List] Corpus Linguistics in the South 4: Hands-on
workshop
To: <corpora at uib.no>
We are pleased to announce that the next Corpus Linguistics in the South
will be hosted by the University of Portsmouth on Saturday 10 November.
It will be a practical hands-on workshop with software which may be
useful to corpus linguists. The programme and description of the
sessions are copied below.
As always, attendance is free but places are limited and will be
assigned on a first come first served basis. If you would like to
attend, please email charlotte.taylor at port.ac.uk. Could you also specify
if you would like to join us for lunch at a local cafe/restaurant (max
£10).
Programme
9.15 Welcome coffee
9.30 Sketch Engine: Advanced workshop
Adam Kilgarriff, Lexcom Computing, Brighton
11.00 EXMARaLDA (Extensible Markup Language for Discourse Annotation)
Daniel Jettka, Hamburg Centre for Spoken Corpora, Germany
13.00 Lunch
14.15 CHILDES (Child Language Data Exchange System)
Kevin McManus, University of Southampton
15.45 Unix for Corpus Users
John Williams, University of Portsmouth
17.15 Arrangement of next two Corpus Linguistics in the South events
& Close
Sketch Engine: Advanced Workshop
This will be an opportunity for people with some experience of Sketch
Engine to see and try out some more advanced features, and also to ask
any questions, particularly of the 'How do I do X?' variety. As with most
software, most users are only aware of a small fraction of what the
software offers, and find it rewarding to have their repertoire
extended. My usual experience with workshops of this kind is that
there are many instances of wide-eyed looks which say "Ah, so THAT is
how you do that!" Come prepared with any queries or reports you want to
be able to do, but are not sure how, and we'll work out how together in
the workshop.
Introduction to EXMARaLDA
The workshop will introduce EXMARaLDA ("Extensible Markup Language for
Discourse Annotation"), a system of concepts, data formats, and tools
for the computer assisted transcription and annotation of spoken
language, and for the construction and analysis of spoken language
corpora.
During the workshop three related tools will be introduced: (1) the
Partitur Editor - a tool for inputting, editing, and outputting
transcriptions in partitur (musical score) notation, (2) the Corpus
Manager (CoMa) which is designed to merge transcripts created with the
Partitur Editor with their corresponding recordings into corpora and to
enrich them with metadata, and (3) the query tool EXAKT ("EXMARaLDA
Analysis and Concordancing Tool") for searching transcribed and
annotated phenomena in an EXMARaLDA corpus.
After a brief introduction, the participants will have the chance to
gain some practical experience with the tools. The focus will presumably
be on the transcription and annotation of audio and/or video data in the
Partitur Editor so please feel free to bring along your own data for
testing.
To find out more about EXMARaLDA visit
http://www.exmaralda.org/en_index.html
Introduction to CHILDES
The overall purpose of the session is to provide practical, hands-on
experience of the CHILDES database and its tools for researchers working
in any field of language acquisition. In particular, we aim:
a) to introduce researchers unfamiliar with CHILDES, but planning to
do empirical work, to the basics of transcription and coding of new and
existing material and to the tools available to analyse data;
b) to help researchers in addressing specific research questions
within CHILDES (e.g. use of part-of-speech tagger, searches on
morphosyntactic lines, etc).
Introduction to Unix for Corpus Users
This workshop is intended for corpus users with little or no knowledge
of the Unix command line who would like to extend their repertoire of
searching, sorting, and synthesizing techniques beyond those that are
available through the standard corpus-query software packages
(SketchEngine, AntConc, Wordsmith, etc). The workshop will be divided
into three phases:
a) Some baoptions, input & output, pipes, file management, aliases, .rc files
b) The most useful Unix commands for corpus linguists: cat, grep, sed,
sort, uniq. (We will chain some of these together to create a customized
word list with frequencies.) Some of these commands are integrated into
the standard packages but by using them at the command line their range
and flexibility can be greatly extended. This part of the workshop will
also include a discussion of regular expressions.
c) It is hoped to be able to demonstrate a simple Unix shell script
(program) which will convert batches of .doc and .pdf files to .txt, to
aid participants in building their own corpora. This tool will be
available to take away (or to be sent by email) at the end of the
workshop.
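As a rough illustration of the "word list with frequencies" mentioned in phase (b), here is a minimal Python sketch. It is not the workshop's material, which chains Unix commands; the function name word_frequencies and the tokenisation rule (lowercase, runs of letters only) are assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    # Lowercase the text and keep runs of letters only
    # (Unicode-aware, so umlauts and ß count as letters).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


for word, count in word_frequencies("the cat and the hat"):
    print(count, word)
```

The sort-then-count idea is the same one that sort and uniq express on the command line; only the tokenisation step would need adjusting for a particular corpus.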
--------------------------------------------------
Year 1 Tutor, SLAS
Senior Lecturer in English Language and Linguistics
School of Languages and Area Studies
University of Portsmouth
Park Building
King Henry I Street
Portsmouth
PO1 2DZ
Room 4.31, Tel. 023 92 846161
http://www.port.ac.uk/departments/academic/slas/staff/title,103868,en.html
http://port.academia.edu/CharlotteTaylor
----------------------------------------------------------------------
Send Corpora mailing list submissions to
corpora at uib.no
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
corpora-request at uib.no
You can reach the person managing the list at
corpora-owner at uib.no
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
End of Corpora Digest, Vol 62, Issue 26
***************************************