[Corpora-List] Corpus Linguistics in the South 4: Hands-on workshop

Wed Aug 29 09:20:30 UTC 2012

We are pleased to announce that the next Corpus Linguistics in the South
will be hosted by the University of Portsmouth on Saturday 10 November.
It will be a practical hands-on workshop with software which may be
useful to corpus linguists. The programme and description of the
sessions are copied below.

As always, attendance is free, but places are limited and will be
assigned on a first-come, first-served basis. If you would like to
attend, please email charlotte.taylor at port.ac.uk. Please also say
whether you would like to join us for lunch at a local cafe/restaurant
(max £10).

Programme
9.15     Welcome coffee
9.30     Sketch Engine: Advanced workshop
         Adam Kilgarriff, Lexcom Computing, Brighton
11.00    EXMARaLDA (Extensible Markup Language for Discourse Annotation)
         Daniel Jettka, Hamburg Centre for Spoken Corpora, Germany
13.00    Lunch
14.15    CHILDES (Child Language Data Exchange System)
         Kevin McManus, University of Southampton
15.45    Unix for Corpus Users
         John Williams, University of Portsmouth
17.15    Arrangement of next two Corpus Linguistics in the South events
         & Close

Sketch Engine: Advanced Workshop
This will be an opportunity for people with some experience of Sketch
Engine to see and try out some more advanced features, and also to ask
any questions, particularly of the 'How do I do X?' variety. As with
most software, most users are aware of only a small fraction of what
Sketch Engine offers, and find it rewarding to have their repertoire
extended. My usual experience with workshops of this kind is that
there are many wide-eyed looks which say "Ah, so THAT is how you do
that!" Come prepared with any queries or reports you want to be able
to run but are not sure how to, and we'll work them out together in
the workshop.

Introduction to EXMARaLDA
The workshop will introduce EXMARaLDA ("Extensible Markup Language for
Discourse Annotation"), a system of concepts, data formats, and tools
for the computer-assisted transcription and annotation of spoken
language, and for the construction and analysis of spoken language
corpora.
During the workshop three related tools will be introduced: (1) the
Partitur Editor - a tool for inputting, editing, and outputting
transcriptions in partitur (musical score) notation, (2) the Corpus
Manager (CoMa) which is designed to merge transcripts created with the
Partitur Editor with their corresponding recordings into corpora and to
enrich them with metadata, and (3) the query tool EXAKT ("EXMARaLDA
Analysis and Concordancing Tool") for searching transcribed and
annotated phenomena in an EXMARaLDA corpus.
After a brief introduction, the participants will have the chance to
gain some practical experience with the tools. The focus will presumably
be on the transcription and annotation of audio and/or video data in the
Partitur Editor so please feel free to bring along your own data for
testing.
To find out more about EXMARaLDA visit
http://www.exmaralda.org/en_index.html

Introduction to CHILDES
The overall purpose of the session is to provide practical, hands-on
experience of the CHILDES database and its tools for researchers working
in any field of language acquisition. In particular, we aim:
a)   to introduce researchers unfamiliar with CHILDES, but planning to
do empirical work, to the basics of transcription and coding of new and
existing material and to the tools available to analyse data;
b)   to help researchers address specific research questions within
CHILDES (e.g. use of the part-of-speech tagger, searches on
morphosyntactic lines, etc.).

Introduction to Unix for Corpus Users
This workshop is intended for corpus users with little or no knowledge
of the Unix command line who would like to extend their repertoire of
searching, sorting, and synthesizing techniques beyond those that are
available through the standard corpus-query software packages
(Sketch Engine, AntConc, WordSmith, etc.). The workshop will be divided
into three phases:
a)   Some basics: options, input & output, pipes, file management,
aliases, .rc files
b)   The most useful Unix commands for corpus linguists: cat, grep, sed,
sort, uniq. (We will chain some of these together to create a customized
word list with frequencies.) Some of these commands are integrated into
the standard packages, but used at the command line their range and
flexibility can be greatly extended. This part of the workshop will
also include a discussion of regular expressions.
c)   We hope to demonstrate a simple Unix shell script (program) that
converts batches of .doc and .pdf files to .txt, to help participants
build their own corpora. The script will be available to take away (or
to be sent by email) at the end of the workshop.
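
As a rough illustration of the kind of chain described in (b), and not
necessarily the one that will be used on the day, a word list with
frequencies can be sketched as below. The function name wordlist and
the file name corpus.txt are made up for this example. tr is not among
the commands listed above and is used here only to fold case, and
grep -o is a common extension (GNU and BSD grep) rather than part of
POSIX.

```shell
# wordlist: read plain text on standard input and print one
# "count word" line per distinct word form, most frequent first.
# (Hypothetical helper for illustration; assumes a grep that supports -o.)
wordlist() {
    grep -oE '[[:alpha:]]+' |          # one run of letters per line
        tr '[:upper:]' '[:lower:]' |   # fold case: "The" and "the" count together
        sort |                         # bring identical forms next to each other
        uniq -c |                      # collapse repeats, prefixing the count
        sort -rn                       # order by count, highest first
}
```

For example, cat corpus.txt | wordlist | head -n 20 would show the 20
most frequent word forms in corpus.txt. Tokens here are plain runs of
letters, so hyphenated words and words with apostrophes are split; a
different regular expression in the grep step would change that.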

--------------------------------------------------
Year 1 Tutor, SLAS
Senior Lecturer in English Language and Linguistics

School of Languages and Area Studies
University of Portsmouth
Park Building
King Henry I Street
Portsmouth
PO1 2DZ

Room 4.31, Tel. 023 92 846161
http://www.port.ac.uk/departments/academic/slas/staff/title,103868,en.html
http://port.academia.edu/CharlotteTaylor
