[Corpora-List] comparison of language varieties
Anstein Stefanie
Stefanie.Anstein at eurac.edu
Tue Jul 15 09:41:09 UTC 2008
Dear all,
This is a general query about comparing language variety corpora
following Asim's questions (see below).
I am looking for any automated corpus studies and tools
for comparing the varieties of a language,
in order to take them as a basis for further research
on the development of tools for the systematic and automated
comparison of linguistic varieties on the basis of text corpora.
Up to now I have contacted researchers from several variety corpus projects,
e.g. the 'International Corpus of English' ICE,
the 'Trésor de la Langue Française informatisé' TLFi, or
the 'Proyecto para el Estudio Sociolingüístico del Español de España y América' PRESEEA.
I was pointed to semi-automatic studies on the lexical level,
e.g. at the Centro de Linguística da Universidade de Lisboa (CLUL).
As far as I can see now, there have not been any publications
on automated comparison tools for higher levels of linguistic description,
e.g. on collocations, syntactic differences or even on the textual level.
So I'd appreciate references to such studies, starting from the lexical level.
In addition, I'd be grateful for any other ideas on contrasting 'similar' corpora / data sets,
which might also come from quite different research fields.
I will post a summary with the replies I get.
Thank you for any hints,
Stefanie
--
Stefanie Anstein
Institute for Specialised Communication and Multilingualism
EURAC research
Viale Druso 1, I-39100 Bolzano
t +39 0471 055 135
f +39 0471 055 199
stefanie.anstein at eurac.edu
www.eurac.edu
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Asim
Sent: Tuesday, 27 May, 2008 19:41
To: corpora at uib.no
Subject: [Corpora-List] request for parsing and making the data in a form to be used by wordsmith
Hello
I am working on Pakistani English. I have compiled a 2.1 million word corpus of written Pakistani English. It is the first ever corpus of Pakistani English.
I want to study the features of the Pakistani variety of English. Could anyone tell me how to locate them? Any suggestion would be welcome.
I have tagged the corpus and am now trying to analyse it using both top-down and bottom-up approaches.
I want to study verb particles, and for this I want to parse the data, as I think parsing is the only way to confirm whether a word is a preposition or a particle. If there is any other way besides manual study, please tell me and I will be obliged.
Another issue: when I use some of the demo parsers available online, like LFG, how can I store the results so that they can be used with WordSmith 4 to locate all the particles in my data?
Is there any solution?
Wish to hear from you soon.
Regards
Asim