<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:10.5pt;
font-family:Consolas;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:"Balloon Text Char";
margin:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:"Tahoma","sans-serif";}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:Consolas;}
span.BalloonTextChar
{mso-style-name:"Balloon Text Char";
mso-style-priority:99;
mso-style-link:"Balloon Text";
font-family:"Tahoma","sans-serif";}
span.EmailStyle21
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.EmailStyle22
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:#1F497D;}
span.EmailStyle23
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page Section1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'>Hello
again,<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'>For
those who are interested, this is a summary of the replies to my request below
- many thanks to the contributors.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'>Adam
Kilgarriff pointed to the paper “<a
href="http://kilgarriff.co.uk/Publications/2001-K-CompCorpIJCL.pdf"><span
style='color:#1F497D'>Comparing Corpora</span></a>” (International Journal of
Corpus Linguistics 6(1), 2001: 1-37), to recent work in the “web as
corpus” community on more general questions of comparison between
language varieties (e.g. Ferraresi et al.: “Introducing and evaluating ukWaC, a
very large web-derived corpus of English”), and to the Sketch Engine, “which
supports ‘keyword’ analyses between a subcorpus and the rest of the corpus it is
part of.”<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'>Ana
Frankenberg sent information on the COMPARA parallel corpus of Portuguese
“with different varieties of Portuguese and English and with a complex
search facility which allows to compare and contrast different varieties of
these two languages.”<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-family:"Arial","sans-serif";color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-bottom:12.0pt'><span style='font-family:"Arial","sans-serif";
color:#1F497D'>Helen Johnson sent this link to a paper for “ideas of things to
look at in the comparison of varieties, which compares English written by
non-native English speakers in academic text and catalogues the variations
based on an L1 language group”: “The Way We Write” <a
href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1319188"><span
style='color:#1F497D'>http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1319188</span></a><o:p></o:p></span></p>
<p class=MsoNormal style='margin-bottom:12.0pt'><span style='font-family:"Arial","sans-serif";
color:#1F497D'>Michal Křen pointed to their recent study on diachronic corpus
comparison at the lexical level, presented at Euralex 2008 (“Corpus as a
Means for Study of Lexical Usage Changes” by Michal Křen and Jaroslava Hlaváčová),
and especially to their “lesson learned: the necessity to obtain the highest
possible comparability of the base corpora, otherwise you can end up with lots
of garbage, as corpus composition differences can prove more significant than
differences in language one wants to study. Presumably this can be why there
are probably no automated tools of this kind for higher levels of language
description. You may also find interesting paper by Joerg Asmussen on very
similar topic, it is quoted in the references section of our paper.”<o:p></o:p></span></p>
<p class=MsoNormal style='margin-bottom:12.0pt'><span style='font-family:"Arial","sans-serif";
color:#1F497D'>I would still be grateful for further hints, including on
system-independent corpus comparison approaches or tools, if there are any.<o:p></o:p></span></p>
<p class=MsoNormal style='margin-bottom:12.0pt'><span style='font-family:"Arial","sans-serif";
color:#1F497D'>Thanks and best regards,<br>
Stefanie<o:p></o:p></span></p>
<p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='color:#1F497D'><o:p> </o:p></span></p>
<div>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'>
<p class=MsoNormal style='margin-left:36.0pt'><b><span style='font-size:10.0pt;
font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;
font-family:"Tahoma","sans-serif"'> Anstein Stefanie <br>
<b>Sent:</b> Tuesday, 15 July, 2008 11:41<br>
<b>To:</b> 'corpora@uib.no'<br>
<b>Subject:</b> comparison of language varieties<o:p></o:p></span></p>
</div>
</div>
<p class=MsoNormal style='margin-left:36.0pt'><o:p> </o:p></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>Dear all,<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>This is a general query about comparing
language variety corpora <o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>following Asim’s questions (see below).<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>I am looking for any automated corpus studies
and tools <o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>for comparing the varieties of a language, <o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>in order to take them as a basis for further
research<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>on the development of tools for the
systematic and automated<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>comparison of linguistic varieties on the
basis of text corpora.<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>So far I have contacted researchers from
several variety corpus projects,<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>e.g. the ‘International Corpus of English’
ICE, <o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span lang=IT style='font-size:
10.0pt;font-family:"Arial","sans-serif"'>the ‘Trésor de la Langue Française
informatisé’ TLFi, or<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span lang=IT style='font-size:
10.0pt;font-family:"Arial","sans-serif"'>the ‘Proyecto para el Estudio
Sociolingüístico del Español de España y América’ PRESEEA.<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span lang=IT style='font-size:
10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>I was pointed to semi-automatic studies on
the lexical level, <o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span lang=IT style='font-size:
10.0pt;font-family:"Arial","sans-serif"'>e.g. at the Centro de Linguística da
Universidade de Lisboa (CLUL).<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span lang=IT style='font-size:
10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>As far as I can see now, there have not been
any publications <o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>on automated comparison tools for higher
levels of linguistic description, <o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>e.g. on collocations, syntactic differences
or even on the textual level.<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>So I’d appreciate references to such studies,
starting from the lexical level.<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>In addition, I’d be grateful for any other
ideas on contrasting ‘similar’ corpora / data sets,<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>which might also come from quite different
research fields.<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>I will post a summary with the replies I get.<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>Thank you for any hints,<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'>Stefanie<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='font-size:10.0pt;
font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
style='font-size:9.0pt;font-family:"Arial","sans-serif"'>--<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
style='font-size:9.0pt;font-family:"Arial","sans-serif"'>Stefanie Anstein<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
style='font-size:9.0pt;font-family:"Arial","sans-serif"'>Institute for
Specialised Communication and Multilingualism<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
style='font-size:9.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
lang=IT style='font-size:9.0pt;font-family:"Arial","sans-serif"'>EURAC research<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
lang=IT style='font-size:9.0pt;font-family:"Arial","sans-serif"'>Viale Druso 1,
I-39100 Bolzano<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
lang=DE style='font-size:9.0pt;font-family:"Arial","sans-serif"'>t +39 0471 055
135<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
lang=DE style='font-size:9.0pt;font-family:"Arial","sans-serif"'>f +39 0471 055
199<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
lang=DE style='font-size:9.0pt;font-family:"Arial","sans-serif"'><a
href="mailto:stefanie.anstein@eurac.edu">stefanie.anstein@eurac.edu</a><o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
style='font-size:9.0pt;font-family:"Arial","sans-serif"'><a href="http://www.eurac.edu">www.eurac.edu</a>
<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt;text-autospace:none'><span
style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><o:p> </o:p></p>
<p class=MsoNormal style='margin-left:36.0pt'><o:p> </o:p></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:36.0pt'><span style='color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal style='margin-left:72.0pt'><b><span style='font-size:10.0pt;
font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;
font-family:"Tahoma","sans-serif"'> corpora-bounces@uib.no
[mailto:corpora-bounces@uib.no] <b>On Behalf Of </b>Asim<br>
<b>Sent:</b> Tuesday, 27 May, 2008 19:41<br>
<b>To:</b> corpora@uib.no<br>
<b>Subject:</b> [Corpora-List] request for parsing and making the data in a
form tobe used by wordsmith<o:p></o:p></span></p>
<p class=MsoNormal style='margin-left:72.0pt'><o:p> </o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>Hello<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>I am working on Pakistani
English. I have compiled a 2.1-million-word corpus of written Pakistani
English. It is the first ever corpus of Pakistani English.<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>I want to study the features of
the Pakistani variety of English. Could anyone tell me how to locate them? Any
suggestions would be welcome.<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>I have tagged it and am now trying
to analyse it using both top-down and bottom-up approaches.<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>I want to study verb
particles, and for this I want to parse the data, as I think parsing is the only
way to confirm whether a given word is a preposition or a
particle. If there is any other way apart from manual study, please tell me and I will
be obliged.<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'><o:p> </o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>Another issue: when I use
online demo parsers such as LFG parsers, how can I store the results for
use with WordSmith 4 and use them to locate all the particles in my
data?<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>Is there any solution?<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>Wish to hear from you soon.<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>Regards<o:p></o:p></p>
<p class=MsoNormal style='margin-left:72.0pt'>Asim<o:p></o:p></p>
<p class=MsoNormal style='margin-left:36.0pt'><o:p> </o:p></p>
</div>
</body>
</html>