<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
</HEAD>
<BODY bgColor=#ffffff>
<DIV>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="mso-ansi-language: EN-US">Dear colleagues,<o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><BR>We are pleased to announce the
second release of the Lácio-Web website. Lácio-Web is a project that provides
corpora of Brazilian Portuguese and software tools for computational
linguistic processing. <o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><o:p> </o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">The
first release, launched on January 20<SUP>th</SUP>, made two corpora
available: <o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt 18pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">-
a version of Lácio-Ref, a reference corpus of </SPAN><SPAN lang=EN-US
style="mso-ansi-language: EN-US">4,156,816 words</SPAN><SPAN lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US"> composed
of texts from five genres (informative, scientific, prose, poetry and drama),
intended for research and for building subcorpora; and <o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt 18pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">-
MAC-MORPHO, a POS-annotated corpus of </SPAN><SPAN lang=EN-US
style="mso-ansi-language: EN-US">1,167,183 words from the newspaper Folha de
São Paulo (1994)</SPAN><SPAN lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">.
<o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US"><o:p> </o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US">For the second release, Lácio-Ref
has been expanded with texts from the legal, scientific, informative and
instructional genres. As of this release, the Lácio-Ref corpus consists of
4,278 files totalling 8,291,818 words.<o:p></o:p></SPAN></P>
<P class=MsoBodyTextIndent style="MARGIN: auto 0cm; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">A
parallel corpus, Par-C, has also been made available, with 646 text files in
English and 646 in Portuguese taken from Revista Pesquisa Fapesp. The parallel
corpus totals 893,283 words.</SPAN><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">In addition
to these corpora, a tool for building English-Portuguese comparable corpora in
the legal domain has been made available. To support it, a reference corpus of
English legal texts (Ref-Ig) has been compiled; it currently contains 29 texts
totalling 61,149 words and will be enlarged in the
future.</SPAN><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><o:p></o:p></SPAN></P>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-INDENT: 42pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US"> </SPAN><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">In total,
Lácio-Web now contains 5,708 files and 10,413,524 words.</SPAN><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US"><o:p> </o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">The project
also provides several computational linguistics tools, such as </SPAN><SPAN
lang=EN-US style="mso-ansi-language: EN-US">frequency counters, concordancers
and three POS taggers trained on the MAC-MORPHO corpus: MXPOST, </SPAN><SPAN
lang=EN-US
style="FONT-FAMILY: 'Palatino Linotype'; mso-ansi-language: EN-US">TreeTagger</SPAN><SPAN
lang=EN-US style="mso-ansi-language: EN-US"> and the Brill TBL tagger.
<o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><o:p> </o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US">These new resources are available
from the project website:<o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><BR><A
href="http://www.nilc.icmc.usp.br/lacioweb">http://www.nilc.icmc.usp.br/lacioweb</A><BR
style="mso-special-character: line-break"><BR
style="mso-special-character: line-break"><o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><o:p> </o:p></SPAN></P>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify">Cordially,</P>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><o:p> </o:p></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><SPAN
lang=EN-US style="mso-ansi-language: EN-US">Lácio-Web Team </SPAN></P>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><o:p> </o:p></P></DIV></BODY></HTML>