<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2733.1800" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><FONT size=3><FONT face="Times New Roman">Dear
colleagues,<?xml:namespace prefix = o ns =
"urn:schemas-microsoft-com:office:office" /><o:p></o:p></FONT></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><BR><FONT face="Times New Roman" size=3>We are
pleased to announce the first release of the Lácio-Web webpage, aimed at
providing corpora for Brazilian Portuguese and software tools for computational
linguistic processing. <BR><BR>Six corpora will be available at the end of the
Lácio-Web Project in May 2004. This first release makes two corpora available:
a version of Lácio-Ref, for research and for generating subcorpora, and
MAC-Morpho, for download. To download the first public release, please visit
the webpage at <BR><BR></FONT><A
href="http://www.nilc.icmc.usp.br/lacioweb"><FONT face="Times New Roman"
size=3>http://www.nilc.icmc.usp.br/lacioweb</FONT></A><o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><BR><FONT size=3><FONT
face="Times New Roman">General information is available on the webpage above.
Further details of the two corpora being released are given
below:<o:p></o:p></FONT></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Times New Roman"
size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT size=3><FONT
face="Times New Roman"><SPAN lang=EN-US
style="mso-ansi-language: EN-US">Lácio-Ref <BR><BR>This version of the reference
corpus has 4,156,816 words, comprising texts from five genres (news, scientific,
prose, poetry and drama), several types of text (such as reports, papers,
chronicles, letters), various domains (such as education, engineering, politics)
and different media (magazines, Internet pages, books). Lácio-Ref is available
for research, and the subcorpora generated from it can be downloaded in two
formats: one with headings in XML that carry bibliographic data, and another
with title, subtitles, authorship and the plain text. <BR><BR>MAC-Morpho <BR><BR>MAC-Morpho has <SPAN
style="COLOR: red">1,167,183</SPAN> words from the newspaper Folha de
</SPAN><?xml:namespace prefix = st1 ns =
"urn:schemas-microsoft-com:office:smarttags" /><st1:City><st1:place><SPAN
lang=EN-US style="mso-ansi-language: EN-US">São
Paulo</SPAN></st1:place></st1:City><SPAN lang=EN-US
style="mso-ansi-language: EN-US">, 1994. It has been tagged with Eckhard Bick's
Palavras parser (</SPAN></FONT></FONT><A href="http://visl.hum.sdu.dk"
target=_blank><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT
face="Times New Roman" size=3>http://visl.hum.sdu.dk</FONT></SPAN></A><SPAN
lang=EN-US style="mso-ansi-language: EN-US"><FONT size=3><FONT
face="Times New Roman">) and mapped to the tagset of the Lácio-Web project. The
morphosyntactic tags have been manually revised. MAC-Morpho is available for
download in two formats: <BR><BR>1) for linguistic research, for example with
frequency counters and concordancers. <o:p></o:p></FONT></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><BR><FONT size=3><FONT
face="Times New Roman">2) for training taggers, as it allows the tagset to be
altered. For instance, some subspecification of the tags has been removed and
multiword items have been separated. These changes increased the size of the corpus
to <SPAN style="COLOR: red">1,221,468</SPAN> words. <BR
style="mso-special-character: line-break"><BR
style="mso-special-character: line-break"><o:p></o:p></FONT></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><FONT size=3><FONT
face="Times New Roman">The Lácio-Web Project will also make computational
linguistics tools available. This first release includes frequency counters and
concordancers, which give users a quick view of the subcorpora they
generate. New tools, such as morphosyntactic taggers, will be made available in
the future. <o:p></o:p></FONT></FONT></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><BR><FONT face="Times New Roman"
size=3>Cordially,</FONT></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><o:p><FONT face="Times New Roman"
size=3> </FONT></o:p></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="mso-ansi-language: EN-US"><FONT size=3><FONT
face="Times New Roman">Lácio-Web
Team </FONT></FONT></SPAN></FONT></P></DIV></BODY></HTML>