Corpora: training Brill's tagger with French

Keith J. Miller keith at mitre.org
Wed Mar 14 15:09:00 UTC 2001


To add support to Jean Véronis' posting, I recently had dealings with both
the INaLF folks and with the people at Synapse.

INaLF was more than happy to share the version of Brill's tagger trained for
French upon the signing of a simple agreement.  And the Cordial POS tagger is
all that Jean says and more.  It's almost a pity to simply call it a POS
tagger, because it also gives information about grammatical relationships,
functions, etc.  You can almost extract a parse from its output -- or most
likely the parts of a parse that you're interested in using in further
processing.  Also, Synapse was receptive to reports of minor difficulties
with the software, and requests for enhancements, some of which were
implemented right away.

Best of luck with your work.


                    -----  Keith J. Miller
                    keith at mitre.org
                    kjmiller at georgetown.edu


----- Original Message -----
From: Jean Veronis <Jean.Veronis at newsup.univ-mrs.fr>
To: Andre Linden <Andre.Linden at dimail.epfl.ch>; <CORPORA at HD.UIB.NO>
Sent: Wednesday, March 14, 2001 3:59 AM
Subject: Re: Corpora: training Brill's tagger with French


At 09:47 13/03/2001 +0100, Andre Linden wrote:
>Dear members,
>
>       We are currently working with Brill's tagger on French texts. We
>are facing the problem of training the tagger with accented texts and
>would like to know if anyone already has encountered this problem. We
>would very much appreciate any feedback on your own experience in this
>regard.

An adaptation to French has already been made and can be downloaded:

http://jupiter.inalf.cnrs.fr/WinBrill/winbrill.bienvenue.html

There is another tagger that more and more teams use in France, since it
performs well (probably the best tagger at the moment), and does not
require any training, hacking, etc. It is commercially distributed, but
very cheap for research (I think less than USD 100). It is called Cordial
Analyseur, and is developed by Synapse Development.

Contact:

http://www.synapse-fr.com/
Mr. Dominique LAURENT <dlaurent at synapse-fr.com>



Jean Véronis
http://www.up.univ-mrs.fr/~veronis/


