[Corpora-List] (no subject)
Mike Maxwell
maxwell at umiacs.umd.edu
Thu Jan 17 00:58:06 UTC 2013
On 1/15/2013 12:28 PM, Eirini LS wrote:
> I was a bit confused when a person who had created an analyzer
> (using the Xerox calculus, lexc) argued that the module works only
> for analysis, doesn't generate anything, and cannot be used in the
> other direction (using lookup). He also said it is not right to read
> a list of what it generates using the command "print lower-words".
> Is that right? How can I check the quality of an analyzer?
Since no one has responded to this, I'll try.
The Xerox Finite State Tools (both lexc and xfst) are inherently bidirectional; if you can analyze
words, you can also generate from whatever underlying representation the writer of the parser code
has chosen. That is, if 'cats' analyzes as 'cat+PL', then you can input 'cat+PL' in generate mode,
and it will give you 'cats'.
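To make the bidirectionality concrete, here is a toy sketch in Python. A transducer defines a *relation* between surface strings and analysis strings, and the same relation answers queries in both directions (what xfst calls "apply up" and "apply down"). The pairs below are made-up examples, not the output of any real lexc grammar:

```python
# A toy stand-in for a compiled transducer: a set of
# (surface form, analysis) pairs. One relation, two directions.
PAIRS = {
    ("cat",  "cat+N+SG"),
    ("cats", "cat+N+PL"),
    ("dogs", "dog+N+PL"),
}

def analyze(surface):
    """Like xfst's 'apply up': surface form -> analyses."""
    return sorted(a for s, a in PAIRS if s == surface)

def generate(analysis):
    """Like xfst's 'apply down': analysis -> surface forms."""
    return sorted(s for s, a in PAIRS if a == analysis)

print(analyze("cats"))       # ['cat+N+PL']
print(generate("cat+N+PL"))  # ['cats']
```

A real FST represents this relation compactly as a network rather than an enumerated set, but the point stands: nothing about the compiled result is analysis-only.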
What the person you talked to may have been referring to is the fact that (if I'm remembering
correctly) the standard version of lexc (and xfst) places a limit on how many "words" it will print
with "print words". (I didn't think there was a limit on "print lower-words", but I may be wrong.)
As I understand it, this has to do with the fact that Xerox was trying to protect its investment in
the code that produced upper/lower pairs from a lexicon plus rules--otherwise, you could compile a
transducer using lexc and/or xfst, dump the upper/lower pairs, and input those pairs into some
simple-to-build and unlicensed FST which had no compilation capability. There was a commercial
version of the tools which cost considerably more, and which could be used to build commercial and
distributable FSTs. But I am not a lawyer, and my memory of that is fuzzy. If you need more
information, you should contact Lauri Karttunen and Ken Beesley, who wrote the book on xfst and lexc
(literally and figuratively).
Also, there is now an open source tool, foma, which does most of what xfst did, with the exception
of compile-replace (used for some kinds of reduplication), whose algorithm was patented; but I
believe foma has a work-around for this.
Checking the quality of a morph analyzer like xfst/lexc (or any other such tools) is a different
question. There are lots of ways to do it; one we used was to run test cases (words to be parsed)
through xfst and hand-validate the output. The input/output pairs were stored in a version control
system, so as to allow regression testing. There are other ways as well.
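The regression-testing idea above can be sketched as a short script: keep gold (word, analysis) pairs under version control, run them through the analyzer, and report mismatches. `run_analyzer` here is a hypothetical stub; in practice it would invoke your compiled FST via `lookup` or foma's `flookup`:

```python
# Sketch of regression testing for a morphological analyzer.
# Gold pairs live in version control; any diff from them is a regression.

def run_analyzer(word):
    # Hypothetical stub standing in for a call to lookup/flookup
    # on the compiled transducer.
    fake_fst = {"cats": "cat+N+PL", "dogs": "dog+N+PL"}
    return fake_fst.get(word, "+?")  # "+?" marks an unanalyzable word

def regression_check(gold_pairs):
    """Return a list of (word, expected, got) mismatches."""
    failures = []
    for word, expected in gold_pairs:
        got = run_analyzer(word)
        if got != expected:
            failures.append((word, expected, got))
    return failures

gold = [("cats", "cat+N+PL"), ("dogs", "dog+N+PL")]
print(regression_check(gold))  # empty list when nothing has regressed
```

The hand-validation step happens once, when a pair is first added to the gold file; after that the check is mechanical.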
For the record, I would not use "print lower-words" for testing the parser, since that doesn't tell
you whether you get the *correct* analysis.
--
Mike Maxwell
maxwell at umiacs.umd.edu
"My definition of an interesting universe is
one that has the capacity to study itself."
--Stephen Eastmond
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora