encoding problem in XLE

Ozlem Cetinoglu ozlemc at su.sabanciuniv.edu
Tue May 24 14:39:31 UTC 2005


I'm a Ph.D. student at Sabanci University, working on Turkish LFG. I use
XLE on Ubuntu Linux, which supports ISO8859-9. In order to input strings
that include Turkish-specific characters, I set every possible encoding
to ISO8859-9 using the "set-character-encoding" command in the Tcl
shell. All the relevant files (e.g. the grammar, the lexicon, the
sublexical rules) are in ISO8859-9 format. Our morphological analyzer
also accepts ISO8859-9 input, and for tokenizing I use
"english.tok.parse.fst" from the English LFG package of NLTT. The system
works very well with strings that contain no Turkish-specific
characters, but gives the following error message when I try to parse a
word with a Turkish character:

unIcode is not a valid Tcl character encoding.

The line is repeated several times, and the morphology window displays
the Turkish-specific character as a different symbol.
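
To illustrate the kind of substitution I mean, here is a minimal Python
sketch. It is not part of XLE and is not a diagnosis of what XLE does
internally; it only shows what happens if bytes written as ISO8859-9
are read back as ISO8859-1 by some component. The function name
show_mismatch and the sample letters are my own choices for the
illustration.

```python
# Illustration only: ISO8859-9 bytes misread as ISO8859-1.
# Assumption: some stage decodes with Latin-1 instead of Latin-5.

TURKISH_LETTERS = "ğĞıİşŞ"  # the six letters where the two encodings differ


def show_mismatch(text):
    """Return (original, hex byte, misread character) for each character."""
    rows = []
    for ch in text:
        raw = ch.encode("iso8859_9")     # one byte per character
        wrong = raw.decode("iso8859_1")  # same byte, other code page
        rows.append((ch, raw.hex(), wrong))
    return rows


if __name__ == "__main__":
    for ch, hex_byte, wrong in show_mismatch(TURKISH_LETTERS):
        print(f"{ch} -> 0x{hex_byte} -> {wrong}")
```

ASCII characters come out unchanged under both encodings, which would
be consistent with strings without Turkish-specific characters working
fine, while a letter such as "ı" (byte 0xfd) turns into "ý".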

Is that because I use the English tokenizer, and if so, is there any way
to fix the problem without writing a Turkish tokenizer? If not, what
else could the problem be?

Thanks in advance,
Ozlem Cetinoglu


