[Lexicog] Spellchecking Unicode in MS-Office

Jan F. Ullrich jfu at CENTRUM.CZ
Fri Nov 14 11:05:13 UTC 2008


 

Dear lexicographers,

 

I wonder if someone here could advise us on the following problem. Years
ago, before the full development of Unicode and Unicode fonts, we created our
own fonts for the Lakota language. In those fonts the needed characters were
assigned to the codes of characters not present in the language. For instance,
we used the umlaut vowels. This solution was imperfect but, among other things,
it allowed us to take advantage of the Microsoft Word spellchecking function. We
simply created a list of words and word forms and inserted them into a custom
dictionary for MS-Word. Of course, this was a simplistic spellchecker, one
that could not cover all the word forms of the language, but at that time it
actually represented a helpful tool for our students and language teachers.

 

A few years ago we transferred all of our textual materials into Unicode and
we also programmed a powerful morphoparser and lemmatizer that help us
create a fairly comprehensive list of word forms. But we are having a problem
using this list for spell-checking in MS-Office, because one of the Unicode
characters that we use is not recognized by MS-Word. It is the Latin letter
h with caron (U+021F)
(http://www.fileformat.info/info/unicode/char/021f/index.htm). MS-Word won't
consider this letter a part of the word no matter what we do, and this
makes the spellchecking functions unusable in that editor. It causes
problems in other ways too, for instance in searching for words within a
document, in various formatting operations, etc.

We found out that the character is recognized when we associate the text
with a different language for spellchecking, for instance French, but then
other characters are not recognized. If we keep the text assigned to English
spellchecking (which is desired) then it is only h-caron that is not
recognized.
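
For anyone who wants to see which characters in a word list might be affected, here is a minimal Python sketch that lists every non-ASCII character with its code point and Unicode general category. The function name and the sample words are illustrative assumptions, not part of our actual tools; a real list would come from the lemmatizer output.

```python
import unicodedata
from collections import Counter

def non_ascii_inventory(words):
    """Map each non-ASCII character to (code point, category, name, count)."""
    counts = Counter(ch for word in words for ch in word if ord(ch) > 127)
    return {
        ch: (
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "UNKNOWN"),
            n,
        )
        for ch, n in counts.items()
    }

# Illustrative sample only, written with escapes so the precomposed
# code points are unambiguous (U+021F is h with caron).
sample_words = ["\u021f\u00e9", "\u010dha\u014bt\u00e9"]

for ch, info in sorted(non_ascii_inventory(sample_words).items()):
    print(ch, *info)
```

Such an inventory only shows what the list contains; it says nothing about how MS-Word itself treats each character.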

 

I do not know enough about Unicode to figure out how to solve this. For
instance, I don't know whether the character recognition within MS-Word is a
feature of MS-Word or of Unicode. A couple of years back we contacted Microsoft
about this, but we received no response.
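
One way to look at the Unicode side of this question is to ask the Unicode Character Database directly. The sketch below uses Python's standard `unicodedata` and `re` modules; the helper names and the sample phrase are assumptions made for illustration. It reports how Unicode classifies U+021F and shows that a Unicode-aware tokenizer keeps the letter inside the word.

```python
import re
import unicodedata

H_CARON = "\u021f"  # LATIN SMALL LETTER H WITH CARON

def unicode_view(ch):
    """Return the basic properties Unicode assigns to a character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # "Ll" means lowercase letter
        "is_alpha": ch.isalpha(),
        "decomposition": unicodedata.decomposition(ch),
    }

def tokenize(text):
    """Split text into words using Python's Unicode-aware \\w class."""
    return re.findall(r"\w+", text)

print(unicode_view(H_CARON))
# Illustrative phrase only; h-caron is written as an escape.
print(tokenize("\u021f\u00f3ta na s\u00e1pa"))
```

If Unicode classifies the character as a letter, that would suggest, though not prove, that the word-boundary behaviour comes from the application rather than from Unicode itself.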

I am aware that there are other options for spell-checking a text, but since
MS-Word is such a mainstream editor, used in most schools and colleges
where the Lakota language is taught and used, it would be really nice if we
could make the spellchecking function work in it.

 

We would really appreciate any advice on how to solve this.

 

 

Jan

 

 

Jan F. Ullrich, Linguistic Director

Lakota Language Consortium

www.lakhota.org <http://www.lakhota.org/> 

e-mail: jfu at lakhota.org

Skype: janfull

 

 
