[Lexicog] Spellchecking Unicode in MS-Office

Jan F. Ullrich jfu at CENTRUM.CZ
Fri Nov 14 18:48:47 UTC 2008


Dear Hayim,

 

Changing a character in a standardized orthography is not an option.

 

Jan

 

From: lexicographylist at yahoogroups.com [mailto:lexicographylist at yahoogroups.com] On Behalf Of Hayim Sheynin
Sent: Friday, November 14, 2008 5:57 PM
To: lexicographylist at yahoogroups.com
Subject: Re: [Lexicog] Spellchecking Unicode in MS-Office

 

Dear Jan,

Would you consider changing the grapheme to one of the following?

Ĥ Latin capital letter H with circumflex (Unicode Latin Extended-A, U+0124)
ĥ Latin small letter h with circumflex (Unicode Latin Extended-A, U+0125)

or

Ħ Latin capital letter H with stroke (Unicode Latin Extended-A, U+0126)
ħ Latin small letter h with stroke (Unicode Latin Extended-A, U+0127)

If you haven't used these characters, I think it is worth trying them.

Best of luck,

Hayim Sheynin

On Fri, Nov 14, 2008 at 6:05 AM, Jan F. Ullrich <jfu at centrum.cz> wrote:
>
>
> Dear lexicographers,
>
>
>
> I wonder if someone here could advise about the following problem. Years
> ago, before the full development of Unicode and Unicode fonts, we created our
> own fonts for the Lakota language. In those fonts the needed characters were
> set to codes of characters not present in the language. For instance we used
> umlaut vowels. This solution was imperfect, but among other things it
> allowed us to take advantage of Microsoft Word's spellchecking function. We
> simply created a list of words and word forms and inserted them into a custom
> dictionary for MS-Word. Of course, this was a simplistic spellchecker, one
> that could not cover all the word forms of the language, but at that time it
> actually represented a helpful tool for our students and language teachers.
>
>
>
> A few years ago we transferred all of our textual materials into Unicode and
> we also programmed a powerful morphoparser and lemmatizer that help us
> create a fairly comprehensive list of word forms. But we are having a problem
> using this list for spell-checking in MS-Office, because one of the Unicode
> characters that we use is not recognized by MS-Word. It is the Latin small
> letter h with caron (U+021F)
> (http://www.fileformat.info/info/unicode/char/021f/index.htm). MS-Word won't
> consider this letter part of a word no matter what we do, which prevents us
> from using the spellchecking functions in that editor. It causes
> problems in other ways too, for instance in searching for words within a
> document, in various formatting operations, etc.
>
> We found out that the character is recognized when we associate the text
> with a different language for spellchecking, for instance French, but then
> other characters are not recognized. If we keep the text assigned to English
> spellchecking (which is desired) then it is only h-caron that is not
> recognized.
>
>
>
> I do not know enough about Unicode to figure out how to solve this. For
> instance I don't know if the character recognition within MS-Word is a feature
> of MS-Word or of Unicode. A couple of years back we contacted Microsoft about
> this, but we received no response.
>
> I am aware that there are other options for spell-checking a text, but since
> MS-Word is such a mainstream editor, used in most schools and colleges
> where the Lakota language is taught and used, it would be really nice if we
> could make the spellchecking function work in it.
>
>
>
> We would really appreciate any advice on how to solve this.
>
>
>
>
>
> Jan
>
>
>
>
>
> Jan F. Ullrich, Linguistic Director
>
> Lakota Language Consortium
>
> www.lakhota.org
>
> e-mail: jfu at lakhota.org
>
> Skype: janfull
>
>
>
>
>
> 
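
On the question raised above of whether the behaviour comes from Unicode or from MS-Word, one way to look at the Unicode side is to read the character's properties directly. The following is only a minimal sketch using Python's standard unicodedata module; the helper name describe and the test string are made up for illustration, and the sketch says nothing about how MS-Word itself decides where words begin and end.

```python
import unicodedata

H_CARON = "\u021f"  # the letter discussed in the message: h with caron


def describe(ch):
    """Collect a few Unicode properties of a single character (hypothetical helper)."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # General category; "Ll" means "Letter, lowercase"
        "category": unicodedata.category(ch),
        # Canonical decomposition (NFD) as a list of code points
        "nfd": [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)],
    }


info = describe(H_CARON)
for key, value in info.items():
    print(f"{key}: {value}")

# An arbitrary test string (not claimed to be a real word): h-caron followed by "e".
sample = H_CARON + "e"
print("all characters alphabetic:", sample.isalpha())
```

If the general category comes back as a letter category, then the Unicode character database itself treats h-caron as an ordinary letter, which would suggest that the word-boundary problem lies in the application or its language-specific tables rather than in Unicode. That is an inference from the character properties only, not a statement about MS-Word internals.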

 
