Secondary entries (was Re: [Lexicog] Query on how to deal with coined words)

Vincent `Bentong` S. Isles bentong.isles at GMAIL.COM
Thu Apr 12 08:36:51 UTC 2007


Hi Mike,

I had hoped to use CC to automate the production of what I call the
"spelling rationale" (\sr) field.

A single entry may have at most one of these fields: \et, \bw, or \mr.
It never has two or three of them.

The logic is this:
* If \et is present then it becomes \sr.
* If \mr is present and it is different from \lx after stripping \lx
of hyphens, then it becomes \sr.
* If \bw is present:
** If the first parameter is "en":
*** If \ge != \lx then \sr = "en" + \ge; else \sr = "en"
** else \sr = \bw
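
The rules above can be sketched in Python rather than CC. This is only an
illustration under stated assumptions: the entry is a dict keyed by marker
name without the backslash, the "first parameter" of \bw is its first
whitespace-separated token, and "en" + \ge is joined with a single space.
The function name and the sample field values are hypothetical.

```python
from typing import Dict, Optional


def spelling_rationale(entry: Dict[str, str]) -> Optional[str]:
    """Return the \\sr text for one entry, or None if no \\sr is needed."""
    def field(marker: str) -> str:
        # Missing fields are treated the same as empty ones.
        return entry.get(marker, "").strip()

    lx, et, mr, bw, ge = (field(m) for m in ("lx", "et", "mr", "bw", "ge"))

    if et:
        # Rule 1: an etymology is used as the rationale as is.
        return et
    if mr:
        # Rule 2: keep \mr only when it differs from \lx without hyphens.
        return mr if mr != lx.replace("-", "") else None
    if bw:
        # Rule 3: borrowed word; the first token is assumed to be the language.
        if bw.split()[0] == "en":
            # Assumption: a missing \ge is treated like \ge == \lx.
            return f"en {ge}" if ge and ge != lx else "en"
        return bw
    return None
```

With these assumptions, an entry with \lx kasub-anan and \mr ka-SUBO-anan
yields "ka-SUBO-anan", and an entry with none of the three fields yields
None, so no \sr field would be written.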

I hope you didn't get lost with that logic, because I was not able to
translate it into CC statements. I guess I'm not very good with computers :(

The \sr field gets formatted as follows:

aba (native word: no \sr field)
abaka [Tag] (Tagalog word borrowed as is)
amatyur [Eng fi:amateur] (English word borrowed with change in sp)
kasub-anan [ka-SUBO-anan] (derived word with the root not so obvious)
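
The layout shown in these samples (headword, then the rationale in square
brackets when there is one) could be produced by a small helper like the
one below. It is a sketch only: the function name is hypothetical, and it
assumes the rationale text has already been worked out.

```python
def format_entry(headword: str, rationale: str = "") -> str:
    """Format one line of the spelling dictionary."""
    rationale = rationale.strip()
    # Native words have no rationale, so only the headword is printed.
    return f"{headword} [{rationale}]" if rationale else headword
```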

This is for the "Cebuano Spelling Dictionary", a stripped-down version
of the "Modern Cebuano Dictionary".

Thanks for the offer on the parser, but I don't think I have the cash
for the Xerox tools.

For now I've stopped work on that part of the project.

Thanks for the information. :)

--Bentong Isles


--- In lexicographylist at yahoogroups.com, Mike Maxwell <maxwell at ...> wrote:
>
> Vincent `Bentong` S. Isles wrote:
> > I would like to know that "complex procedure". I had spent half of
> > yesterday and the whole of today trying to understand the Consistent
> > Changes program, and I do very well think of CC when you wrote
> > "complex" :)
> 
> It might be useful to say what you're hoping to use cc for.  Cebuano
> morphology is very complex, with reduplication and infixing, and if
> you're trying to go from stems or inflected words to roots, I don't
> think I would recommend cc.  What you really need is a morphological
> parser.
> 
> It happens that I wrote a morphological parser for Cebuano several
> years ago.  I won't claim that it does everything right--I basically
> wrote it in a couple hours, and tweaked it a little after that--but it
> would probably work better than one could do with cc.  I did it while
> I was working for the Linguistic Data Consortium (LDC), and I'll have
> to ask them whether it's sharable.  It uses the Xerox finite state
> tools, for which you have to pay $40 (the CD comes in a book from U of
> Chicago press).  If that's of interest, let me know and I'll see what
> the LDC says.
> 
> On the other hand, if you're trying to fix spelling errors, a parser
> won't help you fix them (it might help you find them).  A program like
> cc might be usable to fix some classes of errors, provided you have
> some notion of what common errors are (substituting a 'c' for a 'k',
> for example).
> 
> Also, there are other programs that do more or less the same thing that
> cc does.  These come largely from the Unix/Linux world, but are
> available on DOS and in the Windows command prompt.  These are programs
> like sed and awk (and more recently, Unicode-compatible versions of
> such programs, often written in Perl).  Depending on where you are (at a
> university, for example), you may be able to find people who can help
> you with these programs more easily than with cc.
> -- 
> 	Mike Maxwell
> 	maxwell at ...
>




 