Secondary entries (was Re: [Lexicog] Query on how to deal with coined words)
Norbert Rennert
norbert.rennert at SIL.ORG
Thu Apr 12 22:19:16 UTC 2007
Vincent,
This is an interesting challenge. If you send me a couple of records
which contain all the possible combinations of fields (or at least most
of them), I'll take a stab at it.
Norbert Rennert
Vincent `Bentong` S. Isles wrote:
>
> Hi Mike,
>
> I had hoped to use CC to automate the production of what I call the
> "spelling rationale" (\sr) field.
>
> A single entry may have any of these fields: \et, \bw, \mr, but never
> two or three of them.
>
> The logic is this:
> * If \et is present then it becomes \sr.
> * If \mr is present and it is different from \lx after stripping \lx
> of hyphens, then it becomes \sr.
> * If \bw is present:
> ** If the first parameter is "en":
> *** If \ge != \lx then \sr = "en" + \ge; else \sr = "en"
> ** else \sr = \bw
>
> I hope you didn't get lost with that logic, because I was not able to
> translate it to CC statements. I guess I'm not very good with computers :(
>
> The \sr field gets formatted as follows:
>
> aba (native word: no \sr field)
> abaka [Tag] (Tagalog word borrowed as is)
> amatyur [Eng fi:amateur] (English word borrowed with change in sp)
> kasub-anan [ka-SUBO-anan] (derived word with the root not so obvious)
>
> This is for the "Cebuano Spelling Dictionary", a stripped-down version
> of the "Modern Cebuano Dictionary".
>
> Thanks for the offer on the parser, but I don't think I have the cash
> for the Xerox tools.
>
> For now I've stopped work on that part of the project.
>
> Thanks for the information. :)
>
> --Bentong Isles
>
> --- In lexicographylist at yahoogroups.com, Mike Maxwell
> <maxwell at ...> wrote:
> >
> > Vincent `Bentong` S. Isles wrote:
> > > I would like to know that "complex procedure". I spent half of
> > > yesterday and the whole of today trying to understand the Consistent
> > > Changes program, and I did very much think of CC when you wrote
> > > "complex" :)
> >
> > It might be useful to say what you're hoping to use cc for. Cebuano
> > morphology is very complex, with reduplication and infixing, and if
> > you're trying to go from stems or inflected words to roots, I don't
> > think I would recommend cc. What you really need is a morphological
> > parser.
> >
> > It happens that I wrote a morphological parser for Cebuano several
> > years ago. I won't claim that it does everything right--I basically
> > wrote it in a couple hours, and tweaked it a little after that--but
> > it would probably work better than one could do with cc. I did it
> > while I was working for the Linguistic Data Consortium (LDC), and
> > I'll have to ask them whether it's sharable. It uses the Xerox finite
> > state tools, for which you have to pay $40 (the CD comes in a book
> > from U of Chicago press). If that's of interest, let me know and I'll
> > see what the LDC says.
> >
> > On the other hand, if you're trying to fix spelling errors, a parser
> > won't help you fix them (it might help you find them). A program like
> > cc might be usable to fix some classes of errors, provided you have
> > some notion of what common errors are (substituting a 'c' for a 'k',
> > for example).
> >
> > Also, there are other programs that do more or less the same thing
> > that cc does. These come largely from the Unix/Linux world, but are
> > available on DOS and in the Windows command prompt. These are
> > programs like sed and awk (and more recently, Unicode-compatible
> > versions of such programs, often written in Perl). Depending on where
> > you are (at a university, for example), you may be able to find
> > people who can help you with these programs more easily than with cc.
> > --
> > Mike Maxwell
> > maxwell at ...
> >
>
>
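
For reference, the \sr logic quoted above can be sketched outside CC. The
following is only a rough Python illustration, not a CC table and not
anyone's actual implementation. It assumes each record has already been
read into a dict keyed by field marker without the backslash (lx, ge, et,
bw, mr), that the "parameters" of \bw are separated by whitespace, and
that "en" + \ge means joining the two with a space. The function name and
the sample field values are hypothetical. It also makes no attempt to
produce the final display form such as [Eng fi:amateur].

```python
from typing import Dict, Optional


def spelling_rationale(entry: Dict[str, str]) -> Optional[str]:
    """Return the sr value for one record, or None if no sr field applies.

    Assumes at most one of et, bw, mr is present, as stated in the post.
    """
    lx = entry.get("lx", "")
    ge = entry.get("ge", "")

    # Rule 1: an etymology field is copied as is.
    if "et" in entry:
        return entry["et"]

    # Rule 2: a morphology field is used only when it differs from the
    # headword with its hyphens stripped (literal reading of the rule).
    if "mr" in entry:
        if entry["mr"] != lx.replace("-", ""):
            return entry["mr"]
        return None

    # Rule 3: a borrowed-word field; English gets special handling.
    if "bw" in entry:
        bw = entry["bw"]
        params = bw.split()  # assumption: whitespace-separated parameters
        if params and params[0] == "en":
            # Add the gloss only when the spelling actually changed.
            return "en" if ge == lx else "en " + ge
        return bw

    # Native word: no sr field.
    return None
```

Calling spelling_rationale({"lx": "kasub-anan", "mr": "ka-SUBO-anan"})
returns the \mr value unchanged, and a record with none of the three
fields returns None, which matches the "native word: no \sr field" case.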