[Lexicog] dictionary software

Koontz John E john.koontz at COLORADO.EDU
Mon Mar 22 22:47:15 UTC 2004


On Mon, 22 Mar 2004, David Frank wrote:
> I guess my next question is what you used to turn your Hopi dictionary
> database into a formatted document for printing.

I can't answer for Hopi, but the CSD - somewhat in abeyance at present -
uses or used various AWK scripts to do the lion's share of the
reformatting of the text files.  (AWK is a bit awk-ward for this task, but
only because it is a somewhat peculiar programming language.)  Today I'd
personally use Tcl or Python, and Perl would also do nicely.  These are
all available for Windows and so are various other scripting tools.  (In
the really old days I once used the Lisp-ish language built into EMACS, and
most programmers' editors have something similar built into them.  I
really don't recommend these!)

However, none of this would have been much use if I hadn't had access to
several DOS-based tools - I don't remember the names - from
SIL/Philippines that were able to convert a somewhat extended form of SFM
into a Word file.  Each field \xx was converted to a paragraph formatted
with the xx paragraph style and - this was the extension to SFM - each
segment of text coded with \xx{..\} or later \xx{..} was converted to a
segment of text formatted with character format xx.  That and a nice style
sheet of your choice (now called a formatting template, I think) and you
were home free.
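The conversion described above can be sketched roughly as follows. This is not the SIL/Philippines code, whose internals aren't described here; it's a minimal Python sketch, assuming that each field starts on its own line with a \xx marker, that continuation lines belong to the preceding field, and that \xx{...} spans are not nested. It turns the SFM into a list of (paragraph style, runs) pairs, each run being (character style or None, text), which is the shape a Word or RTF writer would need.

```python
import re

# A field line: backslash, marker name, then whitespace or end of line.
FIELD_RE = re.compile(r"\\(\w+)(?:\s+(.*))?$")
# A character-style span: \xx{...} or the older \xx{...\} form (no nesting).
SPAN_RE = re.compile(r"\\(\w+)\{(.*?)\\?\}")


def parse_runs(text):
    """Split field text into (character_style_or_None, text) runs."""
    runs, pos = [], 0
    for m in SPAN_RE.finditer(text):
        if m.start() > pos:
            runs.append((None, text[pos:m.start()]))
        runs.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        runs.append((None, text[pos:]))
    return runs


def parse_sfm(text):
    """Return one (paragraph_style, runs) pair per SFM field, in file order."""
    fields = []  # each entry is [marker, body]
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields.append([m.group(1), (m.group(2) or "").rstrip()])
        elif line.strip() and fields:
            # A continuation line is joined to the current field with one space.
            body = fields[-1][1]
            fields[-1][1] = (body + " " + line.strip()) if body else line.strip()
    return [(marker, parse_runs(body)) for marker, body in fields]
```

Each marker name doubles as the style name, which is what made the "nice style sheet of your choice" approach work: the formatting lives entirely in the template, not in the conversion.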

Well, almost.  There were two problems.  First, the dictionary editors
hated those \xx{...} codes, because they were ugly and unnatural, took a
lot of keystrokes to enter, and got in the way of searching the
database: one of those codes around or within a word made the word a lot
harder to match.  I think they also just didn't like to think about even
structural markup while writing.  That, and the ugliness and
unnaturalness, I never solved.  I did reduce the keystrokes and the
searching problems through the expedient of letting them enter |xx to
start a code (or end a preceding one) and | to revert to no special
coding.  Delimit these with spaces and you're OK.  However, that only
keeps marks around words out of the way of searches, not marks within
words.  Fortunately, we had few cases of marks within words, though only
by luck.  Anyway, with a little judicious scripting you can convert the
|xx and | notation to the \xx{...} notation, deleting the extra spaces as
appropriate.
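The |xx ... | to \xx{...} step might look something like this in Python. It's a sketch under assumptions not spelled out above: codes and words are separated by whitespace, a code applies to whole words only, a stray | with no open code is simply dropped, and runs of whitespace collapse to single spaces. The spaces that merely delimit the codes are removed; the spaces between words are kept, outside the braces when a code ends at a word boundary.

```python
import re

# A code token such as |fv or |it (a bar followed by a marker name).
CODE_RE = re.compile(r"\|(\w+)")


def bars_to_braces(line):
    """Rewrite '|xx word word |' codes as '\\xx{word word}'."""
    out = []
    in_code = False   # a \xx{ has been written and not yet closed
    pending = None    # marker waiting for the next word to open on
    first_word = True
    for tok in line.split():
        m = CODE_RE.fullmatch(tok)
        if tok == "|" or m:
            # Either kind of token ends any open code ...
            if in_code:
                out.append("}")
                in_code = False
            # ... and |xx also starts a new one at the next word.
            pending = m.group(1) if m else None
            continue
        if not first_word:
            out.append(" ")  # a real word separator
        if pending:
            out.append("\\" + pending + "{")
            in_code = True
            pending = None
        out.append(tok)
        first_word = False
    if in_code:
        out.append("}")  # close a code left open at the end of the line
    return "".join(out)
```

For example, `bars_to_braces("the |fv shuka | horse")` gives `the \fv{shuka} horse`. Running it over each field body before the SFM-to-Word step lets the editors keep the lighter notation in the database.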

The other problem was that the SIL/Phil tools never liked files as long
as ours.  They ran into internal size limits and blew up mid-file.  So I
wrote tools to break the files into shorter pieces and then merged the
resulting mini Word files by hand.
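A splitter of that kind can be quite small. The sketch below assumes, for illustration, that every record begins with a \lx line (the usual MDF convention) and that the size limit, passed in as max_chars, is something you find by trial; neither detail comes from the original tools. Records are never cut in half, so a single record longer than the limit ends up in a chunk of its own.

```python
import re


def split_records(text, marker="lx"):
    """Split SFM text into records, each starting at a \\<marker> line."""
    start_re = re.compile(r"\\" + re.escape(marker) + r"(?:\s|$)")
    records, current = [], []
    for line in text.splitlines(keepends=True):
        # A new record marker closes the record collected so far.
        if start_re.match(line) and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return records


def chunk_records(records, max_chars):
    """Group whole records into chunks of at most max_chars where possible."""
    chunks, current, size = [], [], 0
    for rec in records:
        if current and size + len(rec) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        chunks.append("".join(current))
    return chunks
```

Because the chunks concatenate back to the original text exactly, each one can be written to its own file, converted separately, and the results merged in order.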

Without these SIL tools I'd have had to write my own tools to convert the
marked-up text into RTF (Rich Text Format), a sort of generic formatting
language that Microsoft devised and that Word accepts as input.  Nowadays
some form of XML or XHTML might work, too.
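For comparison, a bare-bones RTF writer of that sort isn't very long. This sketch takes already-parsed paragraphs as (paragraph style, runs) pairs, declares one paragraph style (\sN) per field marker and one character style (\csN) per span marker in the stylesheet, and escapes backslashes, braces and non-ASCII characters. The font, the style numbering and the input shape are assumptions for illustration; a production file would probably need more header information than this.

```python
def rtf_escape(text):
    """Escape text for an RTF body: \\, { and } get a backslash; non-ASCII becomes \\uN?."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit number; characters outside the BMP
            # are written as their two UTF-16 surrogates. '?' is the fallback.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def to_rtf(paragraphs):
    """paragraphs: list of (paragraph_style, [(character_style_or_None, text), ...])."""
    para_styles = sorted({style for style, _ in paragraphs})
    char_styles = sorted({c for _, runs in paragraphs for c, _ in runs if c})
    # Style numbers are unique across paragraph and character styles; 0 is Normal.
    para_num = {name: i + 1 for i, name in enumerate(para_styles)}
    char_num = {name: i + 1 + len(para_styles) for i, name in enumerate(char_styles)}

    sheet = r"{\stylesheet{\s0 Normal;}"
    sheet += "".join(r"{\s%d %s;}" % (n, name) for name, n in para_num.items())
    sheet += "".join(r"{\*\cs%d \additive %s;}" % (n, name) for name, n in char_num.items())
    sheet += "}"

    body = []
    for style, runs in paragraphs:
        parts = []
        for cstyle, text in runs:
            if cstyle:
                parts.append(r"{\cs%d %s}" % (char_num[cstyle], rtf_escape(text)))
            else:
                parts.append(rtf_escape(text))
        body.append(r"\pard\plain\s%d %s\par" % (para_num[style], "".join(parts)))

    header = [r"{\rtf1\ansi\deff0", r"{\fonttbl{\f0 Times New Roman;}}", sheet]
    return "\n".join(header + body + ["}"])
```

The style names are the SFM marker names, so the same "style sheet of your choice" trick applies: redefine the styles in Word and the whole document reformats.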

Today I'd use SF Converter, but might still have to do some massaging of
the SFM before applying it.  I think I discussed those issues a while
back, so I'll omit them here.  They have to do with the relationship
between SFM (or comparable) fields and text paragraphs in various forms of
text derived from the database.  In general it's a many-to-one mapping
(several fields can feed one paragraph) that depends on the formatting
desired and the use the text is put to.
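To make the many-to-one point concrete, here's a sketch of a layout table that decides, per field marker, whether a field starts a new paragraph or runs on into the current one, and with which style. The markers (lx, ps, ge, xv, xe, nt) follow common MDF usage and the style names are made up; both are assumptions. A finder list or a word list would use a different table (and perhaps a different field order) over the same records, which is the sense in which the mapping depends on the use the text is put to.

```python
# Hypothetical print layout: marker -> (starts_new_paragraph, paragraph_style, character_style).
PRINT_LAYOUT = {
    "lx": (True, "entry", "headword"),
    "ps": (False, None, "pos"),
    "ge": (False, None, "gloss"),
    "xv": (True, "example", None),
    "xe": (False, None, "translation"),
}


def layout_record(fields, layout, default_style="entry"):
    """Map one record's (marker, text) fields onto (paragraph_style, runs) paragraphs.

    Several fields can land in one paragraph; markers missing from the layout are skipped.
    """
    paragraphs = []
    for marker, text in fields:
        if marker not in layout:
            continue  # this output doesn't print the field (e.g. internal notes)
        new_para, para_style, char_style = layout[marker]
        if new_para or not paragraphs:
            paragraphs.append((para_style or default_style, []))
        runs = paragraphs[-1][1]
        if runs:
            runs.append((None, " "))  # separate run-in fields with a space
        runs.append((char_style, text))
    return paragraphs
```

With this table, a record with lx, ps, ge, nt, xv and xe fields yields two paragraphs: the first three fields share the "entry" paragraph, nt is dropped, and xv and xe share the "example" paragraph.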
