[Lexicog] Dictionary Output

Benjamin Barrett gogaku at IX.NETCOM.COM
Sat Jun 12 21:22:06 UTC 2004


Thanks for the suggestion, Neal. Unfortunately, that option doesn't work
with two-byte languages (such as Korean). The only export that works is SF
format.
 
I did a test on a single-byte language (the sample Haida data) and got the
output in RTF, but didn't see any markers that could be used to format
lines. (Just out of curiosity, is there a way of doing that?)
 
That got me thinking of doing a simple algorithm in Word, though. I exported
using the SF format, opened the file in Notepad as UTF-8 and saved it, then
opened it in Word.
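
For anyone who would rather script this step, here is a minimal Python
sketch of reading SF text into (marker, value) pairs. It assumes that each
field starts with a backslash marker at the beginning of a line, that an
unmarked line continues the previous field, and that the text has already
been decoded from UTF-8. The function name parse_sf and the sample entries
are made up for illustration.

```python
def parse_sf(text):
    """Split standard-format (SF) text into (marker, value) pairs."""
    fields = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue  # blank lines only separate records
        if line.startswith("\\"):
            # "\lx headword" -> marker "lx", value "headword"
            marker, _, value = line[1:].partition(" ")
            fields.append((marker, value.strip()))
        elif fields:
            # Assumption: an unmarked line continues the previous field.
            marker, value = fields[-1]
            fields[-1] = (marker, (value + " " + line.strip()).strip())
    return fields


# Hypothetical sample data, not taken from a real Shoebox export.
sample = "\\lx 사전\n\\ps n\n\\ge dictionary\n\n\\lx 말\n\\ge word;\nlanguage\n"
fields = parse_sf(sample)
```

When reading a file that Notepad saved as UTF-8, opening it with
encoding="utf-8-sig" should drop the byte-order mark that Notepad writes at
the start of the file.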
 
I tried changing the format of all lx lines to a bold 16-point font as a test.
 
The first thing I tried was to find a string like \lx *^p. The ^p means
paragraph mark, and the asterisk means anything when the Use wildcards option
is checked. However, the backslash is also a special character in wildcard
searches, and you're not allowed to use ^p when the Use wildcards box is
checked.
 
So, with wildcards turned off, I did a global replace of ^p with a nonsense
string followed by the paragraph mark, such as $ss$^p. So now there's a $ss$
at the end of each line.
 
Then, I used a wildcard replace. I used lx (*)$ss$ for the find and \1 for
the replace and used the format tab to indicate the replacement format. (The
\1 means: use whatever the first expression in parentheses in the find box
matched. In my example, that expression is an asterisk, which stands for
anything.)
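
The same sentinel trick can be sketched outside Word. The Python below is
only an illustration of the idea, not what Word does internally: the name
mark_headwords is made up, and the <b16> tags are hypothetical placeholders
standing in for the bold 16-point formatting, since plain text cannot carry
Word formatting. Unlike my find string above, the pattern here also matches
the backslash in front of lx.

```python
import re

SENTINEL = "$ss$"


def mark_headwords(text, open_tag="<b16>", close_tag="</b16>"):
    """Wrap the contents of every \\lx line in placeholder tags."""
    # Step 1: put the sentinel in front of every line break
    # (the Word equivalent was replacing ^p with $ss$^p).
    marked = text.replace("\n", SENTINEL + "\n")
    # Step 2: match "\lx " followed by anything up to the next sentinel.
    # "." does not match a newline, so a match never leaves its line.
    pattern = re.compile(r"\\lx (.*?)" + re.escape(SENTINEL))
    tagged = pattern.sub(
        lambda m: open_tag + m.group(1) + close_tag + SENTINEL, marked
    )
    # Step 3: remove the sentinels again.
    # Note: a final line with no trailing line break gets no sentinel
    # and is therefore left untouched.
    return tagged.replace(SENTINEL, "")


# Hypothetical sample data for illustration only.
sample = "\\lx abc\n\\ge gloss\n\\lx def\n"
result = mark_headwords(sample)
```

In a regular-expression engine the sentinel is not strictly needed, since a
multi-line pattern anchored at the start and end of a line can match the lx
lines in one pass. The sentinel version is shown because it mirrors the Word
workaround step by step.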
 
That did the trick!
 
So now, I'm thinking of abandoning the Perl approach and just using Word
with some fancy find and replace work or even Visual Basic.
 
Benjamin Barrett

-----Original Message-----
From: Neal_Brinneman at sil.org [mailto:Neal_Brinneman at sil.org] 


Dear Benjamin,
Why reinvent the wheel? Shoebox will output to RTF if you use the suggested
MDF standard format markers for the fields. It also sorts your entries
according to the sort order you define for the language. It will also do a
dictionary reversal.
Neal


From: "Benjamin Barrett" <gogaku at ix.netcom.com>
To: <lexicographylist at yahoogroups.com>
Date: 10/06/2004 16:06

A friend sent me a link to an article on using Perl to output Shoebox files
as formatted RTF text. It describes all kinds of issues such as sorting
with accented letters, boldfacing, and two headwords that look identical.
It's at http://www.perl.com/pub/a/2004/03/25/dictionaries.html

Benjamin Barrett
