<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1"><!-- Network content -->
<META content="MSHTML 6.00.2900.2963" name=GENERATOR></HEAD>
<BODY style="BACKGROUND-COLOR: #ffffff" bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Thanks, Mike. I just got your message. I'm back
home now. I wound up downloading three different compare-files programs. I found
one called WinMerge easy to work with and easy on my (aging) eyes. I was able to
spot the salient differences between my files easily with WinMerge and also
merge what was in the older file into the newer one easily with the program. The
program can run on several platforms and can be downloaded from:</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2><A
href="http://winmerge.org/downloads.php">http://winmerge.org/downloads.php</A></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>It can check by different parameters. I think the
default is a line-by-line check which worked fine for me.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Wayne</FONT></DIV>
<DIV><FONT face=Arial size=2>-----<BR>Wayne Leman<BR>Cheyenne dictionary
online:<BR><A
href="http://www11.asphost4free.com/cheyennedictionary/default.htm">http://www11.asphost4free.com/cheyennedictionary/default.htm</A></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<BLOCKQUOTE
style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
<DIV><BR></DIV>
<DIV id=ygrp-text>
<P>Wayne Leman wrote:<BR>> Is there a way to automate a process of merging my
current database with the <BR>> older, larger one and end up with a new
database which has all and only the <BR>> database records I want?
...<BR>> <BR>> I am willing to do a visual check using some
compare-files software, if I <BR>> can compare differences between two
Shoebox databases. But the more <BR>> automated the process can be, without
losing data, the better.<BR><BR>One of the fundamental problems with SFM files
is that the record <BR>separator is the same character (CR-LF) as the field
separator. <BR>Typically there are at least two CR-LFs between records (i.e. a blank
line), <BR>but that doesn't help much, nor does the fact that one of the field
SFMs <BR>is defined as a record separator.<BR><BR>If it were me, here's what
I'd do:<BR><BR>1) Ensure that your files do not contain any tab chars, and
convert all <BR>sequences of multiple tab and/or space chars to a single space
char. A <BR>unix command to do this:<BR>tr -s " \t" " "<BR>That's a space char
before the \t, and a single space char between the <BR>second pair of quotes.
The purpose of this step is to ensure that you <BR>don't accidentally have any
tab chars (which will mess up a later step, <BR>since we're going to use tab
chars to separate fields), and that the two <BR>files you're going to merge
don't have fields that differ trivially by <BR>the amount of whitespace they
contain. Of course, if you use two space <BR>chars after a period, you might
not want to do this...<BR><BR>2) Convert all CR-LF characters in each file to
a (single) tab char. <BR>One way to do this would be to use the unix 'tr'
utility:<BR>tr -s "\r\n" "\t"<BR>(The '-s' option squeezes multiple
occurrences of your output tab char <BR>to a single tab.) At this point, your
files each consist of a single <BR>line, with a single tab char before each
SFM.<BR><BR>3) Convert the tab char separating records (but not fields) into a
<BR>newline (LF in Unix, which is what I'd be using :-), or CR-LF). One way
<BR>to do this is to use the unix 'sed' utility:<BR>sed -e 's/\t\\lx /\n\\lx /g'<BR>(The trailing 'g' means do this multiple times; by default, sed only
<BR>does the operation once per line. I'm assuming your record-delineating
<BR>SFM is \lx, modify as necessary.) At this point, each file consists of
<BR>a series of records separated by a single newline, and fields within
<BR>records separated by a single tab char.<BR><BR>4) Pass both files together
through a sorter, and have it eliminate <BR>duplicates. The unix way to do
this is<BR>sort -u<BR>(The -u parameter means "eliminate duplicates".<WBR>) At
this point, you <BR>have a single file consisting of non-duplicate records,
sorted <BR>alphabetically, with a single newline separating records and a
single <BR>tab separating fields.<BR><BR>5) Convert the single Unix newline to
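a sequence of two DOS newlines:<BR>sed -e 's/$/\r\n\r/'<BR>(sed strips the trailing newline from each line before matching and <BR>puts it back on output, so this appends CR-LF-CR ahead of it.)<BR><BR>6)
Convert the tab chars to a single DOS newline:<BR>sed -e 's/\t/\r\n/g'<BR><BR>At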
step 3, you could diff the two files to see if you have any nearly
<BR>identical records. Most diff programs will only tell if two lines
<BR>differ; some will tell how they differ, i.e. if there are minor changes.
<BR>The visual diff program that comes with ComponentSoftware'<WBR>s RCS
<BR>program does this (although with the long lines you're likely to have at
<BR>step 3, such diff programs might be DIFFicult to use; guess you could do
<BR>step 6 to put the records into temp files first...). While you're at
<BR>it, you might want to use RCS to track changes.<BR><BR>Steps 1-3 and 4-6
can each be combined into single operations using <BR>"pipes", avoiding some
of the intermediate files:<BR><BR>cat OldFile1.sfm | tr -s " \t" " " | tr -s
"\r\n" "\t" | sed -e <BR>'s/\t\\lx /\n\\lx /g' >
/tmp/OldFile1.sfm<BR><BR>cat OldFile2.sfm | tr -s " \t" " " | tr -s
"\r\n" "\t" | sed -e <BR>'s/\t\\lx /\n\\lx /g' >
/tmp/OldFile2.sfm<BR><BR>cat /tmp/OldFile1.sfm /tmp/OldFile2.sfm | sort -u | sed -e
's/$/\r\n\r/' | <BR>sed -e 's/\t/\r\n/g' > NewFile.sfm<BR><BR>All this
presumes that you either have access to a Unix (Linux) machine, <BR>or (more
likely) that you use something like the Cygwin Unix utilities (far <BR>superior to
the Windows command prompt, IMHO).<BR><BR>Disclaimer: I haven't tested the
above, there might be mistakes.<BR><BR>Oops, one other thing I would do, call
it step 3 1/2: get rid of any <BR>space chars before tab chars. These would
correspond to space chars at <BR>the end of a line. They're not really a
problem, except that they could <BR>give you spurious non-identical records
(if you accidentally put such <BR>space chars in one file but not the other).
Or maybe Shoebox enforces <BR>this when it saves
files?<BR><BR>Links:<BR><BR><A
href="http://www.ComponentSoftware.com/">http://www.Componen<WBR>tSoftware.<WBR>com/</A>
(you can use the freeware version)<BR><A
href="http://cygwin.com/">http://cygwin.<WBR>com/</A><BR>-- <BR>Mike
Maxwell<BR><A
href="mailto:maxwell%40ldc.upenn.edu">maxwell@ldc.<WBR>upenn.edu</A><BR></P></DIV><!--End group email --></BLOCKQUOTE>
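<DIV><FONT face=Arial size=2>For anyone trying the quoted steps, they might be sketched as one small script along these lines. This is only a sketch under assumptions: it expects the GNU versions of tr, sed and sort (as on Linux or Cygwin), it takes \lx as the record marker, and the helper names flatten_sfm and merge_sfm and the toy files are made up for illustration. It folds in the "step 3 1/2" cleanup of spaces before tabs, ends each flattened file with a newline so the last record of one file cannot run into the first record of the next, and writes CR-LF between fields as well as between records. Anything before the first \lx (a header line, say) is treated as a record of its own.</FONT></DIV>
<PRE>
```shell
#!/bin/sh
# Sketch only. Assumes GNU tr, sed and sort, and \lx as the record marker.
# flatten_sfm and merge_sfm are hypothetical helper names.

# flatten_sfm FILE: print one record per line, fields separated by single tabs.
#   tr #1 : tabs and runs of spaces become one space        (step 1)
#   tr #2 : CR and LF become tabs, squeezed to one tab      (step 2)
#   sed   : drop spaces before a tab (step 3 1/2), start a new line
#           at every \lx (step 3), drop the tab left at the very end
flatten_sfm() {
    tr -s ' \t' ' ' < "$1" |
        tr -s '\r\n' '\t' |
        sed -e 's/[ \t]*\t/\t/g' \
            -e 's/\t\\lx /\n\\lx /g' \
            -e 's/\t$//'
    # GNU sed leaves the last record without a newline; add one.
    echo
}

# merge_sfm FILE1 FILE2: print the merged, de-duplicated records with
# CR-LF line ends and one blank line after each record (steps 4 to 6).
merge_sfm() {
    { flatten_sfm "$1"; flatten_sfm "$2"; } |
        sort -u |
        sed -e '/^$/d' -e 's/\t/\r\n/g' -e 's/$/\r\n\r/'
}

# Toy data, invented for this example: "dog" is in both files and differs
# only in whitespace.
work=$(mktemp -d)
printf '\\lx cat\r\n\\ps n\r\n\r\n\\lx dog\r\n\\ps n\r\n' > "$work/old1.sfm"
printf '\\lx dog \r\n\\ps  n\r\n\r\n\\lx eel\r\n\\ps n\r\n' > "$work/old2.sfm"

merge_sfm "$work/old1.sfm" "$work/old2.sfm" > "$work/merged.sfm"
grep -c '^\\lx ' "$work/merged.sfm"    # counts the records in the merged file
```
</PRE>
<DIV><FONT face=Arial size=2>Running flatten_sfm on each file separately and comparing the two outputs with a diff tool gives the one-record-per-line view suggested for step 3. Records that differ in anything other than whitespace are both kept, so a visual check of near-duplicates is still needed afterwards.</FONT></DIV>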
</BODY></HTML>