<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">

<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=iso-8859-1"><!-- Network content -->

<META content="MSHTML 6.00.2900.2963" name=GENERATOR></HEAD>

<BODY style="BACKGROUND-COLOR: #ffffff" bgColor=#ffffff>


<DIV><FONT face=Arial size=2>Thanks, Mike. I just got your message. I'm back 

home now. I wound up downloading three different compare-files programs. I found 

one called WinMerge easy to work with and easy on my (aging) eyes. I was able to 

spot the salient differences between my files easily with WinMerge and also 

merge what was in the older file into the newer one easily with the program. The 

program can run on several platforms and can be downloaded from:</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2><A 

href="http://winmerge.org/downloads.php">http://winmerge.org/downloads.php</A></FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>It can check by different parameters. I think the 

default is a line-by-line check which worked fine for me.</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>Wayne</FONT></DIV>

<DIV><FONT face=Arial size=2>-----<BR>Wayne Leman<BR>Cheyenne dictionary 

online:<BR><A 

href="http://www11.asphost4free.com/cheyennedictionary/default.htm">http://www11.asphost4free.com/cheyennedictionary/default.htm</A></FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<BLOCKQUOTE 

style="PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">

  <DIV><BR></DIV>

  <DIV id=ygrp-text>

  <P>Wayne Leman wrote:<BR>> Is there a way to automate a process of merging my 

  current database with the <BR>> older, larger one and end up with a new 

  database which has all and only the <BR>> database records I want? 

  ...<BR>> <BR>> I am willing to do a visual check using some 

  compare-files software, if I <BR>> can compare differences between two 

  Shoebox databases. But the more <BR>> automated the process can be, without 

  losing data, the better.<BR><BR>One of the fundamental problems with SFM files 

  is that the record <BR>separator is the same character (CR-LF) as the field 

  separator. <BR>Typically there are at least two CR-LFs between records (i.e. a blank 

  line), <BR>but that doesn't help much, nor does the fact that one of the field 

  SFMs <BR>is defined as a record separator.<BR><BR>If it were me, here's what 

  I'd do:<BR><BR>1) Ensure that your files do not contain any tab chars, and 

  convert all <BR>sequences of multiple tab and/or space chars to a single space 

  char. A <BR>unix command to do this:<BR>tr -s " \t" " "<BR>That's a space char 

  before the \t, and a single space char between the <BR>second pair of quotes. 

  The purpose of this step is to ensure that you <BR>don't accidentally have any 

  tab chars (which will mess up a later step, <BR>since we're going to use tab 

  chars to separate fields), and that the two <BR>files you're going to merge 

  don't have fields that differ trivially by <BR>the amount of whitespace they 

  contain. Of course, if you use two space <BR>chars after a period, you might 

  not want to do this...<BR><BR>2) Convert all CR-LF characters in each file to 

  a (single) tab char. <BR>One way to do this would be to use the unix 'tr' 

  utility:<BR>tr -s "\r\n" "\t"<BR>(The '-s' option squeezes multiple 

  occurrences of your output tab char <BR>to a single tab.) At this point, your 

  files each consist of a single <BR>line, with a single tab char before each 

  SFM.<BR><BR>3) Convert the tab char separating records (but not fields) into a 

  <BR>newline (LF in Unix, which is what I'd be using :-), or CR-LF). One way 

  <BR>to do this is to use the unix 'sed' utility:<BR>sed -e 's/\t\\lx /\n\\lx /g'<BR>(Single quotes keep the shell from collapsing the doubled backslash. The trailing 'g' means do this for every match; by default, sed only 

  <BR>does the operation once per line. I'm assuming your record-delineating 

  <BR>SFM is \lx, modify as necessary.) At this point, each file consists of 

  <BR>a series of records separated by a single newline, and fields within 

  <BR>records separated by a single tab char.<BR><BR>4) Pass both files together 

  through a sorter, and have it eliminate <BR>duplicates. The unix way to do 

  this is<BR>sort -u<BR>(The -u parameter means "eliminate duplicates".<WBR>) At 

  this point, you <BR>have a single file consisting of non-duplicate records, 

  sorted <BR>alphabetically, with a single newline separating records and a 

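  single <BR>tab separating fields.<BR><BR>5) Convert the single Unix newline to 

  a sequence of two DOS newlines. sed strips the trailing newline before matching, so append CR LF CR to each line and let the existing newline complete the second CR-LF (GNU sed):<BR>sed -e 's/$/\r\n\r/'<BR><BR>6) 

  Convert the tab chars to a single DOS newline:<BR>sed -e 's/\t/\r\n/g'<BR><BR>At 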

  step 3, you could diff the two files to see if you have any nearly 

  <BR>identical records. Most diff programs will only tell if two lines 

  <BR>differ; some will tell how they differ, i.e. if there are minor changes. 

  <BR>The visual diff program that comes with ComponentSoftware'<WBR>s RCS 

  <BR>program does this (although with the long lines you're likely to have at 

  <BR>step 3, such diff programs might be DIFFicult to use; guess you could do 

  <BR>step 6 to put the records into temp files first...). While you're at 

  <BR>it, you might want to use RCS to track changes.<BR><BR>Steps 1-3 and 4-6 

  can each be combined into single operations using <BR>"pipes", avoiding some 

  of the intermediate files:<BR><BR>cat OldFile1.sfm | tr -s " \t" " " | tr -s "\r\n" "\t" | <BR>sed -e 's/\t\\lx /\n\\lx /g' > /tmp/OldFile1.sfm<BR><BR>cat OldFile2.sfm | tr -s " \t" " " | tr -s "\r\n" "\t" | <BR>sed -e 's/\t\\lx /\n\\lx /g' > /tmp/OldFile2.sfm<BR><BR>cat /tmp/OldFile1.sfm /tmp/OldFile2.sfm | sort -u | sed -e 's/$/\r\n\r/' | <BR>sed -e 's/\t/\r\n/g' > NewFile.sfm<BR><BR>All this 

  presumes that you either have access to a Unix (Linux) machine, <BR>or (more 

  likely) that you use something like the Cygwin Unix utilities (far <BR>superior to 

  the Windows command prompt, IMHO).<BR><BR>Disclaimer: I haven't tested the 

  above, there might be mistakes.<BR><BR>Oops, one other thing I would do, call 

  it step 3 1/2: get rid of any <BR>space chars before tab chars. These would 

  correspond to space chars at <BR>the end of a line. They're not really a 

  problem, except that they could <BR>give you spurious non-identical records 

  (if you accidentally put such <BR>space chars in one file but not the other). 

  Or maybe Shoebox enforces <BR>this when it saves 

  files?<BR><BR>Links:<BR><BR><A 

  href="http://www.ComponentSoftware.com/">http://www.Componen<WBR>tSoftware.<WBR>com/</A> 

  (you can use the freeware version)<BR><A 

  href="http://cygwin.com/">http://cygwin.<WBR>com/</A><BR>-- <BR>Mike 

  Maxwell<BR><A 

  href="mailto:maxwell%40ldc.upenn.edu">maxwell@ldc.<WBR>upenn.edu</A><BR></P></DIV><!--End group email --></BLOCKQUOTE>
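<DIV><FONT face=Arial size=2>For anyone without the Unix tools, steps 1 through 6 of the quoted message could be sketched in Python roughly as below. This is an untested-on-real-data sketch, not Mike's method verbatim: the function names split_records and merge_sfm are made up for illustration, \lx is assumed to be the record marker (as in the quoted message), and Python's sorted() stands in for "sort -u", so the record order may differ slightly from what the Unix sort would give. Any lines before the first marker, such as a file header, are kept together as one leading record.</FONT></DIV>
<PRE>
```python
import re


def split_records(text, marker="\\lx"):
    """Split SFM text into records; each record is a tuple of field lines."""
    # Treat DOS (CR-LF) and bare CR line endings as plain LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    records = []
    current = []
    for line in text.split("\n"):
        # Steps 1 and 3 1/2: collapse runs of spaces and tabs to one space
        # and drop any space left at the end of the line.
        line = re.sub(r"[ \t]+", " ", line).rstrip()
        if not line:
            continue  # blank lines only separate records
        if line == marker or line.startswith(marker + " "):
            # A record marker closes the record collected so far.
            if current:
                records.append(tuple(current))
            current = [line]
        else:
            current.append(line)
    if current:
        records.append(tuple(current))
    return records


def merge_sfm(old_text, new_text, marker="\\lx"):
    """Return CR-LF text holding every distinct record from both inputs."""
    # Step 4: a set removes exact duplicates, sorted() orders the rest.
    merged = set(split_records(old_text, marker))
    merged.update(split_records(new_text, marker))
    # Steps 5 and 6: CR-LF between fields, a blank line between records.
    blocks = ["\r\n".join(record) for record in sorted(merged)]
    return "\r\n\r\n".join(blocks) + "\r\n"
```
</PRE>
<DIV><FONT face=Arial size=2>To use it, read each file with open(path, newline="") so Python leaves the line endings alone, pass the two strings to merge_sfm, and write the result back out with newline="" as well. Records that differ in any field are both kept, so a visual compare of the output is still worthwhile.</FONT></DIV>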


</BODY></HTML>