Duployan; "OCR misfiling"; Soundex and babelfish(es)

Jeffrey Kopp jeffreykopp at ATT.NET
Sun Feb 13 02:54:20 UTC 2005

Whoa, Duployan is still alive. All the mentions I'd found in English online 
referred to it in the far past tense, implying it fell out of common use 
around WWII. (It apparently continued to see some use in Quebec until the 

Armed with the phrase "sténographie Duployé" (from your message) and 
Google, I discovered a tutorial was published in 1990: 
<http://www.stenographie.ch/enseign.html>. (The introductory section is 
available in .pdf at 
<http://www.stenographie.ch/stenographie_integrale.pdf>). There's been at 
least one other; saw a title on sale with a pub. date of 1998.

(How or whether the more frequently seen "Sténographie Usuelle" differs 
from Duployan, I have no clue.)

As Continental and Canadian French have morphed in different directions, 
some historical adjustment may be required in applying these references, 
but I'm sure the KW folks are well familiar with that issue. (I'd guess 
LJ's own French, because of his travels and the era, was probably a mix of 
the two.)

>How did the National Anthropological Archive (Washington, DC) manage to 
>file all of its material relating to Father Le Jeune under "Le Jeun"?!

I'd presume an OCR+spellcheck error that snowballed. This technology is 
heavily relied upon today in archive compilation, and as a former word 
processor who dealt regularly with the early, less adept software, I'm 
quite familiar with (and wary of) the types of mistakes this combo can produce.

(As soon as it became possible to customize a spellchecker's lexicon by 
removing specific words, we quickly pulled "sing" and "singed," as only the 
alert proofer would spot those common manual typos which would slip right 
through a spellcheck. People seldom sang or singed in legal matters, but 
of course signed things all the time.)

As I don't speak French, I'm making some assumptions here from what I can 
see on-line: "Le Jeune" (lit. "the young person") is far more common as a 
name (including place names in the U.S., often as "Lejeune"), but "le jeun" 
(adj. "the younger"?) might pass a French-enabled spellchecker and so be 
missed by the operator.

I'd previously noticed the same mistake (in spots) in Canadiana.org's 
OCR-driven filing system. As they're scanning scratchy microfilm, and the 
earlier material was printed in archaic, often odd "cold type," they do 
offer a caveat about likely errors.

(Google power tip: Instead of searching for ("Le Jeune" OR LeJeune), 
entering "Le-Jeune" will catch both forms, as their engine handles a hyphen 
between characters as a hyphen, a space, or neither.)

My then-brother-in-law, researching Renaissance Catalonia, asked me if 
there was a way to (in effect) perform a "fuzzy search" on names, due to 
the many spelling variations he encountered. Nope--not for his 64KB CP/M 
Osborne in 1982. Though we surmised something like it existed, we were 
unaware of the "Soundex" system, patented in 1918 and later used to index 
U.S. Census records, which was applied by the 1990s to mainframe 
databases. I've used it from 
terminals connected to AS/400s, and it's now available in most database 
software designed for the beefier PCs of today. The programmers among us 
who dream of a "Jargon babelfish" will likely require something of the sort 
to make it truly functional. (The algorithm is public domain, for the 
hardiest of the Java warriors out there.)
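
For anyone who wants to try it, here is a minimal sketch of the standard 
American Soundex rules, written in Python rather than Java for brevity. The 
function name soundex and the table name CODES are my own choices, not from 
any particular library. It assumes plain A-Z input: accented letters, spaces 
and punctuation are simply dropped, which is a simplification, and it does 
not attempt the special handling of name prefixes that some indexes apply.

```python
# Letter groups and their Soundex digits. Vowels, Y, H and W have no digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code for name ("" if no A-Z letters)."""
    # Assumption: anything outside A-Z (accents, spaces, hyphens) is ignored.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # but its digit still blocks a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not break a run of one digit
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]         # pad with zeros, cut to four characters


for n in ("Le Jeune", "Le Jeun", "Lejeune", "Robert", "Rupert"):
    print(n, soundex(n))
```

With these rules "Le Jeune", "Le Jeun" and "Lejeune" all come out as L250, 
so a Soundex lookup would have found the misfiled material under any of the 
three spellings. The price is false matches: "Robert" and "Rupert" also 
share a code (R163).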

