Duployan; "OCR misfiling"; Soundex and babelfish(es)
Jeffrey Kopp
jeffreykopp at ATT.NET
Sun Feb 13 02:54:20 UTC 2005
Whoa, Duployan is still alive. All the mentions I'd found in English online
referred to it in the far past tense, implying it fell out of common use
around WWII. (It apparently continued to see some use in Quebec until the
1950s.)
Armed with the phrase "sténographie Duployé" (from your message) and
Google, I discovered that a tutorial was published in 1990:
<http://www.stenographie.ch/enseign.html>. (The introductory section is
available as a PDF at
<http://www.stenographie.ch/stenographie_integrale.pdf>.) There has been at
least one other; I saw a title on sale with a 1998 publication date.
(How or whether the more frequently seen "Sténographie Usuelle" differs
from Duployan, I have no clue.)
As Continental and Canadian French have morphed in different directions,
some historical adjustment may be required in applying these references,
but I'm sure the KW folks are well familiar with that issue. (I'd guess
LJ's own French, because of his travels and the era, was probably a mix of
the two.)
>How did the National Anthropological Archive (Washington, DC) manage to
>file all of its material relating to Father Le Jeune under "Le Jeun"?!
I'd presume an OCR+spellcheck error that snowballed. This technology is
heavily relied upon today in archive compilation, and as a former
word-processing operator who dealt regularly with the early, less adept
software, I'm quite familiar with (and wary of) the kinds of mistakes this
combination can produce.
(As soon as it became possible to customize a spellchecker's lexicon by
removing specific words, we quickly pulled "sing" and "singed," because
those common typos for "sign" and "signed" would slip right through a
spellcheck, and only an alert proofreader would catch them. People seldom
sang or singed anything in legal matters, but of course signed things all
the time.)
As I don't speak French, I'm making some assumptions here from what I can
see online: "Le Jeune" (lit. "the young person") is far more common as a
name (including place names in the U.S., often as "Lejeune"), but "jeun" is
also a French word in its own right (meaning "fasting," as in "à jeun"), so
"le jeun" might pass a French-enabled spellchecker and be missed by the
operator.
I'd previously noticed the same mistake (in spots) in Canadiana.org's
OCR-driven filing system. As they're scanning scratchy microfilm, and the
earlier material was printed in archaic, often odd "cold type," they do
offer a caveat about likely errors.
(Google power tip: Instead of searching for ("Le Jeune" OR LeJeune),
entering "Le-Jeune" will catch both forms, as their engine handles a hyphen
between characters as a hyphen, a space, or neither.)
My then-brother-in-law, researching Renaissance Catalonia, asked me if
there was a way to (in effect) perform a "fuzzy search" on names, given
the many spelling variations he encountered. Nope--not on his 64KB CP/M
Osborne in 1982. Though we surmised something like it existed, we were
unaware of the "Soundex" system, patented in 1918 and used by the U.S.
Census Bureau in the 1930s to index census records, which by the 1990s had
been applied to mainframe databases. I've used it from terminals connected
to AS/400s, and it's now available in most database software designed for
the beefier PCs of today. The programmers among us who dream of a "Jargon
babelfish" will likely need something of the sort to make it truly
functional. (The algorithm is in the public domain, for the hardiest of the
Java warriors out there.)
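For the Java warriors (or anyone else), here's a minimal sketch of the common "American Soundex" rules, written in Python for brevity; it should port to Java line for line. It assumes plain ASCII letters and silently drops everything else, so accented names would need folding to unaccented letters first. Soundex was designed around English spellings, so for French or Catalan names it's a blunt instrument, but it does catch the "Le Jeune"/"Le Jeun" case.

```python
# American Soundex digit groups; vowels, Y, H and W get no digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}

def soundex(name: str) -> str:
    """Return the four-character American Soundex code for `name`."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]               # the first letter is kept as-is...
    prev = _CODES.get(letters[0], "")   # ...but its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W don't separate same-coded letters
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        prev = code                     # a vowel resets prev, so a repeat after it counts
    return "".join(result).ljust(4, "0")  # pad short codes with zeros

print(soundex("Robert"), soundex("Rupert"))     # R163 R163
print(soundex("Le Jeune"), soundex("Le Jeun"))  # L250 L250
```

Matching on the code rather than the spelling is what makes the "fuzzy" search work: index each name under its Soundex key, then look up the key of whatever variant you're holding.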
J.
More information about the Chinook mailing list