<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:p="urn:schemas-microsoft-com:office:powerpoint" xmlns:a="urn:schemas-microsoft-com:office:access" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" xmlns:b="urn:schemas-microsoft-com:office:publisher" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet" xmlns:odc="urn:schemas-microsoft-com:office:odc" xmlns:oa="urn:schemas-microsoft-com:office:activation" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:q="http://schemas.xmlsoap.org/soap/envelope/" xmlns:rtc="http://microsoft.com/officenet/conferencing" xmlns:D="DAV:" xmlns:Repl="http://schemas.microsoft.com/repl/" xmlns:mt="http://schemas.microsoft.com/sharepoint/soap/meetings/" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" xmlns:ppda="http://www.passport.com/NameSpace.xsd" xmlns:ois="http://schemas.microsoft.com/sharepoint/soap/ois/" xmlns:dir="http://schemas.microsoft.com/sharepoint/soap/directory/" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:dsp="http://schemas.microsoft.com/sharepoint/dsp" xmlns:udc="http://schemas.microsoft.com/data/udc" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sub="http://schemas.microsoft.com/sharepoint/soap/2002/1/alerts/" xmlns:ec="http://www.w3.org/2001/04/xmlenc#" xmlns:sp="http://schemas.microsoft.com/sharepoint/" xmlns:sps="http://schemas.microsoft.com/sharepoint/soap/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:udcs="http://schemas.microsoft.com/data/udc/soap" xmlns:udcxf="http://schemas.microsoft.com/data/udc/xmlfile" xmlns:udcp2p="http://schemas.microsoft.com/data/udc/parttopart" xmlns:wf="http://schemas.microsoft.com/sharepoint/soap/workflow/" xmlns:dsss="http://schemas.microsoft.com/office/2006/digsig-setup" xmlns:dssi="http://schemas.microsoft.com/office/2006/digsig" xmlns:mdssi="http://schemas.openxmlformats.org/package/2006/digital-signature" xmlns:mver="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns:mrels="http://schemas.openxmlformats.org/package/2006/relationships" xmlns:spwp="http://microsoft.com/sharepoint/webpartpages" xmlns:ex12t="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:ex12m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:pptsl="http://schemas.microsoft.com/sharepoint/soap/SlideLibrary/" xmlns:spsl="http://microsoft.com/webservices/SharePointPortalServer/PublishedLinksService" xmlns:Z="urn:schemas-microsoft-com:" xmlns:st="" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.5pt;
font-family:Consolas;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:Consolas;}
span.EmailStyle19
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:98182512;
mso-list-type:hybrid;
mso-list-template-ids:1420312352 -66414614 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.75in;
text-indent:-.25in;
font-family:Symbol;
mso-fareast-font-family:Calibri;
mso-bidi-font-family:"Times New Roman";}
@list l0:level2
{mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level4
{mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level7
{mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=WordSection1>
<p class=MsoPlainText>Corpora list<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText> Ancestry.com, Inc. and the
Department of Computer Science at Brigham Young University are working together
to create a publically available, hand-annotated collection of images and OCR transcriptions
of scanned documents. We would like your feedback before we get too far
along. We hope to have this complete early next year.<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText style='text-indent:.5in'>This corpus is intended to
facilitate evaluating document image analysis, OCR error correction and/or
information extraction algorithms and related research. The images come
from the scanned books and newspapers in a large collection at Ancestry.com and
include the following kinds of documents:<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>newspapers</b> (typical newspapers from the
20<sup>th</sup> century)<o:p></o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>city directories</b> (like old phone books)<o:p></o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>collage yearbooks</b> (includes photos, names
and majors)<o:p></o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>navy cruise books</b> (like a yearbook for
those who served on large US Navy ships)<o:p></o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>birth records books</b> (recording birth
events and family relationships among parents and children)<o:p></o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>local histories</b> (histories of small geographical
areas)<o:p></o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>family histories</b> (multi-generational
histories of particular families)<o:p></o:p></p>
<p class=MsoPlainText style='margin-left:.75in;text-indent:-.25in;mso-list:
l0 level1 lfo2'><![if !supportLists]><span style='font-family:Symbol'><span
style='mso-list:Ignore'>·<span style='font:7.0pt "Times New Roman"'>
</span></span></span><![endif]><b>church yearbooks</b> (describes the
organization and events of a local church congregation)<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText> A few example images can
be found here:<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText> <a
href="http://deg.byu.edu/ancestry-examples/">http://deg.byu.edu/ancestry-examples/</a>
<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText> The particular selection
of documents was motivated by genealogy and family history research, but we
believe the final corpus with annotations (including manual transcriptions) will
be of value outside this field of research.<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText style='text-indent:.5in'>We would like your help in refining
our priorities for this corpus if you are likely to use this corpus in your
research. We ask you to reply to this email with ideas you may have as
well as helping us prioritize an existing wish-list of potential features by
voting at this website:<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText style='text-indent:.5in'><a
href="http://www.allourideas.org/ancestry_corpus">http://www.allourideas.org/ancestry_corpus</a>
<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText style='text-indent:.5in'>This web page will iteratively
present you with random pairs of features. Each feature is described
briefly by a short sentence. For each pair of features, please click on the
one that would be the most useful to you. After each click, you will be
presented with the next pair. Feel free to vote in multiple sessions, on
multiple days. You can continue voting for as long as you would like.
The longer you vote, the better our ranking will be.<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText> You can also add your own
ideas to this wish-list. Please add them early so other people have a
chance to vote for them, but only after you familiarize yourself with the
features already listed. A complete list can be found by clicking on “View
Results”. We would also like to hear any feedback you may have when
you reply to this email.<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText> If you know of someone who
might be interested in this corpus, please forward this email to them.<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoPlainText> Thank you for your time.<o:p></o:p></p>
<p class=MsoPlainText><o:p> </o:p></p>
<p class=MsoNormal><b>Thomas L. Packer<o:p></o:p></b></p>
<p class=MsoNormal><span style='font-size:8.0pt'>Department of Computer Science<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:8.0pt'>Brigham Young University<o:p></o:p></span></p>
<p class=MsoNormal><o:p> </o:p></p>
</div>
</body>
</html>