<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
pre
{mso-style-priority:99;
mso-style-link:"HTML Preformatted Char";
margin:0in;
margin-bottom:.0001pt;
font-size:10.0pt;
font-family:"Courier New";}
span.apple-style-span
{mso-style-name:apple-style-span;}
span.HTMLPreformattedChar
{mso-style-name:"HTML Preformatted Char";
mso-style-priority:99;
mso-style-link:"HTML Preformatted";
font-family:Consolas;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><a name="_MailEndCompose"><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Hello Andrea<o:p></o:p></span></a></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> I’m working on something similar: entity and relation extraction from semi-structured lists, in particular printed lists (i.e., the texts come from scanned and OCRed document images). I’m not aware of many such datasets, so I will be interested in seeing others’ responses to your question. <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='text-indent:.5in'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Can you give more details about what you are interested in? HTML tables? Text tables with tabs or some other delimiter character between columns? Printed tables with spatial layout information? Lists of records that do not necessarily have column delimiters or column headers?<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='text-indent:.5in'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>I am preparing to create a dataset of different kinds of printed lists in the family history domain, including some tables. I may also need to correct the OCR errors and delimit columns in the corresponding text, along with annotating the fields, so that might be close to what you are looking for even if you are not targeting printed tables. 
<o:p></o:p></span></p><p class=MsoNormal style='text-indent:.5in'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='text-indent:.5in'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>One dataset I’ve been trying out in the meantime is the Cora research paper citations dataset for IE, but this may not fall under your definition of “table” because the fields are not in a consistent order, the list entries do not have a single consistent schema, and the fields are not unambiguously delimited. <o:p></o:p></span></p><p class=MsoNormal style='text-indent:.5in'><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal style='text-indent:.5in'><a href="http://www.cs.umass.edu/~mccallum/data.html"><span style='font-size:11.0pt;font-family:"Calibri","sans-serif"'>http://www.cs.umass.edu/~mccallum/data.html</span></a><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> <o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> Good luck.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Thomas L. 
Packer<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>BYU CS<o:p></o:p></span></b></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>~~~~~~~~~~~~~~~~~~~~<o:p></o:p></span></p></div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> corpora-bounces@uib.no [mailto:corpora-bounces@uib.no] <b>On Behalf Of </b>Andrea Varga<br><b>Sent:</b> Friday, March 04, 2011 4:15 AM<br><b>To:</b> corpora@uib.no<br><b>Cc:</b> andrea.job06@yahoo.com<br><b>Subject:</b> [Corpora-List] gold standard for IE from tables<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><div><p class=MsoNormal><span class=apple-style-span><span style='color:black'>Dear corpora members,</span></span><span style='font-family:"Arial","sans-serif";color:black'><o:p></o:p></span></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto'><span lang=EN-GB style='color:black'> </span><span style='font-family:"Arial","sans-serif";color:black'><o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto'><span lang=EN-GB style='font-family:"Arial","sans-serif";color:black'>I was wondering whether there are any publicly available corpora annotated for Information Extraction from tables. 
I am particularly interested in entity extraction and relation extraction from tables.</span><span style='font-family:"Arial","sans-serif";color:black'><o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto'><span lang=EN-GB style='color:black'> </span><span style='font-family:"Arial","sans-serif";color:black'><o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto'><span lang=EN-GB style='color:black'>Many thanks,</span><span style='font-family:"Arial","sans-serif";color:black'><o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto'><span lang=EN-GB style='color:black'>Andrea</span><span style='font-family:"Arial","sans-serif";color:black'><o:p></o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span class=apple-style-span><span style='font-family:"Courier New";color:black'>-- </span></span><span style='color:black'><o:p></o:p></span></p><pre><span style='color:black'>Ms Andrea Varga MSc<br>PhD Student<br>OAK Group<br>The University of Sheffield <br>a.varga@dcs.shef.ac.uk<o:p></o:p></span></pre><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='color:black'><o:p> </o:p></span></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='color:black'><o:p> </o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p></div></body></html>