<div dir="ltr"><div dir="ltr"><span id="gmail-m_-6425169273732452500gmail-docs-internal-guid-f2f17344-7fff-52c9-58b7-f83df7249aef"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:24pt;font-family:Arial;color:rgb(0,0,0);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">Call for participation - FinTOC shared task </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><font color="#000000" style="" size="1">⇒ The Second Financial Narrative Processing Workshop (FNP 2019)</font></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><font size="1"><!--
--><font color="#000000" style=""><span style="font-family:Arial;background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap">⇒</span><span style="font-family:Arial;background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"> </span></font><font face="Arial" color="#000000" style=""><span style="white-space:pre-wrap"><b style="">The 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19)</b></span></font></font></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:13.5pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><br></span></p><!--
--><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Task: </span><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Predict a Table of Contents (ToC) from financial documents. </span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Two sub-tasks are proposed: </span></p><!--
--><ul style="margin-top:0pt;margin-bottom:0pt"><li dir="ltr" style="list-style-type:disc;font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Detection of titles </span></p></li><li dir="ltr" style="list-style-type:disc;font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><!--
--><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Prediction of a ToC </span></p></li></ul><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:13.5pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><span style="font-weight:normal" id="gmail-m_-6425169273732452500gmail-docs-internal-guid-f2f17344-7fff-52c9-58b7-f83df7249aef"><br></span></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><!--
--><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Shared task webpage:</span><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> </span><span style="text-decoration:underline;font-size:12pt;font-family:Arial;color:rgb(17,85,204);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;vertical-align:baseline;white-space:pre-wrap"><a href="http://wp.lancs.ac.uk/cfie/shared-task/" style="text-decoration:none" target="_blank">http://wp.lancs.ac.uk/cfie/shared-task/</a></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><!--
--><span style="color:rgb(51,51,51);font-family:Arial;font-size:16px;font-weight:700;white-space:pre-wrap">Shared task contact: </span><a href="mailto:fin.toc.task@gmail.com" style="box-sizing:inherit;font-weight:bold;text-decoration-line:none;word-wrap:break-word;word-break:break-word;font-family:helvetica"><font size="2" style="background-color:rgb(255,255,255)" color="#0000ff">fin.toc.task@gmail.com</font></a><br></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><br></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(255,0,0);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:underline;vertical-align:baseline;white-space:pre-wrap">Important dates</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><!--
--><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Registration deadline: </span><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">June 29, 2019</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Submission deadline: </span><!--
--><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">July 13, 2019</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Workshop day:</span><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"> September 30, 2019</span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><!--
--><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><br></span></p><p dir="ltr" style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><br></span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-family:Arial;color:rgb(51,51,51);background-color:transparent;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap"><font size="6" style=""><b>More reading</b> </font><span style="font-weight:400;font-size:12pt;margin-right:0.2ex;margin-left:0.2ex">👇</span></span></p><!--
--><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,255)"><strong style="box-sizing:inherit;font-family:Karla,sans-serif;font-size:24px;text-align:justify;white-space:normal"><span style="box-sizing:inherit;color:rgb(153,51,0)">“Financial Document Structure Extraction”</span></strong><br></span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px;text-align:justify"><strong style="box-sizing:inherit"><span style="box-sizing:inherit;font-size:14pt"><span style="box-sizing:inherit;color:rgb(255,102,0);background-color:rgb(255,255,255)">Introduction:</span></span></strong></p><!--
--><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">A vast number of financial documents are constantly created and published in machine-readable formats (generally PDF), with only minimal structural information. Firms use such documents, typically corporate annual reports containing detailed financial and operational information, to report their activities, financial situation or potential investment plans to shareholders, investors and the financial markets.</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">In some countries, such as the US and France, regulators such as the SEC (EDGAR) or the AMF require firms to follow a certain <!--
-->template when reporting their financial results, to ensure standardisation and consistency across firms’ disclosures. In other European countries, on the other hand, management usually has more discretion over what, where and how to report, resulting in a lack of standardisation between financial documents published within the same market.</span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><strong style="box-sizing:inherit;font-family:Karla,sans-serif;font-size:24px;text-align:justify;white-space:normal"><span style="box-sizing:inherit;color:rgb(153,51,0);background-color:rgb(255,255,255)"></span></strong></span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><!--
--><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">In this shared task, we focus on analysing financial prospectuses: official PDF documents in which investment funds precisely describe their characteristics and investment modalities. Although the content they must include is often regulated, their format is not standardised and varies greatly, ranging from plain text to more graphical and tabular presentations of data and information. The majority of prospectuses are published without a table of contents (TOC), which is usually needed to help readers navigate the document by following a simple outline of headers and page numbers, and to help professional teams check that all required content is included. Automatic analysis of prospectuses to extract their structure is therefore becoming increasingly vital to many firms across the world.<!--
--></span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)"><br></span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px;text-align:justify"><strong style="box-sizing:inherit"><span style="box-sizing:inherit;font-size:14pt"><span style="box-sizing:inherit;color:rgb(255,102,0);background-color:rgb(255,255,255)">Task:</span></span></strong></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">As part of the Financial Narrative Processing Workshop, we present a shared task on Financial Document Structure Extraction.</span></p><!--
--><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Systems participating in this shared task will be given a sample collection of financial prospectuses with different levels of structure and different lengths (document sizes), which are to be automatically analyzed to extract structural information and build a table of contents.</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">The task consists of two subtasks:</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">a) <!--
-->Title detection</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)"></span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">This is a binary classification task aiming at detecting titles in financial prospectuses. Given a set of text blocks, the goal is to classify each given text block as a ‘title’ or ‘non-title’. Titles can have different layouts and they have to be distinguished from the regular text.</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">b) <!--
-->TOC structure extraction</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">The TOC is a hierarchical organisation of the headers of a document. In this subtask, we provide only the headers of a prospectus, and the goal is to (i) identify the hierarchical level of each header and (ii) organize the headers of the document according to this hierarchical structure. Note that two headers with the same layout and the same text can have different hierarchical levels depending on their location in the document.</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Participants need to register. Once registered, all participating <!--
-->teams will be provided with a common training dataset, which includes common pre-processed input and corrected output. A common development set will also be provided. A blind test data set will be used to evaluate the output of the participating teams. An evaluation script will be provided to all the teams. In addition to the PDF version of the documents, we will provide their XML representation.</span></p><hr style="box-sizing:content-box;height:1px;border:0px;margin-bottom:1.6em;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)"></span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px;text-align:justify"><strong style="box-sizing:inherit"><!--
--><span style="box-sizing:inherit;font-size:14pt"><span style="box-sizing:inherit;color:rgb(255,102,0);background-color:rgb(255,255,255)">Background:</span></span></strong></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Almost all existing work on table of contents (TOC) recognition for books and documents has relied on small, application-dependent, domain-specific datasets. However, the TOCs of documents from different domains differ significantly in their visual layout and style, making TOC recognition a challenging problem for large-scale collections of heterogeneous documents and books. Compared to regular books (mostly provided in a full-text format with limited structural information such as pages and paragraphs), financial documents, containing textual and non-textual <!--
-->content, have a more sophisticated structure including parts, sections, sub-sections and sub-sub-sections. </span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px;text-align:justify"><strong style="box-sizing:inherit"><span style="box-sizing:inherit;font-size:14pt"><span style="box-sizing:inherit;color:rgb(255,102,0);background-color:rgb(255,255,255)"></span></span></strong></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Furthermore, TOCs provide, at a glance, the entire structure and layout of a document, making their recognition an important feature for document structure extraction and understanding. Extracting a TOC is just a first step in a pipeline of information extraction and extensive document analysis, to<!--
--> monitor investment rules and examine change over time relative to financial results.</span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,255)"><br></span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px;text-align:justify"><strong style="box-sizing:inherit"><span style="box-sizing:inherit;font-size:14pt"><span style="box-sizing:inherit;color:rgb(255,102,0);background-color:rgb(255,255,255)">Important Dates:</span></span></strong></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">(<!--
-->suggested plan for the FNP FinTOC task at NoDaLiDa 2019)</span></p><p style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><span style="font-size:12pt;font-family:Arial;color:rgb(51,51,51);font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,255)"></span></p><ul style="box-sizing:inherit;margin:0px 0px 0.8em 1.6em;padding:0px;list-style-position:initial;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><li class="gmail-first-child" style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">March 25, 2019: First announcement of shared task</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">April 10, 2019: set-up of shared task website</span></li><li style="box-sizing:inherit"><!--
--><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">April 15, 2019: registration begins and release of initial training sets and scoring script</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">May 18, 2019: Final training data release</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Jun 29, 2019: registration deadline</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">July 6, 2019: test set available</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">July 13, 2019: systems’ outputs collected</span></li><li style="box-sizing:inherit"><!--
--><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">July 20, 2019: system results due to participants</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">July 27, 2019: shared task system papers due</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Aug 10, 2019: reviews due</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Aug 17, 2019: notification of acceptance</span></li><li style="box-sizing:inherit"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Aug 24, 2019: camera ready version of shared task system papers due</span></li><li class="gmail-last-child" style="box-sizing:inherit"><!--
--><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Sep 30, 2019: Workshop day</span></li></ul><div><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px;text-align:justify"><strong style="box-sizing:inherit"><span style="box-sizing:inherit;font-size:14pt"><span style="box-sizing:inherit;color:rgb(255,102,0);background-color:rgb(255,255,255)">Shared Task Contact:</span></span></strong></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)">Questions about FinTOC-2019 shared task can be sent to:</span></p><p style="box-sizing:inherit;margin:0.8em 0px;color:rgb(51,51,51);font-family:Karla,sans-serif;font-size:18px"><!--
--><span style="box-sizing:inherit;font-family:helvetica;background-color:rgb(255,255,255)"><a href="mailto:fin.toc.task@gmail.com" style="box-sizing:inherit;color:rgb(102,54,204);font-weight:bold;text-decoration-line:none;word-wrap:break-word;word-break:break-word">fin.toc.task@gmail.com</a></span></p></div></span></div></div>