<div dir="ltr"><br clear="all"><p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="color:rgb(0,0,0)"><span style="font-size:26px"><b>Call for participation - <span class="gmail-il">FinTOC</span> shared task</b></span></span></p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><font size="1" color="#000000">⇒ <b>The Second Financial Narrative Processing Workshop (FNP 2019)</b></font><br>
<font size="1"><font color="#000000">⇒ </font><font face="Arial" color="#000000"><b>The 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19)</b></font></font></p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"> </p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><b><u>Task</u></b>: Predict a Table of Contents (ToC) from financial documents.</p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Two sub-tasks are proposed:</p>
<ul><li dir="ltr">
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Detection of titles</p>
</li><li dir="ltr">
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Prediction of a ToC</p>
</li></ul>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Shared task webpage: <a href="https://gmail.us20.list-manage.com/track/click?u=9b9c52fc6d2c60970cdc072fa&id=2e8113bc5b&e=9aba78199b" style="color:rgb(0,173,216);font-weight:normal;text-decoration:underline" target="_blank">http://wp.lancs.ac.uk/cfie/shared-task/</a></p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Shared task contact: <a href="mailto:fin.toc.task@gmail.com" style="color:rgb(0,173,216);font-weight:normal;text-decoration:underline" target="_blank"><font size="2" color="#0000ff">fin.toc.task@gmail.com</font></a></p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="color:rgb(178,34,34)"><b>Important dates</b></span></p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><b>Submission deadline: Aug. 2, 2019</b></p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Workshop day: September 30, 2019</p>
<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">
</p><p style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><font size="6"><b>More reading</b> </font>👇</p>
<p style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="font-size:18px"><span style="color:rgb(128,0,0)"><b>“Financial Document Structure Extraction”</b></span></span></p>
<p style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="color:rgb(255,140,0)"><b>Introduction:</b></span></p><br><p><span style="font-family:helvetica">A vast number of financial
documents are created and published constantly in machine-readable
formats (generally PDF), with only minimal structural information.
Firms use such documents, typically corporate annual reports containing
detailed financial and operational information, to report their
activities, financial situation or potential investment plans to
shareholders, investors and the financial markets.</span></p>
<p><span style="font-family:helvetica">In some countries, such as the US
or France, regulators such as the SEC (through EDGAR) or the AMF require
firms to follow a certain template when reporting their financial
results, to ensure standardisation and consistency across firms’
disclosures. In other European countries, on the other hand, management
usually has more discretion over what, where and how to report,
resulting in a lack of standardisation between financial documents
published within the same market.</span></p>
<p><span style="font-family:helvetica">In this shared task, we focus on
analysing financial prospectuses: official PDF documents in which
investment funds precisely describe their characteristics and investment
modalities. Although the content they must include is often regulated,
their format is not standardised and varies widely, from plain text to
more graphical and tabular presentations of data and information. The
majority of prospectuses are published without a table of contents
(TOC), which readers usually need to navigate the document by following
a simple outline of headers and page numbers, and which helps legal
teams check that all required content is included. Thus, automatic
analysis of prospectuses to extract their structure is becoming
increasingly vital to many firms across the world.</span></p>
<hr>
<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Task:</span></span></b></p>
<p><span style="font-family:helvetica">As part of the Financial Narrative Processing Workshop, we present a shared task on Financial Document Structure Extraction. </span></p>
<p><span style="font-family:helvetica">Systems participating in this
shared task will be given a sample collection of financial prospectuses
with different levels of structure and different lengths (document
sizes), which are to be automatically analysed to extract structural
information and build a table of contents.</span></p>
<p><span style="font-family:helvetica">The task consists of two sub-tasks: </span></p>
<p><span style="font-family:helvetica">a) Title detection</span></p>
<p><span style="font-family:helvetica">This is a binary classification
task aiming at detecting titles in financial prospectuses. Given a set
of text blocks, the goal is to classify each given text block as a
‘title’ or ‘non-title’. As shown in Figure 1, the titles can have
different layouts (marked with red and green boxes) and they have to be
distinguished from the regular text (‘non-title’ with grey boxes).</span></p>
<p><span style="font-family:helvetica"><a href="http://wp.lancs.ac.uk/cfie/files/2018/10/sharedTask-1.png" target="_blank"><img src="http://wp.lancs.ac.uk/cfie/files/2018/10/sharedTask-1-247x300.png" alt="" class="gmail-CToWUd" width="247" height="300"></a></span></p>
<p><span style="font-family:helvetica"><a href="http://wp.lancs.ac.uk/cfie/files/2018/10/sharedTask-1.png" rel="noopener noreferrer" target="_blank">Click to show full sized image.</a></span></p>
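<p><span style="font-family:helvetica">To make the input and output of sub-task (a) concrete, here is a minimal Python sketch of a rule-based baseline. It is only an illustration, not the organisers' method: the <code>TextBlock</code> class, its <code>is_bold</code> and <code>font_size</code> fields, and the thresholds are all hypothetical, and the actual block representation is the one described in the shared task's data format document.</span></p>
<pre>
```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    is_bold: bool     # hypothetical layout feature, not the official format
    font_size: float  # hypothetical layout feature, in points


def is_title(block, body_font_size=10.0, max_words=12):
    """Toy rule: a short, emphasised block that does not end like a sentence."""
    too_long = len(block.text.split()) > max_words
    emphasised = block.is_bold or block.font_size > body_font_size
    ends_like_sentence = block.text.rstrip().endswith(".")
    return emphasised and not too_long and not ends_like_sentence


# Two made-up blocks: one laid out like a heading, one like body text.
blocks = [
    TextBlock("Investment Objective", is_bold=True, font_size=12.0),
    TextBlock("The fund seeks long-term capital growth.", is_bold=False, font_size=10.0),
]
labels = ["title" if is_title(b) else "non-title" for b in blocks]
print(labels)
```
</pre>
<p><span style="font-family:helvetica">A hand-written rule like this is likely to break on the layout variability shown in Figure 1, which is one reason the sub-task is framed as a classification problem over a training set.</span></p>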
<p><span style="font-family:helvetica">b) TOC structure extraction</span></p>
<p><span style="font-family:helvetica">The TOC is a hierarchical
organisation of the headers of a document. In this sub-task, we provide
only the headers of a prospectus, and the goal is to (i) identify the
hierarchical level of each header and (ii) organise the headers of the
document according to this hierarchical structure. Note that two
headers with the same layout and the same text can have different
hierarchical levels depending on their location in the document.</span></p>
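<p><span style="font-family:helvetica">Step (ii) of sub-task (b) can be sketched in Python as follows, assuming step (i) has already assigned a level to every header. The <code>TocNode</code> class, the <code>build_toc</code> and <code>render</code> functions, and the example headers are hypothetical names and data chosen for illustration. They are not part of the shared task material, and levels are assumed to be integers starting at 1.</span></p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    title: str
    level: int
    children: list = field(default_factory=list)


def build_toc(headers):
    """Nest a flat, document-ordered list of (title, level) pairs into a tree."""
    root = TocNode(title="ROOT", level=0)
    stack = [root]  # path from the root to the most recently added header
    for title, level in headers:
        # Close every open header that sits at the same depth or deeper.
        while stack[-1].level >= level:
            stack.pop()
        node = TocNode(title=title, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


def render(node, indent=0):
    """Return the tree as indented lines, one per header."""
    lines = []
    for child in node.children:
        lines.append("  " * indent + child.title)
        lines.extend(render(child, indent + 1))
    return lines


# Made-up headers: "Risk factors" appears twice, at two different levels.
headers = [
    ("General information", 1),
    ("Risk factors", 2),
    ("Sub-fund A", 1),
    ("Investment policy", 2),
    ("Risk factors", 3),
]
print("\n".join(render(build_toc(headers))))
```
</pre>
<p><span style="font-family:helvetica">In this made-up example the header "Risk factors" occurs once at level 2 and once at level 3, which mirrors the point above: the same header text can sit at different depths depending on where it occurs in the document.</span></p>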
<p><span style="font-family:helvetica">Participants need to register.
Once registered, all participating teams will be provided with a common
training dataset, which includes common pre-processed input and
corrected output. A common development set will also be provided. A
blind test data set will be used to evaluate the output of the
participating teams. An evaluation script will be provided to all the
teams. In addition to the PDF version of the documents, we will provide
their XML representation.</span></p>
<hr>
<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Background:</span></span></b></p>
<p><span style="font-family:helvetica">Existing work on book and
document table of contents (TOC) recognition has almost all been carried
out on small, application-dependent and domain-specific datasets.
However, the TOCs of documents from different domains differ
significantly in their visual layout and style, making TOC recognition a
challenging problem for a large-scale collection of heterogeneous
documents and books. Compared to regular books (mostly provided in a
full-text format with limited structural information such as pages and
paragraphs), financial documents, containing textual and non-textual
content, have a more sophisticated structure including parts, sections,
sub-sections and sub-sub-sections. </span></p>
<hr>
<p style="text-align:justify"><span style="font-size:14pt"><span style="color:rgb(255,102,0)"><span style="color:rgb(255,102,0)"><span style="font-size:14pt"><b>Data Format and </b></span><span style="font-size:18.6667px"><b>Evaluation</b></span><span style="font-size:14pt"><b>:</b></span></span></span></span></p>
<p>The following document describes the data format and evaluation metric used in the shared task: <a href="https://docs.google.com/document/d/1gYRS1wvNrm5DT68W-Jn7LpgfomrbP_diqQYAHM5tA0A/edit" rel="noopener noreferrer" target="_blank">Data Format Details</a></p>
<p> </p>
<hr>
<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Important Dates:</span></span></b></p><ul><li class="gmail-first-child">Aug 2, 2019: continue collecting</li><li>Aug 5, 2019: results publication</li><li class="gmail-last-child">Aug 10, 2019: deadline for papers</li><li class="gmail-last-child">Aug 17, 2019: reviews and notification of acceptance</li></ul><ul><li><span style="font-family:helvetica">Aug 24, 2019: camera-ready version of shared task system papers due</span></li><li class="gmail-m_3238902616319271611gmail-last-child"><span style="font-family:helvetica">Sep 30, 2019: Workshop day</span></li></ul>
<hr>
<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Shared Task Organisers:</span></span></b></p>
<ul><li class="gmail-m_3238902616319271611gmail-first-child"><span style="font-family:helvetica"><a href="https://scholar.google.fr/citations?user=vGXfCl0AAAAJ&hl=fr" rel="noopener noreferrer" target="_blank">Dr Sira Ferradans</a>, Fortia Financial Solutions</span></li><li><span style="font-family:helvetica"><a href="https://www.linkedin.com/in/najah-imane-bentabet-7182b456/" rel="noopener noreferrer" target="_blank">Najah-Imane Bentabet</a>, Fortia Financial Solutions</span></li><li><span style="font-family:helvetica"><a href="http://www.lancaster.ac.uk/staff/elhaj" rel="noopener noreferrer" target="_blank">Dr Mahmoud El-Haj</a>, Lancaster University</span></li><li class="gmail-m_3238902616319271611gmail-last-child"><a href="https://www.linkedin.com/in/jugeremi/" target="_blank">Rémi Juge</a>, <span style="font-family:helvetica"> Fortia Financial Solutions</span></li></ul>
<hr>
<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Shared Task Contact:</span></span></b></p>
<p><span style="font-family:helvetica">Questions about <span class="gmail-il">FinTOC</span>-2019 shared task can be sent to:</span></p>
<p><span style="font-family:helvetica"><a href="mailto:fin.toc.task@gmail.com" target="_blank">fin.toc.task@gmail.com</a></span></p><br>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><br><table style="font-size:small;font-family:arial,helvetica,sans-serif" width="466" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td style="border-right:2px solid rgb(0,47,138)" width="140"><img src="http://www.fortia.fr/wp-content/uploads/2018/01/favicon.png"></td><td width="326"><table style="margin-left:22px" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td style="line-height:25px"><font color="#002f8a"><span style="font-size:14px"><b>Sira FERRADANS</b></span></font><span style="color:rgb(33,219,174);font-size:12px"> Chief Research Scientist</span></td></tr><tr><td style="font-size:11px;color:rgb(153,153,153);line-height:18px">17, avenue George V. Paris 75008</td></tr><tr><td style="font-size:11px;color:rgb(153,153,153);line-height:20px;padding-top:3px">+33 (0)6 73 77 20 03</td></tr><tr><td style="font-size:11px;color:rgb(153,153,153);line-height:18px"><a style="color:rgb(17,85,204)">sira.ferradans@fortia.fr</a> | <a style="color:rgb(17,85,204)">www.fortia.fr</a></td></tr></tbody></table></td></tr></tbody></table></div></div></div></div></div></div></div></div></div></div></div></div></div></div>