<div dir="ltr"><br clear="all"><p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="color:rgb(0,0,0)"><span style="font-size:26px"><b>Call for participation - <span class="gmail-il">FinTOC</span> shared task</b></span></span></p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><font size="1" color="#000000">⇒ <b>The Second Financial Narrative Processing Workshop (FNP 2019)</b></font><br>

<font size="1"><font color="#000000">⇒ </font><font face="Arial" color="#000000"><b>The 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19)</b></font></font></p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"> </p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><b><u>Task</u></b>: Predict a Table of Contents (ToC) from financial documents.</p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Two sub-tasks are proposed:</p>


<ul><li dir="ltr">

        <p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Detection of titles</p>

        </li><li dir="ltr">

        <p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Prediction of a ToC</p>

        </li></ul>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Shared task webpage: <a href="http://wp.lancs.ac.uk/cfie/shared-task/" style="color:rgb(0,173,216);font-weight:normal;text-decoration:underline" target="_blank">http://wp.lancs.ac.uk/cfie/shared-task/</a></p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Shared task contact: <a href="mailto:fin.toc.task@gmail.com" style="color:rgb(0,173,216);font-weight:normal;text-decoration:underline" target="_blank"><font size="2" color="#0000ff">fin.toc.task@gmail.com</font></a></p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="color:rgb(178,34,34)"><b>Important dates</b></span></p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><b>Submission deadline: Aug. 2, 2019</b></p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">Workshop day: September 30, 2019</p>


<p dir="ltr" style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left">  


</p><p style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><font size="6"><b>More reading</b> </font>👇</p>


<p style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="font-size:18px"><span style="color:rgb(128,0,0)"><b>“Financial Document Structure Extraction”</b></span></span></p>


<p style="margin:10px 0px;padding:0px;color:rgb(128,128,128);font-family:Helvetica;font-size:16px;line-height:150%;text-align:left"><span style="color:rgb(255,140,0)"><b>Introduction:</b></span></p><br><p><span style="font-family:helvetica">A vast number of financial documents are created and published constantly in machine-readable formats (generally PDF), with only minimal structural information. Firms use such documents, typically corporate annual reports containing detailed financial and operational information, to report their activities, financial situation or potential investment plans to shareholders, investors and the financial markets.</span></p>

<p><span style="font-family:helvetica">In some countries, such as the US or France, regulators such as the SEC (EDGAR) or the AMF require firms to follow a certain template when reporting their financial results, to ensure standardisation and consistency across firms’ disclosures. In other European countries, on the other hand, management usually has more discretion over what, where and how to report, resulting in a lack of standardisation between financial documents published within the same market.</span></p>

<p><span style="font-family:helvetica">In this shared task, we focus on analysing financial prospectuses: official PDF documents in which investment funds precisely describe their characteristics and investment modalities. Although the content they must include is often regulated, their format is not standardized and varies widely, ranging from plain text to more graphical and tabular presentations of data and information. The majority of prospectuses are published without a table of contents (TOC), which is usually needed to help readers navigate the document by following a simple outline of headers and page numbers, and to assist legal teams in checking that all the required contents are included. Thus, automatic analysis of prospectuses to extract their structure is becoming increasingly vital to many firms across the world.</span></p>

<hr>

<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Task:</span></span></b></p>

<p><span style="font-family:helvetica">As part of the Financial Narrative Processing Workshop, we present a shared task on Financial Document Structure Extraction. </span></p>

<p><span style="font-family:helvetica">Systems participating in this shared task will be given a sample collection of financial prospectuses with different levels of structure and different lengths (document sizes), which are to be automatically analyzed to extract structural information and build a table of contents.</span></p>

<p><span style="font-family:helvetica">The task contains two subtasks:</span></p>

<p><span style="font-family:helvetica">a) Title detection</span></p>

<p><span style="font-family:helvetica">This is a binary classification task that aims to detect titles in financial prospectuses. Given a set of text blocks, the goal is to classify each text block as ‘title’ or ‘non-title’. As shown in Figure 1, titles can have different layouts (marked with red and green boxes), and they have to be distinguished from the regular text (‘non-title’, marked with grey boxes).</span></p>

<p><span style="font-family:helvetica"><a href="http://wp.lancs.ac.uk/cfie/files/2018/10/sharedTask-1.png" target="_blank"><img src="http://wp.lancs.ac.uk/cfie/files/2018/10/sharedTask-1-247x300.png" alt="" class="gmail-CToWUd" width="247" height="300"></a></span></p>

<p><span style="font-family:helvetica"><a href="http://wp.lancs.ac.uk/cfie/files/2018/10/sharedTask-1.png" rel="noopener noreferrer" target="_blank">Click to show full sized image.</a></span></p>
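
<p><span style="font-family:helvetica">To make the shape of the title detection subtask concrete, here is a minimal rule-based sketch in Python. It is not the organisers' baseline or data format: the <code>TextBlock</code> fields, the thresholds and the example blocks are all hypothetical, chosen only to illustrate classifying each text block as ‘title’ or ‘non-title’.</span></p>

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    # Hypothetical fields for illustration; the real input format
    # is defined by the shared task organisers.
    text: str
    is_bold: bool
    font_size: float


def is_title(block: TextBlock, body_font_size: float = 10.0, max_words: int = 12) -> bool:
    """Toy rule: short blocks that are bold or larger than body text."""
    words = block.text.split()
    if not words or len(words) > max_words:
        return False
    # Full sentences ending with a period are treated as regular text.
    if block.text.rstrip().endswith("."):
        return False
    return block.is_bold or block.font_size > body_font_size


def classify(blocks):
    """Return one label per block, in input order."""
    return ["title" if is_title(b) else "non-title" for b in blocks]


# Made-up example blocks, not taken from the task data.
blocks = [
    TextBlock("1. Investment Objective", True, 12.0),
    TextBlock("The fund invests mainly in European equities.", False, 10.0),
    TextBlock("Risk factors", False, 14.0),
]
labels = classify(blocks)
print(labels)
```

<p><span style="font-family:helvetica">A real system would likely replace the hand-written rule with a classifier trained on the provided data; the sketch only fixes the interface of one label per text block.</span></p>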

<p><span style="font-family:helvetica">b) TOC structure extraction</span></p>

<p><span style="font-family:helvetica">The TOC is a hierarchical organisation of the headers of a document. In this subtask, we provide only the headers of a prospectus, and the goal is to (i) identify the hierarchical level of each header and (ii) organize the headers of the document according to this hierarchical structure. Note that two headers with the same layout and the same text can have different hierarchical levels depending on their location in the document.</span></p>
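
<p><span style="font-family:helvetica">Step (ii) can be sketched as follows, assuming step (i) has already assigned a level to each header (1 for top-level headers, larger numbers for deeper ones). The <code>TocNode</code> class, the function names and the example headers are hypothetical and do not reflect the official data format. The example deliberately repeats the header text “Fees” at two different levels.</span></p>

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    title: str
    level: int
    children: list = field(default_factory=list)


def build_toc(headers):
    """Nest (title, level) pairs, given in document order, into a tree.

    Assumes every level is an integer of at least 1.
    """
    root = TocNode("ROOT", 0)
    stack = [root]
    for title, level in headers:
        node = TocNode(title, level)
        # Pop until the top of the stack is a strictly shallower header.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def render(node, depth=0):
    """Flatten the tree into an indented outline, one line per header."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.title)
        lines.extend(render(child, depth + 1))
    return lines


# Made-up headers: the same text "Fees" appears at two different levels.
headers = [
    ("Fund A", 1),
    ("Fees", 2),
    ("Risk factors", 2),
    ("Fees", 1),
]
toc = build_toc(headers)
outline = render(toc)
print("\n".join(outline))
```

<p><span style="font-family:helvetica">The stack keeps the chain of currently open ancestors, so each header is attached to the nearest preceding header with a smaller level.</span></p>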

<p><span style="font-family:helvetica">Participants need to register. 

Once registered, all participating teams will be provided with a common 

training dataset, which includes common pre-processed input and 

corrected output. A common development set will also be provided. A 

blind test data set will be used to evaluate the output of the 

participating teams. An evaluation script will be provided to all the 

teams. In addition to the PDF version of the documents, we will provide 

their XML representation.</span></p>

<hr>

<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Background:</span></span></b></p>

<p><span style="font-family:helvetica">Existing work on table of contents (TOC) recognition for books and documents has almost all been carried out on small, application-dependent and domain-specific datasets. However, the TOCs of documents from different domains differ significantly in their visual layout and style, making TOC recognition a challenging problem for a large-scale collection of heterogeneous documents and books. Compared to regular books (mostly provided in a full-text format with limited structural information such as pages and paragraphs), financial documents, which contain textual and non-textual content, have a more sophisticated structure that includes parts, sections, sub-sections and sub-sub-sections.</span></p>

<hr>

<p style="text-align:justify"><span style="font-size:14pt"><span style="color:rgb(255,102,0)"><span style="color:rgb(255,102,0)"><span style="font-size:14pt"><b>Data Format and </b></span><span style="font-size:18.6667px"><b>Evaluation</b></span><span style="font-size:14pt"><b>:</b></span></span></span></span></p>

<p>The following document describes the data format and evaluation metric used in the shared task: <a href="https://docs.google.com/document/d/1gYRS1wvNrm5DT68W-Jn7LpgfomrbP_diqQYAHM5tA0A/edit" rel="noopener noreferrer" target="_blank">Data Format Details</a></p>

<p> </p>

<hr>

<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Important Dates:</span></span></b></p><ul><li class="gmail-first-child">Aug 2, 2019: submission deadline</li><li>Aug 5, 2019: publication of results</li><li class="gmail-last-child">Aug 10, 2019: deadline for papers</li><li class="gmail-last-child">Aug 17, 2019: reviews and notification of acceptance</li></ul><ul><li><span style="font-family:helvetica">Aug 24, 2019: camera-ready version of shared task system papers due</span></li><li class="gmail-m_3238902616319271611gmail-last-child"><span style="font-family:helvetica">Sep 30, 2019: Workshop day</span></li></ul>

<hr>

<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Shared Task Organisers:</span></span></b></p>

<ul><li class="gmail-m_3238902616319271611gmail-first-child"><span style="font-family:helvetica"><a href="https://scholar.google.fr/citations?user=vGXfCl0AAAAJ&hl=fr" rel="noopener noreferrer" target="_blank">Dr Sira Ferradans</a>, Fortia Financial Solutions</span></li><li><span style="font-family:helvetica"><a href="https://www.linkedin.com/in/najah-imane-bentabet-7182b456/" rel="noopener noreferrer" target="_blank">Najah-Imane Bentabet</a>, Fortia Financial Solutions</span></li><li><span style="font-family:helvetica"><a href="http://www.lancaster.ac.uk/staff/elhaj" rel="noopener noreferrer" target="_blank">Dr Mahmoud El-Haj</a>, Lancaster University</span></li><li class="gmail-m_3238902616319271611gmail-last-child"><a href="https://www.linkedin.com/in/jugeremi/" target="_blank">Rémi Juge</a>, <span style="font-family:helvetica"> Fortia Financial Solutions</span></li></ul>

<hr>

<p style="text-align:justify"><b><span style="font-size:14pt"><span style="color:rgb(255,102,0)">Shared Task Contact:</span></span></b></p>

<p><span style="font-family:helvetica">Questions about <span class="gmail-il">FinTOC</span>-2019 shared task can be sent to:</span></p>

<p><span style="font-family:helvetica"><a href="mailto:fin.toc.task@gmail.com" target="_blank">fin.toc.task@gmail.com</a></span></p><br>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><br><table style="font-size:small;font-family:arial,helvetica,sans-serif" width="466" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td style="border-right:2px solid rgb(0,47,138)" width="140"><img src="http://www.fortia.fr/wp-content/uploads/2018/01/favicon.png"></td><td width="326"><table style="margin-left:22px" cellspacing="0" cellpadding="0" border="0"><tbody><tr><td style="line-height:25px"><font color="#002f8a"><span style="font-size:14px"><b>Sira FERRADANS</b></span></font><span style="color:rgb(33,219,174);font-size:12px"> Chief Research Scientist</span></td></tr><tr><td style="font-size:11px;color:rgb(153,153,153);line-height:18px">17, avenue George V. Paris 75008</td></tr><tr><td style="font-size:11px;color:rgb(153,153,153);line-height:20px;padding-top:3px">+33 (0)6 73 77 20 03</td></tr><tr><td style="font-size:11px;color:rgb(153,153,153);line-height:18px"><a style="color:rgb(17,85,204)">sira.ferradans@fortia.fr</a> | <a style="color:rgb(17,85,204)">www.fortia.fr</a></td></tr></tbody></table></td></tr></tbody></table></div></div></div></div></div></div></div></div></div></div></div></div></div></div>