[Corpora-List] chaker jebari

Adam Kilgarriff adam at lexmasterclass.com
Mon Dec 5 10:02:28 UTC 2005


In the general case, this is a very big question.  Once you limit it to
particular types of documents, e.g. scientific papers, journalism, or CVs,
it becomes somewhat tractable, and this is what CiteSeer and DBLP are doing
on an industrial scale for academic papers.

 

As a general rule, you depend on the conventions that people use in
structuring each particular document type: the stronger the conventions,
the more tractable the problem is, and the more different conventions (and
markup languages, etc.) there are, the more work there is to cover them all.
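
As a rough illustration of leaning on such conventions, here is a minimal
sketch in Python for the scientific-paper case.  It assumes plain-text input
in which each section heading sits on its own line, optionally preceded by a
section number.  The heading list SECTION_NAMES and the function
split_sections are hypothetical names made up for this example, not part of
any existing tool, and a real document type would need its own rules.

```python
import re

# Hypothetical list of conventional headings for a scientific paper.
# This is an assumption for illustration; other document types (CVs,
# calls for papers, ...) would each need their own list.
SECTION_NAMES = [
    "abstract", "keywords", "key words", "introduction",
    "conclusion", "conclusions", "references",
]

# Assumed convention: a heading is a line holding only an optional
# section number ("1.", "2.3", ...) and one of the known names.
HEADING_RE = re.compile(
    r"^\s*(?:\d+(?:\.\d+)*\.?\s+)?("
    + "|".join(re.escape(name) for name in SECTION_NAMES)
    + r")\s*:?\s*$",
    re.IGNORECASE,
)


def split_sections(text):
    """Return a list of (label, body) pairs in document order."""
    sections = []
    # Assumption: whatever precedes the first heading is the title block.
    label, body = "title", []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Close the section collected so far and start a new one.
            sections.append((label, "\n".join(body).strip()))
            label, body = match.group(1).lower(), []
        else:
            body.append(line)
    sections.append((label, "\n".join(body).strip()))
    return sections


if __name__ == "__main__":
    sample = (
        "A Paper Title\n\nAbstract\nWe study things.\n\n"
        "1. Introduction\nText here.\n\nReferences\n[1] Someone."
    )
    for section_label, section_body in split_sections(sample):
        print(section_label, "->", repr(section_body))
```

The weaker the conventions of a document type, the less a fixed list like
this will catch, which is why each new type (and each markup language)
means more rules to write.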

 

Adam

 

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Chaker Jabbari
Sent: 04 December 2005 08:40
To: CORPORA at UIB.NO
Subject: [Corpora-List] chaker jebari

 

Hi

 

I need a tool to identify the logical structure of a textual document. 

For example:

The logical structure of a scientific paper is: title, abstract, key words,
introduction, text, conclusion, references.

The logical structure of a call for papers is: title, topics, important
dates, submission, ...

 

Does anyone know of a tool or an algorithm to identify the logical
structure?

 

 

regards

chaker jebari 
