[Corpora-List] jebari

Martin Reynaert Reynaert at uvt.nl
Mon Jan 15 12:17:09 UTC 2007


Hi Chaker,

The Unix/Linux tool for splitting a large file into smaller ones is 
'csplit'. It splits on a pattern (a regular expression), e.g. the tag 
that marks the beginning of each document.

There is also 'split', but it cuts purely by size (a fixed number of 
lines or bytes), so it does not respect document boundaries.
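
A minimal sketch of both commands, assuming GNU coreutils and a 
hypothetical '<DOC>' tag at the start of each document (the real tag in 
the webkb-data file would have to be substituted). The file names 
sample.txt, doc_ and chunk_ are illustrative only.

```shell
# Build a tiny stand-in for the corpus; '<DOC>' is a hypothetical
# start-of-document tag, not the actual webkb-data markup.
printf '%s\n' '<DOC>' 'first page' '<DOC>' 'second page' 'more text' > sample.txt

# csplit: start a new output file at every line matching the pattern.
#   -s     do not print the byte count of each piece
#   -z     drop empty pieces (e.g. the empty one before the first tag)
#   -f     prefix for the output file names (doc_00, doc_01, ...)
#   '{*}'  repeat the pattern as often as it matches (GNU extension)
csplit -s -z -f doc_ sample.txt '/^<DOC>/' '{*}'

# split: cut by size only, here every 2 lines, ignoring the tags
# (produces chunk_aa, chunk_ab, chunk_ac).
split -l 2 sample.txt chunk_
```

With this input, csplit should leave one document per file (doc_00 and 
doc_01), whereas split cuts the second document in the middle.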

Yours,

Martin Reynaert
Postdoc
Induction of Linguistic Knowledge
Tilburg University
The Netherlands

Chaker Jabbari wrote:
> Dear all
>  
> I need a tool to segment the webkb-data file into documents (one 
> document per file). Can you help me?
>  
> Thanks 
