[Corpora-List] jebari
Martin Reynaert
Reynaert at uvt.nl
Mon Jan 15 12:17:09 UTC 2007
Hi Chaker,
The tool to split a large file into smaller ones in Unix/Linux is
'csplit'. This can split on the basis of a pattern, e.g. the tag that
identifies the beginning of a document.
There is also 'split', but this cuts at fixed sizes (a number of lines
or bytes) rather than at document boundaries.
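A rough sketch of both commands, assuming GNU coreutils. The '<DOC>' line
used as the start-of-document tag is only a placeholder, as are the file
names; the real marker in the webkb data may well look different, so the
pattern would need adjusting.

```shell
# Work in a scratch directory so the sketch leaves nothing behind elsewhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny stand-in corpus; '<DOC>' is a hypothetical start-of-document tag.
printf '%s\n' '<DOC>' 'first document' \
              '<DOC>' 'second document' 'more text' > corpus.txt

# csplit: start a new piece at every line matching the pattern.
#   -z        drop empty pieces (e.g. the empty one before the first tag)
#   -s        do not print the size of each piece
#   -f doc_   prefix for the output file names
#   -b ...    printf-style suffix for the output file names (GNU option)
#   '{*}'     repeat the pattern as often as possible (GNU extension)
csplit -z -s -f doc_ -b '%04d.txt' corpus.txt '/^<DOC>$/' '{*}'

# split: cut every 2 lines, regardless of where documents begin.
split -l 2 corpus.txt chunk_

ls doc_*.txt chunk_*
```

With csplit each output file begins at a tag line, so one file holds one
document. With split the pieces have a fixed size and a document can end
up spread over two of them.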
Yours,
Martin Reynaert
Postdoc
Induction of Linguistic Knowledge
Tilburg University
The Netherlands
Chaker Jabbari wrote:
> Dear all
>
> I need a tool to segment the webkb-data file into documents (each
> document in a file). Can you help me?
>
> Thanks