[Corpora-List] To segment HTML document?

Chris Jordan cjordan at cs.dal.ca
Tue Oct 25 11:35:44 UTC 2005


Hey Imen,

Sounds like you are writing a crawler in Java. If so, why reinvent the 
wheel? There are plenty of open-source crawlers lying around.
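
That said, if you only want to see how the two pieces fit together, here 
is a minimal sketch in plain Java. It is not a substitute for a real 
crawler. It assumes the page is UTF-8 encoded, uses only the standard 
library, and "segments" by naively splitting on a handful of block-level 
tags with a regular expression. The class name HtmlSegmenter and its two 
methods are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name and both methods are illustrative only.
final class HtmlSegmenter {

    // Download the page at the given URL and return its markup as one string.
    // Assumption: the page is UTF-8; a real crawler would honour the declared charset.
    static String fetch(String address) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(address).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    // Split the markup at a few block-level tags, then strip any remaining
    // inline tags and collapse whitespace. Empty segments are dropped.
    static List<String> segment(String html) {
        List<String> segments = new ArrayList<>();
        String[] blocks = html.split("(?i)</?(p|div|h[1-6]|li|tr|br)\\b[^>]*>");
        for (String block : blocks) {
            String text = block.replaceAll("<[^>]*>", " ")
                               .replaceAll("\\s+", " ")
                               .trim();
            if (!text.isEmpty()) {
                segments.add(text);
            }
        }
        return segments;
    }

    // Usage: java HtmlSegmenter <url>
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HtmlSegmenter <url>");
            return;
        }
        for (String s : segment(fetch(args[0]))) {
            System.out.println(s);
        }
    }
}
```

For example, segment("<h1>Title</h1><p>First <b>bold</b> para.</p>") 
gives the two segments "Title" and "First bold para.". A regex split 
like this will trip over scripts, comments and malformed markup, so for 
real pages a proper HTML parser is the safer choice.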

ismi.touati wrote:

> Dear all,
>  
> Does anyone know of:
>    - a program to segment HTML documents (web pages),
>    - a Java command that can connect to a web page on the internet 
> given its URL?
>  
> Thanks
>  
> All the best
>  
> Imen.
>  
> //****************************//
> Imen Touati
> Master's student at the Faculty of Economic Sciences and Management 
> of Sfax, Tunisia.
> LARIS laboratory
> Address: LARIS, FSEGS, BP 1088, 3018 Sfax, Tunisia
> Tel: (216) 74 27 87 77
> E-mail: ismi.touati at laposte.net


