[Corpora-List] To segment HTML document?
Delip Rao
deliprao at yahoo.com
Tue Oct 25 16:34:52 UTC 2005
Try JSpider for crawling:
http://j-spider.sourceforge.net/
For HTML segmentation and extraction from HTML
documents, though, you may want to look at the wrapper work by
Stephen Soderland.
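
As for connecting to a web page from Java given its URL, the standard
library is enough for a first try. Below is a minimal sketch, not a
definitive implementation: the names HtmlSegmenter, fetch and segment are
made up for illustration, the page is assumed to be UTF-8, and the
"segmentation" is just a naive split at block-level tags. It is not the
wrapper approach mentioned above, and it does not use JSpider.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class HtmlSegmenter {

    // Download the raw HTML of a page given its URL (assumes UTF-8 content).
    static String fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return html.toString();
    }

    // Naive segmentation: drop scripts and styles, cut the page at
    // block-level tags, then strip any remaining inline tags.
    static List<String> segment(String html) {
        String body = html.replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ");
        String[] chunks = body.split(
                "(?i)</?(p|div|h[1-6]|li|tr|table|ul|ol|br|hr|body|html|head|title)\\b[^>]*>");
        List<String> segments = new ArrayList<>();
        for (String chunk : chunks) {
            String text = chunk.replaceAll("<[^>]+>", " ")
                               .replaceAll("\\s+", " ")
                               .trim();
            if (!text.isEmpty()) {
                segments.add(text);
            }
        }
        return segments;
    }

    // Usage: java HtmlSegmenter <url>
    public static void main(String[] args) throws IOException {
        for (String segment : segment(fetch(args[0]))) {
            System.out.println(segment);
        }
    }
}
```

Regular expressions break easily on real-world HTML, so for anything
beyond a quick experiment a proper HTML parser would be the safer choice.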
--- Chris Jordan <cjordan at cs.dal.ca> wrote:
> Hey Imen,
>
> Sounds like you are writing a crawler in Java. If so, why reinvent the
> wheel? There are plenty of open source ones lying around.
>
> ismi.touati wrote:
>
> > Dear all,
> >
> > Does anyone know of:
> > - a program to segment HTML documents (web pages),
> > - a Java command that can connect to a web page on the internet
> >   given its URL?
> >
> > Thanks
> >
> > All the best
> >
> > Imen.
> >
> > //****************************//
> > Imen Touati
> > Master's student at the Faculty of Economic Sciences and Management of Sfax,
> > Tunisia.
> > LARIS laboratory
> > Address: LARIS, FSEGS, BP 1088, 3018 Sfax, Tunisia
> > Tel: (216) 74 27 87 77
> > e-mail: ismi.touati at laposte.net