[Corpora-List] structured data (enu | csy) for IE needed

Filip Malik filip.malik at centrum.cz
Thu Jan 25 07:25:13 UTC 2007


Hello all,

For my graduation thesis, I need a set of structured data for some experiments.
The data set should consist of XML files, HTML files, or any other hypertext-based files.
The next requirement is "highly structured data". This means that I'm not interested
in data structured like the following example:
<p>Paragraph, many words in same tag</p>
I'm looking for data that are more structured, like this example:
<t> <tag2>Few words (up to 10)</tag2> <tag3>Few words (up to 10)</tag3> </t>
The last requirement is that the data be in English or Czech.
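To make the "highly structured" requirement concrete, here is a minimal sketch of a filter one might run over candidate files. It assumes the files are well-formed XML or XHTML (real-world HTML would need a more tolerant parser) and treats a document as highly structured if every text fragment holds at most 10 words, following the "up to 10" figure in the example above. The function names and the exact threshold rule are hypothetical illustrations, not an established measure.

```python
import xml.etree.ElementTree as ET


def text_chunks(root):
    """Yield every non-empty text fragment in the tree (element text and tail)."""
    for elem in root.iter():
        for chunk in (elem.text, elem.tail):
            if chunk and chunk.strip():
                yield chunk.strip()


def is_highly_structured(markup, max_words=10):
    """Return True if the markup has text and every fragment has <= max_words words.

    Hypothetical criterion: text should be split into short tagged fields
    rather than long free-text paragraphs inside a single tag.
    """
    root = ET.fromstring(markup)
    chunks = list(text_chunks(root))
    return bool(chunks) and all(len(c.split()) <= max_words for c in chunks)


structured = "<t> <tag2>Few words (up to 10)</tag2> <tag3>Few words (up to 10)</tag3> </t>"
flat = ("<p>This paragraph keeps many words together inside one single tag "
        "with no inner structure</p>")

print(is_highly_structured(structured))
print(is_highly_structured(flat))
```

The first call accepts the nested example, while the second rejects the paragraph because its single text fragment has 14 words.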

I hope that someone on the Corpora list has used a similar data set that
could be reused. My goal is information extraction (IE) from hypertext using both
the content and the structure of the data.

Thanks and regards, 
Filip Malik

-fm 


