[Corpora-List] structured data (enu | csy) for IE needed
Filip Malik
filip.malik at centrum.cz
Thu Jan 25 07:25:13 UTC 2007
Hello all,
for my graduation thesis, I need a set of structured data for some experiments.
The data set should consist of XML files, HTML files, or any other hypertext-based files.
The next requirement is "highly structured data". This means that I'm not interested
in data structured like the following example:
<p>Paragraph, many words in same tag</p>
I'm looking for data that is more structured, like this example:
<t> <tag2>Few words (up to 10)</tag2> <tag3>Few words (up to 10)</tag3> </t>
The last requirement is that the data be in English or Czech.
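To make the "highly structured" criterion concrete, here is a rough Python sketch of how a candidate file could be checked. It is only an illustration under my own assumptions: the function names are hypothetical, the input must be well-formed XML, and "highly structured" is taken to mean that every text-bearing leaf element holds at most 10 words.

```python
import xml.etree.ElementTree as ET


def leaf_word_counts(xml_text):
    """Return the word count of every leaf element that contains text."""
    root = ET.fromstring(xml_text)
    counts = []
    for elem in root.iter():
        # A leaf is an element with no child elements.
        if len(elem) == 0 and elem.text and elem.text.strip():
            counts.append(len(elem.text.split()))
    return counts


def is_highly_structured(xml_text, max_words=10):
    """Assumed criterion: every text-bearing leaf has at most max_words words."""
    counts = leaf_word_counts(xml_text)
    return bool(counts) and max(counts) <= max_words
```

Under this assumption, the <t> example above would pass the check, while a <p> element holding a long paragraph would not.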
I hope that somebody who reads Corpora has used a similar data set that
could be reused. My goal is IE from hypertext using both the content and the structure
of the data.
Thanks and regards,
Filip Malik
-fm