[Corpora-List] Format for context info

hans christensen hc.corpus at gmail.com
Wed May 11 20:18:08 UTC 2011


Hi,
I'm kinda new to the "scene" so I'm not really familiar with what standards
are commonly used (if any exist). My question: I want to make context
information for the HC Corpora <http://corpora.heliohost.org/> available
for download (for now I'm looking at 2-grams and 3-grams). Is there a
standard way of doing this?
I was thinking of just providing them as tab-separated txt files, as that
seems the most universal, e.g. something like:

how[tab]are[tab]54

That way it would be easy to load into your own database of choice.
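
To make the idea concrete, here is a rough sketch of producing lines in that
layout and loading them into a database. Everything in it is an assumption
for illustration only: the helper names, the "bigrams" table and its columns,
the choice of SQLite, and the toy sentence are made up and are not part of
the HC Corpora.

```python
import sqlite3
from collections import Counter


def count_ngrams(tokens, n):
    """Count contiguous n-grams (as tuples of words) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def format_line(ngram, count):
    """Render one n-gram as word1<TAB>word2<TAB>...<TAB>count."""
    return "\t".join(ngram) + "\t" + str(count)


def parse_line(line):
    """Split a line back into (tuple of words, integer count)."""
    *words, count = line.rstrip("\n").split("\t")
    return tuple(words), int(count)


def load_bigrams(lines, conn):
    """Insert 2-gram lines into a hypothetical 'bigrams' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bigrams (w1 TEXT, w2 TEXT, freq INTEGER)"
    )
    rows = ((words[0], words[1], count)
            for words, count in map(parse_line, lines))
    conn.executemany("INSERT INTO bigrams VALUES (?, ?, ?)", rows)
    conn.commit()


# Toy input, invented for this example.
tokens = "how are you and how are they".split()
lines = [format_line(g, c) for g, c in sorted(count_ngrams(tokens, 2).items())]

conn = sqlite3.connect(":memory:")
load_bigrams(lines, conn)
```

Splitting on the tab character only works if the tokens themselves never
contain a tab, so that would have to be guaranteed when the files are made.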
An additional advantage of a txt file is that it compresses well (I've got
limited space on the server).
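
As a small sketch of the compression side, the files could be written and
read back as gzip without ever storing the uncompressed text. The function
names, the file name and the sample counts below are invented for
illustration; gzip is just one possible choice of compressor.

```python
import gzip
import os
import tempfile


def write_gzipped(lines, path):
    """Write tab-separated n-gram lines to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")


def read_gzipped(path):
    """Yield the lines of a gzip-compressed text file without the newline."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


# Made-up sample lines in the word<TAB>word<TAB>count layout.
sample = ["how\tare\t54", "are\tyou\t31"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bigrams.txt.gz")  # hypothetical file name
    write_gzipped(sample, path)
    restored = list(read_gzipped(path))
```

Anyone downloading the file can then stream it line by line straight from
the compressed form.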

Thanks,
Hans Christensen