[Corpora-List] Format for context info
Kilian Evang
poststelle at texttheater.net
Thu May 12 11:48:13 UTC 2011
Hi Hans,
On 05/11/2011 10:18 PM, hans christensen wrote:
> I'm kinda new to the "scene" so I'm not really familiar with what
> standards are commonly used (if such exist). So, my question is: I want
> to make context information for the HC Corpora
> <http://corpora.heliohost.org/> available for download (for now I'm
> looking at 2-grams and 3-grams). I was wondering if there is any
> standard way of doing this?
> I was thinking just to give them as tab separated txt files as that
> seems the most universal, e.g. something like:
>
> how[tab]are[tab]54
I think that's a good idea. Google's huge n-gram corpus is also released
in this format (though I'm not sure if they use tabs or spaces):
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
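For illustration, here is a minimal Python sketch of writing and reading such a file. It assumes one n-gram per line, tokens that contain no tabs or newlines, and the count in the last column. The function names (count_ngrams, write_ngrams, read_ngrams) and the sort order are made up for this example and are not taken from any existing corpus tool.

```python
import io
from collections import Counter


def count_ngrams(tokens, n):
    """Count all contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def write_ngrams(counts, out):
    """Write one n-gram per line: the tokens, then the count, tab-separated.

    Assumption: tokens never contain a tab or newline, so no quoting is needed.
    Lines are sorted by descending count, then alphabetically.
    """
    for ngram, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        out.write("\t".join(ngram) + "\t" + str(freq) + "\n")


def read_ngrams(lines):
    """Parse lines written by write_ngrams back into a dict of tuple -> count."""
    counts = {}
    for line in lines:
        # The last tab-separated field is the count; the rest are the tokens.
        *tokens, freq = line.rstrip("\n").split("\t")
        counts[tuple(tokens)] = int(freq)
    return counts


if __name__ == "__main__":
    tokens = "how are you how are they".split()
    buffer = io.StringIO()
    write_ngrams(count_ngrams(tokens, 2), buffer)
    print(buffer.getvalue(), end="")
```

Putting the count in the last column means the same reader works for 2-grams and 3-grams alike, since the number of token columns does not need to be known in advance.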
Best,
Kilian