[Corpora-List] N-grams -> database

Mihail Kopotev mihail.kopotev at helsinki.fi
Fri Oct 11 20:16:56 UTC 2013


Dear all,
I have a fairly simple question, and I hope the answer is equally 
simple.

We are working with n-grams, which are stored as:
token1, lemma1, tagset1, token2, lemma2, tagset2, [and so on]

I am wondering whether there is a standard way to convert these n-grams into a 
database.
Technically, of course, the conversion itself is no problem; my question is 
which indexes should be built and what should be stored as is, without 
any optimization.
More specifically, does it make sense to keep the whole tagsets, 
or is it better to store each tag separately?
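
To make the two options concrete, here is a minimal sketch, assuming SQLite 
through Python's sqlite3 module and a tagset written as tags joined by ";". 
The table names, column names, index choices and the ";" separator are all 
hypothetical, chosen only for illustration. It stores the whole tagset as is 
in one table and each tag separately in a second one.

```python
import sqlite3


def parse_ngram(line):
    """Split a flat 'token, lemma, tagset, ...' record into triples."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) % 3 != 0:
        raise ValueError("expected a multiple of three fields")
    return [tuple(fields[i:i + 3]) for i in range(0, len(fields), 3)]


def build_db(lines, tag_sep=";"):
    """Load n-gram records into an in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Option 1: one row per n-gram position, tagset kept as is.
        CREATE TABLE ngram_token (
            ngram_id INTEGER, position INTEGER,
            token TEXT, lemma TEXT, tagset TEXT,
            PRIMARY KEY (ngram_id, position)
        );
        -- Option 2: one row per individual tag.
        CREATE TABLE token_tag (
            ngram_id INTEGER, position INTEGER, tag TEXT
        );
        CREATE INDEX idx_lemma ON ngram_token (lemma, position);
        CREATE INDEX idx_tag ON token_tag (tag, position);
    """)
    for ngram_id, line in enumerate(lines):
        for pos, (token, lemma, tagset) in enumerate(parse_ngram(line)):
            conn.execute(
                "INSERT INTO ngram_token VALUES (?, ?, ?, ?, ?)",
                (ngram_id, pos, token, lemma, tagset),
            )
            conn.executemany(
                "INSERT INTO token_tag VALUES (?, ?, ?)",
                [(ngram_id, pos, t) for t in tagset.split(tag_sep) if t],
            )
    conn.commit()
    return conn


# Made-up example record: a bigram with ';'-separated tags.
conn = build_db(["dogs, dog, N;pl, bark, bark, V;pres"])
# N-grams whose first word carries the tag 'pl'.
rows = conn.execute(
    "SELECT ngram_id FROM token_tag WHERE tag = ? AND position = ?",
    ("pl", 0),
).fetchall()
```

With the per-tag table, a query for a single tag is an exact match that can 
use the index; with only the whole tagset stored, the same query would need 
a substring search over the tagset column.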

Thank you!
Mikhail Kopotev

-- 
Mikhail Kopotev, PhD, Adj.Prof.
University Lecturer
Department of Modern Languages
University of Helsinki
http://www.helsinki.fi/~kopotev
