[Corpora-List] N-grams -> database
Mihail Kopotev
mihail.kopotev at helsinki.fi
Fri Oct 11 20:16:56 UTC 2013
Dear all,
I have a fairly simple question, and I hope the answer is equally
simple.
We are working with n-grams, which are stored as:
token1, lemma1, tagset1, token2, lemma2, tagset2, [and so on]
Is there a standard way to convert these n-grams into a
database?
Technically, the conversion itself is of course no problem; my question is
which indexes should be built and what should be stored as is, without
any optimization.
More specifically, does it make sense to keep the whole tagsets,
or is it better to store each tag separately?
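To make the two options concrete, here is a minimal sketch of what I have in mind. It assumes SQLite (through Python's sqlite3 module), bigrams, and a hypothetical "|"-separated tagset format; the table names, column names, sample rows, and the helper ngrams_with_tag are all invented for illustration and are not taken from any standard.

```python
import sqlite3

# Hypothetical bigram rows in the layout described above:
# token1, lemma1, tagset1, token2, lemma2, tagset2
ROWS = [
    ("cats", "cat", "N|pl|nom", "sleep", "sleep", "V|pres|pl"),
    ("cat", "cat", "N|sg|nom", "sleeps", "sleep", "V|pres|sg"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ngram_token (
    ngram_id INTEGER NOT NULL,
    position INTEGER NOT NULL,   -- 1 = first token of the n-gram
    token    TEXT NOT NULL,
    lemma    TEXT NOT NULL,
    tagset   TEXT NOT NULL,      -- option A: whole tagset stored as is
    PRIMARY KEY (ngram_id, position)
);
CREATE TABLE token_tag (         -- option B: one row per individual tag
    ngram_id INTEGER NOT NULL,
    position INTEGER NOT NULL,
    tag      TEXT NOT NULL
);
CREATE INDEX idx_lemma ON ngram_token (lemma, position);
CREATE INDEX idx_tag   ON token_tag (tag, position);
""")

for ngram_id, row in enumerate(ROWS, start=1):
    # Each token contributes three consecutive fields: token, lemma, tagset.
    for i in range(len(row) // 3):
        token, lemma, tagset = row[3 * i: 3 * i + 3]
        conn.execute(
            "INSERT INTO ngram_token VALUES (?, ?, ?, ?, ?)",
            (ngram_id, i + 1, token, lemma, tagset),
        )
        conn.executemany(
            "INSERT INTO token_tag VALUES (?, ?, ?)",
            [(ngram_id, i + 1, tag) for tag in tagset.split("|")],
        )
conn.commit()


def ngrams_with_tag(tag, position):
    """Return ids of n-grams whose token at `position` carries `tag`."""
    cur = conn.execute(
        "SELECT DISTINCT ngram_id FROM token_tag "
        "WHERE tag = ? AND position = ? ORDER BY ngram_id",
        (tag, position),
    )
    return [r[0] for r in cur]
```

With the whole tagset stored (option A), an exact match on tagset is easy, but a search for a single tag needs a pattern match over the string; with separate tags (option B), the same search becomes an indexed lookup at the cost of an extra table.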
Thank you!
Mikhail Kopotev
--
Mikhail Kopotev, PhD, Adj.Prof.
University Lecturer
Department of Modern Languages
University of Helsinki
http://www.helsinki.fi/~kopotev