Corpora: Using a relational database to store conc pointers
Mickel Grönroos
mcgronro at ling.helsinki.fi
Thu Mar 30 07:37:39 UTC 2000
Dear colleagues,
Does anybody have any experience of using a relational database to store
index information for a concordance service?
I'm building a test interface for the Bank of Finnish and plan to store
pointers to specific locations in the corpus in a database column. For
example, 344:2555 would point to corpus file number 344, byte
position 2555.
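To make the pointer format concrete, here is a minimal Python sketch of how a pointer such as 344:2555 could be resolved. The function names, the `width` parameter and the file naming scheme (file number 344 stored as `344.txt`) are assumptions made for illustration only, not a description of how the Bank of Finnish is actually laid out.

```python
import os

def parse_pointer(pointer):
    """Split a 'file:offset' pointer such as '344:2555' into two integers."""
    file_no, byte_pos = pointer.split(":")
    return int(file_no), int(byte_pos)

def read_context(corpus_dir, pointer, width=40):
    """Return up to `width` bytes starting at the pointed-to byte position."""
    file_no, byte_pos = parse_pointer(pointer)
    # Hypothetical naming scheme: corpus file number 344 lives in '344.txt'.
    path = os.path.join(corpus_dir, f"{file_no}.txt")
    # Binary mode, so that seek() counts bytes rather than decoded characters.
    with open(path, "rb") as f:
        f.seek(byte_pos)
        return f.read(width)
```

Opening the file in binary mode matters here because the pointer is a byte position; with a multi-byte encoding, character offsets and byte offsets would differ.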
The obvious problem is how to handle common words, since every
occurrence of a given type needs a pointer of its own. So if the
frequency of some common word is, say, 50,000, this generates 50,000
pointers as well. Putting all of these in a single field of a column
seems rather foolish. Does anybody know how to avoid this?
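For comparison with the single-field layout, one conventional relational alternative is to store one row per occurrence and index the word column. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name, column names and sample rows are all invented for illustration, and nothing here is claimed to be how the Bank of Finnish is or should be indexed.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE occurrence (
        word     TEXT    NOT NULL,
        file_no  INTEGER NOT NULL,
        byte_pos INTEGER NOT NULL
    )
""")
# The index lets a lookup by word avoid scanning the whole table.
conn.execute("CREATE INDEX idx_occurrence_word ON occurrence (word)")

# Invented sample data: one row per occurrence instead of one long field.
rows = [("ja", 344, 2555), ("ja", 344, 3010), ("ja", 345, 12),
        ("kissa", 344, 2600)]
conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", rows)

def pointers_for(word, limit=None):
    """Return 'file:offset' pointers for a word, optionally only the first few."""
    sql = ("SELECT file_no, byte_pos FROM occurrence "
           "WHERE word = ? ORDER BY file_no, byte_pos")
    params = [word]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return [f"{file_no}:{byte_pos}" for file_no, byte_pos in conn.execute(sql, params)]
```

With this layout a word with 50,000 occurrences simply has 50,000 short rows, and a concordance page can fetch a slice of them with `limit` rather than loading every pointer at once.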
All comments are welcome.
Thanks,
Mickel Grönroos
Helsinki
www.ling.helsinki.fi/~mcgronro/ | Mickel.Gronroos at helsinki.fi
---------------------------------|----------------------------
Inst. för allmän språkvetenskap | Dep. of General Linguistics
PB 4 (Fabiansgatan 28) | tfn/phone +358-9-191 22707
FI-00014 Helsingfors universitet | fax +358-9-191 23598