[Corpora-List] Corpora and SQL
Patrick Drouin
pat at traductik.org
Tue May 22 19:34:40 UTC 2007
Hello Tony,
Lars Nygaard has some very good suggestions. I'm adding a couple more
ideas. You might want to create a few indexes to speed up the query, it
is sometimes useful to use a tool to see what the database engine is
doing with your query. Lookup the documentation for the word EXPLAIN, it
will show you where the bottleneck is in you SELECT statement. If you're
using mySQL, you can try NAVICAT.
I had a web-based concordancer based on mySQL and I used 2 tables
instead of 1 in order to reduce the size of the tables. I had something
that looked like this (I'm doing this from memory as the code is not
handy at the moment) :
table CORPUS
+-----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| token_ID | int(11) | NO | | NULL | |
+-----------+-------------+------+-----+---------+----------------+
table TOKENS
+----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+----------------+
| token_ID | int(11) | NO | PRI | NULL | auto_increment |
| word | varchar(20) | NO | | NULL | |
| tag | varchar(20) | NO | | NULL | |
| lemma | varchar(20) | NO | | NULL | |
+----------+-------------+------+-----+---------+----------------+
Now, this will reduce the size of your main table but you might get a
hit on the speed because you need to join the tables when you query
(although I doubt it). If your indexes are created correctly, I believe
this might be faster BUT it would have to be tested.
I also remember posts from Mark Davies on a similar subject, he seems to
have found a good solution because his tool is quite quick.
Best regards,
Patrick
More information about the Corpora
mailing list