[Corpora-List] POS tagging via relational databases

Zhang Le ejoy at xinhuanet.com
Fri Sep 26 01:30:51 UTC 2003


Hello Mark Davies,
  I found your mail interesting, but I have doubts about the approach's
performance, both in terms of speed and accuracy.
First, a DB is just another way of representing the linear sequence of text
tokens, so the performance will be at best as good as that of a traditional tagger.
Second, although DB operations can be fairly efficient, it is relatively
hard to incorporate other resources and more powerful features (such as
word N-grams, or a word segmenter in the Chinese case) without special
treatment. Do you know of any experiments along these lines that have been
carried out so far? I'd be glad to hear about them.
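To make the second point concrete, here is a minimal sketch using Python's standard sqlite3 module. It assumes a hypothetical table `tokens(position, word, tag)` with one row per corpus token, in the spirit of the proposal quoted below. Gathering a word-trigram context already requires joining the table to itself twice, and every additional context word or external resource would need yet another join or table.

```python
import sqlite3

# Hypothetical schema: one row per corpus token, ordered by `position`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (position INTEGER PRIMARY KEY, word TEXT, tag TEXT)")
words = "the cat sat on the mat and the cat sat".split()
conn.executemany(
    "INSERT INTO tokens (position, word) VALUES (?, ?)",
    list(enumerate(words)),
)

# A word-trigram feature needs two self-joins: t1 at position i,
# t2 at i+1, t3 at i+2. Each extra word of context adds one more join.
TRIGRAM_COUNTS = """
SELECT t1.word, t2.word, t3.word, COUNT(*) AS freq
FROM tokens AS t1
JOIN tokens AS t2 ON t2.position = t1.position + 1
JOIN tokens AS t3 ON t3.position = t1.position + 2
GROUP BY t1.word, t2.word, t3.word
ORDER BY freq DESC, t1.word, t2.word, t3.word
"""
trigrams = conn.execute(TRIGRAM_COUNTS).fetchall()
for w1, w2, w3, freq in trigrams[:3]:
    print(w1, w2, w3, freq)
```

On this toy input "the cat sat" comes out on top with a count of 2. A Chinese word segmenter does not fit this pattern as easily, since it decides what the rows are in the first place and so would presumably have to run before the table is built.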
 On Wed, 24 Sep 2003 13:17:55 -0600, Mark Davies <Mark_Davies at byu.edu>
wrote:

> Is anyone aware of projects in which relational databases have been used
> to do POS tagging?  Rather than passing through a linear text token by
> token, it would all be done via adjacent rows in the database, using
> subqueries or JOINs.  For example, you would have a table with N number
> of rows, where N = number of words in the corpus.  Each row would have
> the following structure (lemma would probably be here as well):
> ...
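The set-based idea in the quoted proposal can be sketched with Python's standard sqlite3 module. The actual row structure is elided in the quote, so the `corpus(position, word, lemma, tag)` table, the `lexicon` table, and its tags are all hypothetical. The contextual rule (NN becomes VB after TO) is borrowed from transformation-based, Brill-style tagging purely as an illustration; it is not the method the proposal describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per token; `position` orders the corpus, so
# "adjacent rows" means position - 1 and position + 1. `lemma` is unused here.
conn.execute(
    "CREATE TABLE corpus (position INTEGER PRIMARY KEY, word TEXT, lemma TEXT, tag TEXT)"
)
conn.execute("CREATE TABLE lexicon (word TEXT PRIMARY KEY, tag TEXT)")

sentence = "they want to race the horse".split()
conn.executemany(
    "INSERT INTO corpus (position, word) VALUES (?, ?)", list(enumerate(sentence))
)
# Hypothetical most-frequent-tag lexicon using Penn Treebank tags.
conn.executemany(
    "INSERT INTO lexicon VALUES (?, ?)",
    [("they", "PRP"), ("want", "VBP"), ("to", "TO"),
     ("race", "NN"), ("the", "DT"), ("horse", "NN")],
)

# Step 1: initial tagging as a single set-based UPDATE, not a token loop.
conn.execute(
    "UPDATE corpus SET tag = (SELECT tag FROM lexicon WHERE lexicon.word = corpus.word)"
)

# Step 2: contextual rule NN -> VB when the previous row is tagged TO.
# The correlated subquery reads the adjacent row by position.
conn.execute(
    """
    UPDATE corpus SET tag = 'VB'
    WHERE tag = 'NN'
      AND (SELECT prev.tag FROM corpus AS prev
           WHERE prev.position = corpus.position - 1) = 'TO'
    """
)

tagged = conn.execute("SELECT word, tag FROM corpus ORDER BY position").fetchall()
print(tagged)
```

Here "race" is retagged VB because the row before it is tagged TO, while "horse" keeps NN because its left neighbour is DT. Each further rule is another UPDATE pass over the whole table, which is where the speed question above comes in.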

--
Zhang Le
Natural Language Processing Lab
Northeastern University, P.R.China
http://www.nlplab.cn/zhangle/