[Corpora-List] portability (summary)
Taras Zagibalov
T.Zagibalov at sussex.ac.uk
Tue Jul 8 17:45:18 UTC 2008
A few days ago I posted a question regarding the portability of NLP
systems. I received some good replies and am very thankful to all who
helped me in this small investigation.
Georg Rehm asked me to summarise the replies I received (which is what I
am doing in this message), and I hope this short summary may be of
interest to others as well.
First of all, my general impression is that the problem of portability
has not been studied much: I failed to find any paper dedicated to the
problem itself, as opposed to papers that only offer practical solutions.
Some correspondents mentioned domain adaptation as a relevant field. For
example, Barbara Plank recommended papers by John Blitzer, Hal Daume
III, Shai Ben-David, Jing Jiang and David McClosky. Adam Kilgarriff also
stressed the importance of the distance between domains for portability
studies.
(In my view, domain adaptation is an alternative to building portable
systems, that is, systems which do not depend on the data and therefore
do not require any adaptation of the training data to the test data.)
Anne DeRoeck shared her ideas on this problem, suggesting that
portability is an inherent design issue: if a system is built to use
domain-dependent features, its portability will be very low. The same
holds for systems designed to use surface features (e.g. medical
terminology). She also observes that "There is a threeways
relationship between (a) the system and what it can do; (b) the task and
(c) the data it works on; and portability".
I appreciate all contributions and would be glad if anyone else shared
their ideas on this matter.
Taras Zagibalov
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora