[Corpora-List] portability (summary)

Tue Jul 8 17:45:18 UTC 2008

Several days ago I posted a question regarding the portability of NLP 
systems. I received some good replies and am very thankful to all who 
helped me in this small investigation.
Georg Rehm asked me to summarise the replies I received, which is what I 
am doing in this message; I also hope that this short summary will be of 
interest to others.

First of all, my general impression is that the problem of portability 
has not been studied much: I failed to find any paper dedicated to the 
problem itself, as opposed to practical solutions.
Some correspondents mentioned domain adaptation as a relevant field. For 
example, Barbara Plank recommended papers by John Blitzer, Hal Daume 
III, Shai Ben-David, Jing Jiang and David McClosky. Adam Kilgarriff also 
stressed the importance of the distance between domains for portability 
studies.
(In my view, domain adaptation is an alternative to creating portable 
systems: a portable system does not depend on the data and thus does not 
require any adaptation of the training data to the test data.)
Anne DeRoeck shared her ideas on this problem, suggesting that 
portability is an inherent design issue: if a system is built to use 
domain-dependent features, it will have very low portability. The same 
applies to systems that are designed to use surface features (e.g. 
medical terminology). She also observes that "There is a threeways 
relationship between (a) the system and what it can do; (b) the task and 
(c) the data it works on; and portability".

I appreciate all contributions and would be glad to hear any further 
ideas on this matter.

Taras Zagibalov

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora