[Corpora-List] portability studies?

Taras Zagibalov T.Zagibalov at sussex.ac.uk
Thu Jul 3 18:52:30 UTC 2008


Thank you for your remark.
I think that domain distance matters mainly for less portable systems: 
if a system is highly portable (ideally it requires no training corpus, 
wordnets, large rule sets, etc.), it should not depend much on the 
distance between the two domains.
I am looking for a general description of portability and of ways to 
measure it for a given system. In other words, I am looking for a means 
of measuring how independent a system is of the distance between domains 
A and B when it is ported from A to B.

Best regards,
Taras

Adam Kilgarriff wrote:
> Taras,
> the prior question is: "how close is one domain or genre to 
> another?" (Presumably, porting costs vary with the distance between 
> domains/genres, so an account of porting costs without an account of 
> text type similarity tells us nothing.)
> But it's not a question that has had much scientific exploration. The 
> only viable approach I know of is based on comparing corpus frequencies 
> of words or other constructs; see e.g. the contrastive accounts of web 
> corpora in recent work by Sharoff, Ferraresi and others.
> Adam
> Sharoff, S. (2006) Creating general-purpose corpora using automated 
> search engine queries. In M. Baroni, S. Bernardini (eds.) /WaCky! 
> Working papers on the Web as Corpus/, Bologna. 
> <http://corpus.leeds.ac.uk/serge/publications/wacky-paper.pdf>
> Ferraresi, A., Zanchetta, E., Baroni, M. and Bernardini, S. (2008) 
> Introducing and evaluating ukWaC, a very large Web-derived corpus of 
> English. In Proceedings of the WAC4 Workshop at LREC 2008. 
> <http://clic.cimec.unitn.it/marco/publications/lrec2008/lrec08-ukwac.pdf>
>
> 2008/7/3 Taras Zagibalov <T.Zagibalov at sussex.ac.uk>:
>
>     Dear colleagues,
>     I've been trying to study the problem of NLP systems' portability, but
>     have failed to find any paper covering the subject. Could you please
>     point me to any source that describes the problem of porting an
>     NLP system to different domains/genres/languages and provides some
>     metrics that measure how much it takes (in time, labour,
>     resources...) to port a system?
>     Thank you in advance.
>
>     Taras Zagibalov
>     University of Sussex
>
>     _______________________________________________
>     Corpora mailing list
>     Corpora at uib.no
>     http://mailman.uib.no/listinfo/corpora
>
> -- 
> ================================================
> Adam Kilgarriff http://www.kilgarriff.co.uk
> Lexical Computing Ltd http://www.sketchengine.co.uk
> Lexicography MasterClass Ltd http://www.lexmasterclass.com
> Universities of Leeds and Sussex adam at lexmasterclass.com
> ================================================ 
