[Corpora-List] portability studies?
Taras Zagibalov
T.Zagibalov at sussex.ac.uk
Thu Jul 3 18:52:30 UTC 2008
Thank you for your remark.
I think the problem of domain distance matters only for less portable
systems: if a system is highly portable (ideally, it requires no training
corpus, wordnets, large rule sets, etc.), its performance shouldn't depend
much on the distance between the two domains.
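To make "distance between two domains" concrete, here is a minimal sketch of the frequency-comparison approach Adam Kilgarriff mentions in the quoted message. It compares word frequencies in two text samples with a chi-square statistic over the most frequent words of the combined sample. The tokenizer, the function name `chi_square_distance` and the cut-off `top_n=500` are illustrative assumptions, not the method of the cited papers. A larger value means the two samples are further apart.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude lowercase word tokenizer (an illustrative assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def chi_square_distance(text_a, text_b, top_n=500):
    """Chi-square comparison of word frequencies in two text samples.

    Only the top_n most frequent words of the combined sample are used.
    0.0 means identical relative frequencies; larger means more different.
    """
    freq_a = Counter(tokenize(text_a))
    freq_b = Counter(tokenize(text_b))
    total_a = sum(freq_a.values())
    total_b = sum(freq_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both samples must contain at least one word")

    # Words to compare: the most frequent ones in the two samples combined.
    vocab = [w for w, _ in (freq_a + freq_b).most_common(top_n)]

    chi2 = 0.0
    for word in vocab:
        observed_a, observed_b = freq_a[word], freq_b[word]
        word_total = observed_a + observed_b
        # Expected counts if the word were spread proportionally to sample size.
        expected_a = word_total * total_a / (total_a + total_b)
        expected_b = word_total * total_b / (total_a + total_b)
        chi2 += (observed_a - expected_a) ** 2 / expected_a
        chi2 += (observed_b - expected_b) ** 2 / expected_b

    # Average the statistic over the number of words compared.
    return chi2 / len(vocab)
```

The raw statistic grows with sample size, so scores are only comparable when they are computed on samples of similar size and with the same `top_n`.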
I am looking for a general description of portability and of ways to
measure it for a given system. In other words, I am looking for a means
of measuring how independent a system is of the distance between domains
A and B when it is ported from domain A to domain B.
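One hypothetical way to turn this into a number: port the system to several target domains, record its score on each (e.g. F1) together with that domain's distance from the source domain (for instance, from a frequency comparison like the one suggested in the quoted message), and fit a least-squares line. The slope shows how much the score changes per unit of distance. A slope close to zero would suggest a system whose performance is largely independent of domain distance, i.e. a highly portable one in the sense above. The function name `portability_slope` and the choice of a linear fit are assumptions for illustration only.

```python
def portability_slope(distances, scores):
    """Least-squares slope of target-domain score against domain distance.

    distances[i] is the distance from the source domain to target domain i;
    scores[i] is the system's score (e.g. F1) after porting to that domain.
    A slope near 0 means the score barely changes with distance.
    """
    n = len(distances)
    if n != len(scores) or n < 2:
        raise ValueError("need at least two (distance, score) pairs")
    mean_d = sum(distances) / n
    mean_s = sum(scores) / n
    # Ordinary least squares: slope = cov(d, s) / var(d).
    var_d = sum((d - mean_d) ** 2 for d in distances)
    if var_d == 0:
        raise ValueError("distances must not all be equal")
    cov = sum((d - mean_d) * (s - mean_s) for d, s in zip(distances, scores))
    return cov / var_d
```

For example, a system scoring 0.9, 0.8 and 0.7 at distances 0, 1 and 2 gives a slope of about -0.1, while one scoring 0.8 everywhere gives a slope of 0. With only a few target domains this is a rough indicator at best.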
Best regards,
Taras
Adam Kilgarriff wrote:
> Taras,
> the prior question is: "how close is one domain or genre to
> another?" (Presumably, porting costs vary with the distance between
> domains/genres, so an account of porting costs without an account of
> text-type similarity tells us nothing.)
> But it's not a question that has had much scientific exploration. The
> only viable approach I know of is based on comparing corpus frequencies
> of words or other constructs; see e.g. the contrastive accounts of web
> corpora in recent work by Sharoff, Ferraresi and others.
> Adam
> Sharoff, S. (2006). Creating general-purpose corpora using automated
> search engine queries. In M. Baroni and S. Bernardini (eds.), /WaCky!
> Working papers on the Web as Corpus/. Bologna.
> <http://corpus.leeds.ac.uk/serge/publications/wacky-paper.pdf>
> Ferraresi, A., Zanchetta, E., Baroni, M. and Bernardini, S. (2008).
> Introducing and evaluating ukWaC, a very large Web-derived corpus of
> English. In Proceedings of the WAC4 Workshop at LREC 2008.
> <http://clic.cimec.unitn.it/marco/publications/lrec2008/lrec08-ukwac.pdf>
>
> 2008/7/3 Taras Zagibalov <T.Zagibalov at sussex.ac.uk>:
>
> Dear colleagues,
> I've been trying to study the problem of NLP systems' portability, but
> have failed to find any paper covering the subject. Could you please
> point me to any source of information that describes the problem of
> porting an NLP system to different domains/genres/languages and
> provides some metrics that measure how much it takes (in time, labour,
> resources...) to port a system?
> Thank you in advance.
>
> Taras Zagibalov
> University of Sussex
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
>
>
> --
> ================================================
> Adam Kilgarriff http://www.kilgarriff.co.uk
> Lexical Computing Ltd http://www.sketchengine.co.uk
> Lexicography MasterClass Ltd http://www.lexmasterclass.com
> Universities of Leeds and Sussex adam at lexmasterclass.com
> ================================================