[Corpora-List] portability studies?
Adam Kilgarriff
adam at lexmasterclass.com
Thu Jul 3 17:03:00 UTC 2008
Taras,
the prior question is: "how close is one domain or genre to another?"
(Presumably, porting costs vary with distance between domains/genres, so an
account of porting costs without an account of text type similarity tells us
nothing.)
But it's not a question that has had much scientific exploration. The only
viable approach I know of is based on comparing corpus frequencies of words
or other constructs; see e.g. the contrastive accounts of web corpora in
recent work by Sharoff, Ferraresi and others.
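To make the frequency-comparison idea concrete, here is a minimal sketch in Python. It is not the method of the papers cited below: the choice of cosine similarity over relative word frequencies, the crude regex tokeniser, the function names and the toy texts are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Map each lowercased word to its share of all tokens in the text."""
    # Crude tokeniser (an assumption): runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def cosine_similarity(freq_a, freq_b):
    """Cosine between two word-frequency vectors; 1.0 means identical profiles."""
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no frequency profile to compare
    return dot / (norm_a * norm_b)


# Toy "corpora" (hypothetical); real use would need far larger samples.
news = "the minister said the budget would rise and the market fell"
more_news = "the market rose after the minister said the budget would hold"
recipe = "chop the onions and fry them in butter until soft"

sim_close = cosine_similarity(relative_frequencies(news),
                              relative_frequencies(more_news))
sim_far = cosine_similarity(relative_frequencies(news),
                            relative_frequencies(recipe))
print(f"news vs more news: {sim_close:.3f}")
print(f"news vs recipe:    {sim_far:.3f}")
```

On these toy texts the two news snippets score higher than news against the recipe, which is the kind of ordering a text-type similarity measure should give. Whether such a score tracks porting cost is exactly the open question.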
Adam
Sharoff, S. (2006) Creating general-purpose corpora using automated search
engine queries. In M. Baroni, S. Bernardini (eds.)
*WaCky! Working papers on the Web as Corpus*, Bologna, 2006.
<http://corpus.leeds.ac.uk/serge/publications/wacky-paper.pdf>
Ferraresi, A., Zanchetta, E., Baroni, M. and Bernardini, S. (2008) Introducing
and evaluating ukWaC, a very large Web-derived corpus of English.
In Proceedings of the WAC4 Workshop at LREC 2008.
<http://clic.cimec.unitn.it/marco/publications/lrec2008/lrec08-ukwac.pdf>
2008/7/3 Taras Zagibalov <T.Zagibalov at sussex.ac.uk>:
> Dear colleagues,
> I've been trying to study the problem of NLP systems' portability, but
> failed to find any paper covering the subject. Could you please advise
> of any source of information that describes the problem of porting an
> NLP system to different domains/genres/languages and provides some
> metrics that measure how much it takes (of time, labour, resources...)
> to port a system.
> Thank you in advance.
>
> Taras Zagibalov
> University of Sussex
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
--
================================================
Adam Kilgarriff http://www.kilgarriff.co.uk
Lexical Computing Ltd http://www.sketchengine.co.uk
Lexicography MasterClass Ltd http://www.lexmasterclass.com
Universities of Leeds and Sussex adam at lexmasterclass.com
================================================