[Corpora-List] Google Fight - database & authenticity?
njbruce
njbruce at hku.hk
Mon May 23 11:21:39 UTC 2011
Can anyone tell me how GoogleFight comes up with colossal numbers for even highly discipline-specific expressions? My law students tend to mix up standard and breach, so I keyed in "in breach of his duty of care" vs the (in legalese, basically erroneous) "in breach of his standard of care". Yet GoogleFight manages to make this a majority decision: 523,000 to 221,000. When I use my 2-million-word discipline-specific (UK case report) corpus, I get zero matches for the erroneous form.
Having said that, the second line of the Wikipedia entry on "Standard of care" reads: "Whether the standard of care has been breached is determined by ... etc." - so this is an easy slip to make. But 221,000 entries?! I thought about including a fun link to GoogleFight for my ESL Law students to play around with, but am now wondering how useful that might be.
Any suggestions/insights welcome.
Nigel Bruce
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Nicola Bertoldi [bertoldi at fbk.eu]
Sent: Monday, May 23, 2011 3:46 PM
To: corpora at uib.no
Cc: Ondrej Bojar
Subject: [Corpora-List] Call for Papers for the Open Source Convention held at the Sixth MT Marathon
(Apologies if you receive multiple copies. Please, distribute it among potentially interested colleagues.)
CALL FOR PAPERS:
OPEN SOURCE TOOLS FOR MACHINE TRANSLATION
The Machine Translation Marathon 2011 is the sixth in a series of events promoted by EuroMatrix and EuroMatrixPlus, which are EU research projects on Machine Translation. The MT Marathon will take place 5-10 September 2011 in Trento, Italy, organised by the HLT Research Unit of Fondazione Bruno Kessler (FBK).
For more information on the MT Marathon go to the official website: http://mtmarathon2011.fbk.eu
The MT Marathon is hosting an Open Source Convention to advance the state of the art in machine translation. We invite developers of open source tools to present their work and submit a paper of up to 10 pages that describes the underlying methodology and includes instructions on how to use the tools.
We are looking for stand-alone tools and extensions of existing tools, such as the Moses open source system. Accepted papers will be presented during the MT Marathon and published in the Prague Bulletin of Mathematical Linguistics (http://ufal.mff.cuni.cz/pbml).
Possible Topics:
* Training of Machine Translation models
* Machine Translation decoders
* Tuning of Machine Translation systems
* Evaluation of Machine Translation
* Visualisation, annotation or debugging tools
* Tools for human translators
* Interfaces for web-based services or APIs
* Extensions of existing tools
* Other tools for Machine Translation
This is the fourth time that the MT Marathon will host the Open Source Convention. The papers from the last three marathons are available online (http://ufal.mff.cuni.cz/pbml-91-100.html).
Papers will be reviewed by two reviewers appointed by the program committee. Most of the accepted papers will be printed in PBML in time for the MT Marathon; some papers may require substantial revisions and may be postponed to subsequent PBML issues.
Important Dates
Abstract submission: July 10, 2011 (1 paragraph, to help us allocate reviewers)
Paper submission: July 24, 2011
Notification of acceptance: August 3, 2011
Camera-ready: August 10, 2011
Presentations: September 5-10, 2011 (at the MT Marathon in Trento)
Author Instructions
Please send full non-anonymous submissions in PDF to Philip Koehn (pkoehn AT inf DOT ed DOT ac DOT uk) and the full Xe(La)TeX source for technical pre-review to Ondrej Bojar (bojar AT ufal DOT mff DOT cuni DOT cz).
The maximum length for submissions is 10 pages, including references. This limit will be strictly enforced. If your paper has been accepted, please send your camera-ready version in both PDF and Xe(La)TeX format to Ondřej Bojar.
Submissions will be accepted only in the PBML Xe(La)TeX format (http://ufal.mff.cuni.cz/pbml-instructions.html) for short papers (i.e. MS Word and other formats or a PDF without source files will not be accepted).
Best regards,
The program committee.
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora