[Corpora-List] (no subject)

Krishnamurthy, Ramesh r.krishnamurthy at aston.ac.uk
Sat May 28 18:47:53 UTC 2011


Hi Nigel

1. I have no idea how GoogleFight obtains its hit counts, or how they relate to standard Google search results.

2. I too felt that the figures you obtained for "in breach of his standard of care" and "in breach of his duty of care" were far too high. I would expect very few 7-grams to have such a high frequency, and certainly not the ones you selected.
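To make the 7-gram point concrete: a local corpus can be queried for exact phrase counts, as Nigel did with his case-report corpus. The following is a minimal Python sketch, assuming simple lowercase word tokenization; the sample text and the helper names (`tokenize`, `count_phrase`, `ngram_counts`) are hypothetical and not taken from any particular corpus tool.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_phrase(tokens, phrase):
    """Count exact, contiguous matches of a multi-word phrase."""
    target = tokenize(phrase)
    n = len(target)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target)

def ngram_counts(tokens, n):
    """Frequency of every contiguous n-gram (as a tuple of tokens)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample standing in for a real corpus such as a case-report collection.
sample = (
    "The defendant was in breach of his duty of care. "
    "Whether the standard of care has been breached is a question of fact. "
    "He was in breach of his duty of care to the claimant."
)
tokens = tokenize(sample)
print(count_phrase(tokens, "in breach of his duty of care"))      # 2
print(count_phrase(tokens, "in breach of his standard of care"))  # 0
# Even in this deliberately repetitive sample, no 7-gram occurs more than twice.
print(max(ngram_counts(tokens, 7).values()))                      # 2
```

Exact contiguous matching is the strictest possible query; it is roughly what a quoted phrase search aims for, though a search engine's tokenization and counting are not documented here.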

3. Therefore, I did a standard Google search on these strings, and got:
"in breach of his standard of care"= About 3 results (0.26 seconds)
"in breach of his duty of care"= About 28,700 results (0.23 seconds)
[NB: several legal documents relating to Scots law on the first page of hits]
[In the process, I also spotted: http://en.wikipedia.org/wiki/Breach_of_duty_in_English_law]

4. This made me re-examine your email:  "My law students tend to mix up standard and breach, so I keyed in "in breach of his duty of care" vs (in legalese basically erroneous) "in breach of his standard of care". Yet GoogleFight manages to make this a majority decision: 523,000 to 221,000."
a) Did you mean that your students mix up 'standard of care' and 'duty of care' (rather than "standard and breach")?
b) Which phrase had 523,000 and which 221,000?

5. As someone with no specialist legal knowledge at all, but substantial experience of English corpus analysis and lexicography, I would expect the following in Google (i.e. general-language) counts:
a) 'standard' to be more frequent than 'duty'
b) 'standard of care' to be more frequent than 'duty of care'. For example, I could happily talk about the 'standard of care' I received when I broke my arm recently, but would be wary of talking about 'duty of care', because that sounds like a legal phrase.

6. I think the exact wording of your search items has complicated the issue, because:
a) for me "breach+duty" is a strong collocation, whereas I find it much more difficult to accept "breach+standard"
b) I would see 'breach' as the main signal of legal domain in these phrases
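A collocation such as "breach+duty" can be operationalized as co-occurrence within a fixed window of words. A minimal sketch, assuming a symmetric window of four tokens on either side and exact word-form matching (so "breached" does not count as "breach"); the window size, the function name `window_cooccurrences` and the sample sentences are illustrative choices, and corpus interfaces such as the BYU one may define and count collocates differently.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def window_cooccurrences(tokens, node, collocate, span=4):
    """Count occurrences of `node` with `collocate` within `span` tokens on either side."""
    hits = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        if collocate in left or collocate in right:
            hits += 1
    return hits

# Hypothetical sample text for illustration only.
tokens = tokenize(
    "The defendant was in breach of his duty of care. "
    "The standard of care was high. "
    "The duty to warn was not breached."
)
print(window_cooccurrences(tokens, "duty", "breach"))      # 1
print(window_cooccurrences(tokens, "standard", "breach"))  # 0
```

Counting node occurrences that have the collocate nearby (rather than counting every collocate token) is one design choice among several; a lemmatized search would also pick up "breached".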

7. Anyway, my curiosity led me to do a few more standard Google [and corpus] searches:

standard= About 1,470,000,000 results (0.15 seconds)
duty= About 437,000,000 results (0.10 seconds)

[http://corpus.byu.edu/ (BNC; COCA):
standard = BNC 12,659 (+breach=1); COCA 40,504 (+breach=6)
duty = BNC 7,861 (+breach=324); COCA 14,986 (+breach=52)]
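Because the raw counts differ greatly between words and corpora, one way to read the bracketed BNC and COCA figures is as the share of each word's hits that co-occur with "breach". A minimal sketch using the numbers reported above; it assumes that "+breach" means hits with "breach" as a nearby collocate in the BYU interface, which the figures themselves do not state, and the name `breach_share` is hypothetical.

```python
# Figures as reported above: (hits for the word, hits with "breach" as collocate).
# Assumption: "+breach" counts "breach" co-occurring with the word in the BYU interface.
counts = {
    "BNC": {"standard": (12659, 1), "duty": (7861, 324)},
    "COCA": {"standard": (40504, 6), "duty": (14986, 52)},
}

def breach_share(total, with_breach):
    """Percentage of a word's hits that co-occur with 'breach'."""
    return 100.0 * with_breach / total

for corpus, words in counts.items():
    for word, (total, with_breach) in words.items():
        print(f"{corpus:<5} {word:<9} {breach_share(total, with_breach):.3f}%")
```

On these figures, about 4.1% of BNC hits for 'duty' co-occur with 'breach', against under 0.01% for 'standard'; in COCA the shares are about 0.35% and 0.015%. Either way, 'breach' keeps far more company with 'duty', in line with point 6a.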

"standard of care"= About 7,760,000 results (0.13 seconds)
"duty of care"= About 1,620,000 results (0.11 seconds)

breach+standard+care= About 16,600,000 results (0.04 seconds)
breach+duty+care= About 20,700,000 results (0.12 seconds)
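The Google figures are easier to compare as ratios. A minimal sketch using the counts reported above; Google's "About N results" numbers are rough estimates, so the ratios are indicative at best, and the dictionary labels and the `ratio` helper are illustrative only.

```python
# Google "About N results" estimates as reported above; treat them as rough.
google = {
    "standard / duty": (1_470_000_000, 437_000_000),
    '"standard of care" / "duty of care"': (7_760_000, 1_620_000),
    "breach+standard+care / breach+duty+care": (16_600_000, 20_700_000),
    '"in breach of his standard of care" / "in breach of his duty of care"': (3, 28_700),
}

def ratio(a, b):
    """How many times more hits the first query got than the second."""
    return a / b

for label, (a, b) in google.items():
    print(f"{label}: {ratio(a, b):.3g}")
```

On these estimates, 'standard' outnumbers 'duty' by about 3.4 to 1 and "standard of care" outnumbers "duty of care" by about 4.8 to 1, as expected in point 5; once 'breach' is added to the query the balance tips towards 'duty' (about 0.8 to 1), and the exact 7-gram comparison is roughly 1 to 9,600.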

hope this helps...

Ramesh Krishnamurthy
Visiting Academic Fellow, School of Languages and Social Sciences, Aston University, Birmingham B4 7ET
Room: NX01. Tel: 0121-204-3812.
Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/
Project Investigator, GeWiss (Volkswagen Foundation) project: http://www1.aston.ac.uk/lss/research/research-projects/gewiss-spoken-academic-discourse/


Date: Mon, 23 May 2011 19:21:39 +0800
From: njbruce <njbruce at hku.hk>
Subject: [Corpora-List] Google Fight - database & authenticity?
To: mtmarathon2011 <mtmarathon2011 at fbk.eu>, "corpora at uib.no" <corpora at uib.no>
Cc: Ondrej Bojar <bojar at ufal.mff.cuni.cz>

Can anyone tell me how GoogleFight comes up with colossal numbers for even highly discipline-specific expressions? My law students tend to mix up standard and breach, so I keyed in "in breach of his duty of care" vs (in legalese basically erroneous) "in breach of his standard of care". Yet GoogleFight manages to make this a majority decision: 523,000 to 221,000. When I use my 2-million-word discipline-specific (UK case report) corpus, I get zero matches for the erroneous form.

Having said that, the second line of the Wikipedia entry on "Standard of care" reads: "Whether the standard of care has been breached is determined by ... etc." So this is an easy slip to make. But 221,000 entries?! I thought about including a fun link to GoogleFight for my ESL law students to play around with, but am now wondering how useful that would be.

Any suggestions/insights welcome.

Nigel Bruce

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
