[Corpora-List] Incidence of MWEs

Piao, Songlin s.piao at lancaster.ac.uk
Wed Mar 15 17:30:22 UTC 2006


Things may get worse when translators or MT systems come across MWEs like "stick sharpener". I don't see how existing bilingual dictionaries alone can help them understand that the sharpener is for sharpening knives rather than sticks, which may result in completely different translations in another language. If such a misinterpretation happens and a reader of the translation doesn't speak English, there would be no way for the reader to realise the problem. This person may end up ordering hundreds of files when he/she actually wants to buy knives ...

Correct identification and interpretation of such MWEs may become vital in such cases.

Scott Piao
-----------------
Computing Department
Lancaster University
Lancaster LA1 4WA
UK


 

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Will Fitzgerald
Sent: 15 March 2006 16:41
To: Amsler, Robert
Cc: Corpora List
Subject: Re: [Corpora-List] Incidence of MWEs

The thing is, the various meanings of 'pencil sharpener', 'crayon sharpener' and 'stick sharpener' are all predictable, just not from their immediate lexical items. I think that either a 'tool for Verbing Noun' or a 'tool for Verbing, shaped like a Noun' reading will apply in Noun Verb-er expressions. Certainly, because there is a greater need for pencil sharpeners, pencil sharpeners tend to have standard shapes & components, but a pencil sharpener that worked via laser beams would still be a pencil sharpener. And imagine a tool for sharpening knives that had a graphite core; in the proper context, 'pencil sharpener' (or maybe even 'pencil knife sharpener') is ok.

The point is that general real-world knowledge, plus rules of phrasal combination, create predictable meanings for some expressions that are not predictable based on the lexical meanings alone.
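
A minimal sketch of this point in Python, under stated assumptions: the two templates ('tool for Verbing Noun' and 'tool for Verbing, shaped like a Noun') come from the message above, while the function names, the tiny lexicon, and the WORLD_KNOWLEDGE table are all hypothetical and exist only for illustration. The rules of combination generate both readings for any Noun Verb-er compound; choosing between them takes knowledge that is not in the lexical entries.

```python
# Hypothetical mini-lexicon: agentive "-er" noun -> gerund of its verb.
GERUNDS = {"sharpener": "sharpening", "opener": "opening"}

# Hypothetical irregular plurals used by the object reading.
PLURALS = {"knife": "knives"}


def candidate_readings(noun, agent_noun):
    """Return both template readings for a 'Noun Verb-er' compound."""
    gerund = GERUNDS[agent_noun]
    plural = PLURALS.get(noun, noun + "s")
    return {
        "object": f"tool for {gerund} {plural}",
        "shape": f"tool for {gerund}, shaped like a {noun}",
    }


# Hypothetical stand-in for real-world knowledge: which reading a given
# compound actually takes. Nothing in the lexicon above encodes this.
WORLD_KNOWLEDGE = {
    ("pencil", "sharpener"): "object",
    ("stick", "sharpener"): "shape",
}


def interpret(noun, agent_noun):
    """Pick a reading if world knowledge settles it, else return both."""
    readings = candidate_readings(noun, agent_noun)
    choice = WORLD_KNOWLEDGE.get((noun, agent_noun))
    if choice is None:
        # Without world knowledge, both readings remain open.
        return sorted(readings.values())
    return [readings[choice]]


if __name__ == "__main__":
    for noun in ("pencil", "stick", "crayon"):
        print(noun, "sharpener:", interpret(noun, "sharpener"))
```

In this sketch 'crayon sharpener' stays ambiguous because the hypothetical table has no entry for it, which is the situation a translator or MT system is in when it has only the dictionary entries for the component words.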

Oh, by the way, here is a 'pencil pencil sharpener':
<http://www.shop-eds.com/ProductDetail.aspx?prntdid=1810&did=1828&pid=23623>



On 3/15/06, Amsler, Robert <Robert.Amsler at hq.doe.gov> wrote:
> I have found published dictionaries' judgments as to what constitute
> MWEs to be both dated and biased against declaring MWEs to exist.
> Until I actually went through a number of texts to extract MWEs by
> hand and compared the MWEs I found against those listed in
> dictionaries, I used to think the lexicographic coverage was adequate
> and took the rule that "if you can predict its meaning from its
> constituent parts, it doesn't need a separate entry" to be correct.
> What I found was that not only didn't the rule seem to be applied 
> consistently, but that MWEs appeared to be a much neglected area of 
> lexicography with many more undocumented MWEs being used in text than 
> were in the dictionaries. It was as though dictionaries reviewed their 
> MWE entries far less often and less diligently than they did their isolated word entries.
>
> There are probably good reasons against dictionary publishers 
> declaring MWEs to exist. Namely, MWEs greatly increase the size of a 
> dictionary for a small gain in clarity, perhaps only useful to 
> Speakers of English as a Foreign Language (and practitioners of
> computational linguistics, information retrieval and artificial
> intelligence). The "prediction" rule used to discount MWEs needing
> entries seems to beg the question of what algorithm can predict these
> and what that algorithm predicts.
> There is a big difference between believing you are excluding MWEs 
> because they are understandable without definitions and having an 
> algorithm that can generate the definition you would have written from 
> the separate dictionary entries for the component words.
>
> Take an MWE such as "pencil sharpener". Most dictionaries don't define 
> this since according to the prediction rule, it could be assumed to be 
> just "a sharpener for pencils". However, that denies the fact that we 
> all know pencil sharpeners are a specific category of manufactured 
> product and if you look for a photo of a pencil sharpener it will show
> one of several distinct models. We also know details about how pencil
> sharpeners work. In contrast, things like a "stick sharpener" or a
> "crayon sharpener" are novel creations without long-standing precedent 
> (I just checked the web, and, sigh, they both exist, but a "stick 
> sharpener" isn't a tool for sharpening sticks, it is a knife sharpener 
> whose shape resembles a stick, i.e., a thin cylindrical file.)
>
> A pencil sharpener would be something like "an electrical, mechanical 
> or manual device with sharpened blades into which pencils can be 
> inserted and which when operated creates a tapered conical pointed tip 
> on the pencil which initializes or renews its ability to be used as a 
> writing implement".
>
> Here is where I would say computational linguistics has to take its 
> leave of lexicography (or at least published lexicographic practice) 
> and declare "pencil sharpener" to be a useful and necessary MWE. I 
> would even go so far as to say that every MWE for which an explicit 
> definition can be written, should have an explicit definition and that 
> ONLY when the explicit definitions show no differentiation should they 
> be eliminated in favor of entries for the separate word elements. That 
> is, REVERSE the "prediction" rule to assume you cannot predict the 
> meaning of an MWE until you fail to find anything to say in its 
> definition that is not formulaic.
>
> I don't believe published dictionaries contain sufficient information 
> to correctly understand the MWEs they fail to explicitly list. I don't 
> believe published dictionaries actually think about MWEs consistently 
> or conscientiously.
>


--
Will Fitzgerald
weblog: <http://www.entish.org/willwhim>


