[Corpora-List] Multiword Expressions

Lars Aronsson lars at aronsson.se
Wed Jan 14 13:41:07 UTC 2004


Anna Korhonen wrote:

> CALL FOR PAPERS
> [...]
> In recent years, there has been a growing awareness in the NLP community
> of the problems that Multiword Expressions (MWEs) pose and the need for
> their robust handling.
>
> MWEs include a large range of linguistic phenomena, such as phrasal verbs
> (e.g. "add up"), nominal compounds (e.g. "telephone box"), and
> institutionalized phrases (e.g. "salt and pepper"). These expressions,
> which can be syntactically and/or semantically idiosyncratic in nature,
> are used frequently in everyday language, usually to express precisely
> ideas and concepts that cannot be compressed into a single word.

I'm not a linguist, and until now I didn't know there was a term for
MWEs.

Is there any freely available open source software for spell checking
(or natural language parsing) that handles multiword expressions?  I
want an algorithm that can approve "nota bene", "ad notam" and "San
Francisco" (if these MWEs are in the dictionary) in an English text
without approving the member words on their own.
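
To make the request concrete, here is a minimal sketch in Python of the
behaviour I have in mind. It is not taken from any existing spell
checker. The function name check_text, the max_len parameter and the
case-insensitive matching are assumptions made for illustration. It
tries the longest run of tokens first, so a multiword entry is accepted
as a unit while its member words are still rejected on their own.

```python
import re

def check_text(text, dictionary, max_len=3):
    """Return the tokens of `text` not covered by any dictionary entry.

    `dictionary` is a set of lowercase entries; a multiword entry is
    written with single spaces, e.g. "san francisco". `max_len` is the
    longest entry, in words, that will be tried (an assumed limit).
    """
    # Naive tokenization: runs of ASCII letters only (an assumption).
    tokens = re.findall(r"[A-Za-z]+", text)
    lowered = [t.lower() for t in tokens]
    unknown = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(lowered[i:i + n])
            if candidate in dictionary:
                i += n  # the whole expression is approved as a unit
                break
        else:
            # No entry of any length starts here: flag this one word.
            unknown.append(tokens[i])
            i += 1
    return unknown

words = {"nota bene", "ad notam", "san francisco", "i", "visited"}
print(check_text("Nota bene: I visited San Francisco", words))
print(check_text("I visited San", words))
```

With this dictionary the first call should flag nothing, and the second
should flag "San", since "san" is not an entry by itself. Note that the
sketch ignores punctuation between words, so "nota, bene" would also be
accepted; a real checker would have to decide how to treat that.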

Free spell checkers such as ispell, aspell and myspell don't seem to
have this ability.  They seem to split the input text into words
entirely independently of what's in the dictionary.

Dictionary-driven parsing could also be useful for abbreviations,
hyphenation, and SillyCapitalization.


--
  Lars Aronsson (lars at aronsson.se)
  Aronsson Datateknik - http://aronsson.se/


