[Corpora-List] Automatic categorization of words.

Dominic Widdows widdows at maya.com
Wed Mar 9 14:28:09 UTC 2005


Dear Cyrus,

Be very careful about trying to find any such fixed categories by
analyzing corpora, because you will need to do some pretty powerful
disambiguation. The examples you cite are typical - "rock" is often
used figuratively and as an abstract description for a kind of music,
and "justice" is used as a title for an actual person. (I haven't
dug out corpus examples but this would be easy to do if you're
interested.) There is every reason to believe that this kind of
ambiguity is the rule rather than the exception, at least for
relatively common vernacular words.

Some of the examples above can be dealt with using syntactic tagging,
chunking, etc., all of which are possible using relatively standard
tools nowadays, at least for English. But it might be a lot more work
than you had in mind.

You may have considered this already, but in case you haven't, I
wanted to bring the possibility to your attention, because simply finding
a list of words that are categorized as concrete or abstract and
tagging them as such when they occur in corpora will almost certainly
give disappointing results.
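
As a rough illustration of that last point, here is a minimal sketch of
context-free lexicon lookup. The two-word lexicon, its labels, the
function name tag_by_lookup and the example sentences are all invented
for illustration; they are not taken from any real imageability norms
or corpus.

```python
import re

# Hypothetical toy lexicon: labels are illustrative assumptions only,
# not drawn from any published concreteness or imageability list.
LEXICON = {"rock": "concrete", "justice": "abstract"}


def tag_by_lookup(sentence):
    """Label every lexicon word found in the sentence, ignoring context."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [
        (tok, LEXICON[tok.lower()])
        for tok in tokens
        if tok.lower() in LEXICON
    ]


# Invented example sentences covering a literal use, a music-genre use,
# and a use as a personal title.
examples = [
    "He threw a rock into the river.",
    "She listens to rock on the radio.",
    "Justice Smith wrote the opinion.",
]

for sentence in examples:
    print(sentence, tag_by_lookup(sentence))
```

The lookup gives "rock" the same label in the first two sentences and
labels the title "Justice" as abstract, because it never looks at the
surrounding words. That is the kind of error that tagging or chunking
would have to catch.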

Best wishes,
Dominic

On Mar 9, 2005, at 12:03 AM, Cyrus Shaoul wrote:

>
> Dear List,
>
> I have been lurking for a while, but decided to post my first question
> to the list today. I am trying to do research on the differences
> between concrete and abstract words (e.g. "rock" and "justice").
>
> Does anyone know of any research or tools related to automatically
> categorizing words into these types of categories (also called
> imageability levels) based on corpus analysis?
>
> Thanks in advance,
>
> Cyrus Shaoul
> University of Alberta
>
>


