[Corpora-List] Phrase extraction

Antti Arppe aarppe at ling.helsinki.fi
Tue Oct 25 09:32:49 UTC 2005


On Mon, 24 Oct 2005, Helge Thomas Karset Hellerud wrote:
> PoS (Part of Speech) tagging is often used to extract phrases from text
> (like Noun Phrases). But that approach assumes you have a PoS tagger
> available. My document collection is in Norwegian, but I don't have a
> Norwegian tagger.
>
> 1) Is there a way to create a simple PoS tagger to recognize verbs,
> nouns and adjectives (in Norwegian)?

Before creating your own tagger, have you or your department
considered obtaining or licensing Multitagger (a PoS tagger for Norwegian
created by Universitetet i Oslo / Tekstlaboratoriet / Janne Bondi
Johannessen) or an academic version of Connexor's dependency parser
(Machinese) for Norwegian?

 	-Antti Arppe


