[Corpora-List] sentence detector and phrase chunker returning absolute positions in text

Nils Reiter reiter at cl.uni-heidelberg.de
Mon Jul 19 09:03:56 UTC 2010


Dear Wiebke,

On 19.07.2010, at 07:58, Wiebke Wagner wrote:
> I am looking for a tool that performs sentence detection, part-of-speech
> tagging and phrase-chunking. My problem is that most of these tools
> return annotated text. What I need, however, is the absolute positions
> in text of the sentence boundaries and of the chunks. For example,
> consider the following sentences:
> 
> "This is a sentence. And here is another one."
> 
> I would need the information that the 19th and respectively the 44th
> character in the text is a sentence boundary. For the chunks, the
> position and the length of the chunk would be ideal.
> I have checked OpenNLP, Gate, LingPipe and MontyLingua but did not find
> any information about such an output (at least not for sentences AND
> chunks).
> Is anyone aware of such a tool? 

At least for sentence splitting, MorphAdorner can do that (http://morphadorner.northwestern.edu/). There's a post-processing method that gives you sentence boundaries in character offsets. 
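
To make the kind of output you describe concrete, here is a minimal sketch in Python. It is not MorphAdorner's API and not a real sentence detector: it is a naive regular-expression splitter (the function name sentence_spans is made up for this illustration) that treats any run of ".", "!" or "?" as a sentence end, so it will break on abbreviations and will skip trailing text that has no final punctuation. It only shows how character offsets can be reported instead of annotated text.

```python
import re

def sentence_spans(text):
    """Return a list of (start, end) character offsets, one per sentence.

    Offsets are 0-based and end-exclusive, so text[start:end] is the
    sentence, and `end` equals the 1-based position of its last character.
    Hypothetical helper for illustration only; the splitting rule is naive.
    """
    # One non-space, non-terminator character, then anything up to a
    # run of sentence-final punctuation.
    pattern = re.compile(r"[^.!?\s][^.!?]*[.!?]+")
    return [(m.start(), m.end()) for m in pattern.finditer(text)]

text = "This is a sentence. And here is another one."
spans = sentence_spans(text)

# 1-based positions of the sentence-final characters
boundaries = [end for _, end in spans]
```

For the example text this gives the spans (0, 19) and (20, 44), i.e. boundaries at the 19th and 44th characters. The same (start, end) or (start, length) representation would also work for chunks, given a chunker that keeps track of token offsets.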

Best
Nils