[Corpora-List] MWE extraction from a desired text
Martin Reynaert
reynaert at uvt.nl
Sun Jan 30 15:03:11 UTC 2011
Dear Fatemeh,
It seems to me the Unix/Linux text utility 'grep' might do wonders for
you. Its -f option lets you pass your offline ready list of MWEs as a
file of patterns, one per line. Because grep returns whole matching
lines, getting back only the sentences that match requires that your
corpus has undergone sentence splitting first, so that each line holds
one sentence. This is quite often performed by tools called 'tokenizers'.
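A rough sketch of that pipeline follows. The file names and the sample
text are made up for illustration, and the 'sed' line is only a naive
stand-in for a real sentence splitter (it breaks after '.', '!' or '?'
followed by a space, so abbreviations will trip it up). I add -F so the
MWEs are read as fixed strings rather than regular expressions, and -i to
ignore case; both are standard grep options.

```shell
#!/bin/sh
# Hypothetical file names; the files are created here only so the
# sketch is self-contained.
workdir=$(mktemp -d)

# The offline MWE list: one expression per line.
printf '%s\n' 'by and large' 'kick the bucket' > "$workdir/mwes.txt"

# Raw text with several sentences on a single line.
printf '%s\n' 'By and large, the plan worked. Nobody complained. He did not kick the bucket after all.' > "$workdir/corpus.txt"

# Naive sentence splitting: put a line break after ., ! or ? when a
# space follows. A real tokenizer/sentence splitter would do better.
sed 's/\([.!?]\) /\1\
/g' "$workdir/corpus.txt" > "$workdir/sentences.txt"

# -f: read the patterns from a file, -F: fixed strings, -i: ignore case.
matches=$(grep -F -i -f "$workdir/mwes.txt" "$workdir/sentences.txt")
printf '%s\n' "$matches"

rm -r "$workdir"
```

Note that fixed-string matching only finds the MWEs exactly as listed: an
entry such as "break one's heart" will not match "broke her heart" unless
you list the variants or write the entry as a regular expression.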
With some searching on the web you will dig up a Windows version of
'grep' and other indispensable text utilities ('tr', 'sed', etc.).
Welcome,
Martin Reynaert
ILK
Tilburg University
Fatemeh Torabi Asr wrote:
>
> Dears,
>
> I wonder if anyone knows a software that takes a text as input and
> outputs a list of included sentences in which common Multi Word
> Expressions (MWE) appear. I have already found some tools but the
> underlying algorithm is also important for me. I don't want the
> algorithm to work based on the frequencies in the input text but
> [probably] it should have an offline ready list of MWEs (or a similar
> data structure) based on which it parses the text. Any kind of
> idiomatic expression (unusual ones e.g., "by and large" or well-formed
> ones e.g., "break one's heart") are acceptable.
>
> Best,
> Fatemeh
>
>
>
> --
> Fatemeh
> ------------------------------------------------------------------------
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>