[Corpora-List] MWE extraction from a desired text
Rich Cooper
rich at englishlogickernel.com
Sun Jan 30 21:09:06 UTC 2011
Hi Fatemeh,
Are you interested in developing a corpus of issued patents as recorded by
the USPTO, which contain numerous large columns with unstructured text? I
have tools that will help you do that.
Once you have done so, you can use another tool (currently in alpha) called
Linguistics Lab, which text-mines for exactly such MWEs in string
format.
Would that help you?
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
_____
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Fatemeh Torabi Asr
Sent: Sunday, January 30, 2011 5:39 AM
To: corpora at uib.no
Subject: [Corpora-List] MWE extraction from a desired text
Dear all,
I wonder if anyone knows of software that takes a text as input and outputs
a list of the sentences in which common multi-word expressions (MWEs)
appear. I have already found some tools, but the underlying algorithm also
matters to me. I don't want the algorithm to rely on frequencies in the
input text; instead, it should [probably] have a ready-made offline list of
MWEs (or a similar data structure) against which it parses the text. Any
kind of idiomatic expression is acceptable, whether unusual (e.g., "by and
large") or well-formed (e.g., "break one's heart").
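A minimal sketch of the kind of lexicon-driven lookup described above may make the requirement concrete. It assumes a tiny hand-made MWE list and a naive sentence splitter; the lexicon entries and the names `MWE_LEXICON`, `split_sentences`, `compile_mwe` and `sentences_with_mwes` are illustrative only and do not come from any existing tool.

```python
import re

# Hypothetical offline lexicon (assumption); a real one would be loaded
# from a curated MWE resource rather than hard-coded.
MWE_LEXICON = ["by and large", "break one's heart", "kick the bucket"]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def compile_mwe(mwe):
    """Turn a lexicon entry into a regex; the slot "one's" matches a possessive."""
    parts = []
    for token in mwe.split():
        if token == "one's":
            parts.append(r"(?:my|your|his|her|its|our|their|\w+'s)")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


# Patterns are built once from the lexicon, independent of the input text,
# so no frequency statistics from the input are involved.
PATTERNS = {mwe: compile_mwe(mwe) for mwe in MWE_LEXICON}


def sentences_with_mwes(text):
    """Return (sentence, [matched lexicon entries]) for each matching sentence."""
    results = []
    for sentence in split_sentences(text):
        found = [mwe for mwe, pat in PATTERNS.items() if pat.search(sentence)]
        if found:
            results.append((sentence, found))
    return results


if __name__ == "__main__":
    sample = "By and large, the plan worked. It will break her heart. Nothing here."
    for sentence, mwes in sentences_with_mwes(sample):
        print(mwes, "->", sentence)
```

This sketch matches surface forms only, so an inflected variant such as "broke her heart" would be missed; handling that would need lemmatization or extra lexicon entries.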
Best,
Fatemeh
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora