[Corpora-List] MWE extraction from a desired text

Rich Cooper rich at englishlogickernel.com
Sun Jan 30 21:09:06 UTC 2011


Hi Fatemeh,

 

Are you interested in developing a corpus of issued patents as recorded by
the USPTO, which contain numerous large columns with unstructured text?  I
have tools that will help you do that.  

 

Once you have done so, you can use another tool (currently in alpha) called
Linguistics Lab, which text-mines for exactly such MWEs in string format.

 

Would that help you?

 

-Rich

 

 

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

  _____  

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Fatemeh Torabi Asr
Sent: Sunday, January 30, 2011 5:39 AM
To: corpora at uib.no
Subject: [Corpora-List] MWE extraction from a desired text

 


Dear all,

I wonder if anyone knows of software that takes a text as input and outputs a
list of the sentences in which common Multi Word Expressions (MWEs) appear. I
have already found some tools, but the underlying algorithm is also important
to me. I don't want the algorithm to rely on frequencies in the input text;
instead, it should [probably] have a ready-made offline list of MWEs (or a
similar data structure) against which it parses the text. Any kind of
idiomatic expression is acceptable, whether unusual (e.g., "by and large") or
well-formed (e.g., "break one's heart").
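
To make the request concrete, here is a minimal sketch of the kind of
lexicon-driven matching described above. It does not reproduce any existing
tool. The three-entry lexicon, the function names, the naive sentence
splitter, and the way the "one's" slot is filled with a possessive are all
assumptions made for illustration. It does no lemmatization, so inflected
forms such as "broke his heart" would be missed.

```python
import re

# Hypothetical offline lexicon; a real one would be loaded from a
# dictionary resource or a curated file rather than hard-coded.
MWE_LEXICON = ["by and large", "break one's heart", "kick the bucket"]

# Forms allowed to fill the "one's" slot (an assumption of this sketch).
POSSESSIVE = r"(?:my|your|his|her|its|our|their|one's|\w+'s)"


def compile_mwe(mwe):
    """Turn a lexicon entry into a case-insensitive, whole-word regex."""
    parts = [POSSESSIVE if tok == "one's" else re.escape(tok)
             for tok in mwe.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


PATTERNS = {mwe: compile_mwe(mwe) for mwe in MWE_LEXICON}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentences_with_mwes(text):
    """Return (sentence, matched lexicon entries) for each sentence
    that contains at least one MWE from the lexicon."""
    results = []
    for sentence in split_sentences(text):
        found = [mwe for mwe, pat in PATTERNS.items() if pat.search(sentence)]
        if found:
            results.append((sentence, found))
    return results


if __name__ == "__main__":
    sample = ("By and large, the plan worked. "
              "She did not want to break his heart. "
              "Nothing idiomatic here.")
    for sentence, mwes in sentences_with_mwes(sample):
        print(mwes, "->", sentence)
```

Because matching depends only on the fixed lexicon, the result for a given
sentence is independent of how often an expression occurs in the input text.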

Best,
Fatemeh




-- 
Fatemeh
