[Corpora-List] PS:minimal changes in a paragraph (based on a corpus it appeared) ... (2nd attempt (after first one was deleted))

Rich Cooper rich at englishlogickernel.com
Tue Aug 9 17:42:46 UTC 2011


Using corpus analysis techniques - identifying
context words and selecting sentences that have a
high score for content words divided by the
sentence word count - works very well for me.  
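A minimal sketch of that score, assuming a simple regex tokenizer and a content-word set supplied by the caller. The names `tokenize` and `density_score` are hypothetical and are not taken from the software described in this post.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def density_score(sentence, content_words):
    """Number of content-word tokens divided by the sentence word count."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0  # an empty sentence has no meaningful score
    hits = sum(1 for t in tokens if t in content_words)
    return hits / len(tokens)

# Illustrative data only: a made-up content-word set and two made-up sentences.
content_words = {"valve", "housing"}
sentences = [
    "It is one of the things that we do.",
    "The valve is seated in the housing.",
]
# Highest-density sentences first.
ranked = sorted(sentences, key=lambda s: density_score(s, content_words), reverse=True)
```

Dividing by the sentence length keeps long sentences from winning merely because they contain more words.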

 

I'm sure there are deeper metrics for similarity,
but that approach is consistent with what is
recommended by Mike Scott and Christopher Tribble
in "Textual Patterns", John Benjamins Pub Co,
Studies in Corpus Linguistics, 2006, pp 55-72.  

 

I use a list of frequent words to filter out everyday
language usage; what is left I call rare words.  The
rare words serve as the content words that
participate in the count.  
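As a rough illustration of that filtering step, the sketch below removes frequent words and keeps the remainder as rare words. The tiny `FREQUENT_WORDS` list is a hypothetical stand-in; a real list would be much longer and would probably be derived from a reference corpus.

```python
import re

# Hypothetical, very small frequent-word list used only for illustration.
FREQUENT_WORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the to with".split()
)

def rare_words(text, frequent=FREQUENT_WORDS):
    """Return the set of tokens in text that are not on the frequent-word list."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in frequent}

example = rare_words("The piston is attached to the crank shaft")
```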

 

Given a patent claim, I match its rare words
against each sentence in the patent specification.
The more of a claim element's content words that
appear in a sentence's content vocabulary, the
higher that sentence's matching score.  I filter out
the lowest scores and keep the 6 to 12 highest,
depending on the user's setting for how many
sentences to keep in the answer set.  
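The matching step could be sketched roughly as follows. The exact scoring formula is not given in this post, so the sketch assumes the score is the number of distinct claim rare words found in a sentence divided by that sentence's word count. The function name, the `keep` and `min_score` parameters, and the short frequent-word list are all hypothetical.

```python
import re

# Hypothetical, very small frequent-word list used only for illustration.
FREQUENT_WORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the to with".split()
)

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_sentences(claim, sentences, keep=6, min_score=0.0, frequent=FREQUENT_WORDS):
    """Score each specification sentence against the claim's rare words.

    Assumed score: distinct shared rare words / sentence word count.
    Sentences scoring at or below min_score are dropped, and the
    `keep` highest-scoring (score, sentence) pairs are returned.
    """
    claim_rare = {t for t in tokens(claim) if t not in frequent}
    scored = []
    for sentence in sentences:
        toks = tokens(sentence)
        if not toks:
            continue
        shared = claim_rare & set(toks)
        score = len(shared) / len(toks)
        if score > min_score:
            scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:keep]

# Made-up claim and specification sentences.
claim = "A pump comprising a piston and a valve"
spec = [
    "The weather was fine.",
    "The piston drives the valve.",
    "This invention relates to machines.",
]
result = match_sentences(claim, spec, keep=2)
```

Here `keep` plays the role of the user setting for the size of the answer set (6 to 12 in the description above).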

 

It seems to work very well with patents.  

 

-Rich

 


  _____  

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Angus Grieve-Smith
Sent: Tuesday, August 09, 2011 10:31 AM
To: corpora at uib.no
Subject: Re: [Corpora-List] PS:minimal changes in a paragraph (based on a corpus it appeared) ... (2nd attempt (after first one was deleted))

 

On 8/9/2011 12:35 PM, Bill Louw wrote: 


We need to find out if our discussion has assisted
Albrecht ... Best wishes, Bill


    I have to admit that I didn't understand what
Albretch wanted:




Say, you have a certain paragraph belonging to a text and relating to
the other paragraphs of that same text and to other ones of other
texts and you want to generate similar paragraphs.


    When you say "similar," Albretch, you mean a
paragraph that relates to the other paragraphs in
similar ways?  Because there are all kinds of ways
for paragraphs to be similar.



-- 
                               -Angus B. Grieve-Smith
                               grvsmth at panix.com