[Corpora-List] Algorithm for hypernym chaining?

Ken Litkowski ken at clres.com
Fri Jun 25 14:50:08 UTC 2010


I think that graph algorithms (search for "algorithms for directed 
graphs") should do the trick efficiently. The key to your problem is 
that each of your entries constitutes an "edge" between "vertices". You 
compile a list of such edges and the algorithms then go through a 
depth-first search to construct the full digraph. Although these 
algorithms are NP-complete, the number of vertices (e.g., in WordNet) is 
sufficiently small that your task can be accomplished in at most a 
matter of a few minutes. WordNet has the added advantage that there are 
no cycles. (I have C++ code I'm willing to share, but it might not be 
the most efficient for your purposes.)

Adam Funk wrote:
> Sorry if this is a bit off-topic, but at least it's for an NLP
> application.  I have a file of pairs of WordNet synset numbers that
> represent hypernym-hyponym pairs, for example:
>
> 100001930,100001740
> 100002137,100001740
> 100002452,100001930
> 100002684,100001930
> 100003553,100002684
> 100003993,100003553
> 100004258,100003553
> 100004475,100004258
> 100005787,100004475
> 100005930,100004475
>
> and I want to produce all the longest possible chains of them.  The
> output for that list is as follows:
>
> 100002137,100001740
> 100002452,100001930,100001740
> 100003993,100003553,100002684,100001930,100001740
> 100005787,100004475,100004258,100003553,100002684,100001930,100001740
> 100005930,100004475,100004258,100003553,100002684,100001930,100001740
>
> I have a program that does this correctly for small samples of the data
> I'm supposed to process, but I've estimated (using a log plot) that it
> will take about 10^595 seconds to run over the whole list (about 89000
> pairs), so I think I'm doing it wrong.
>
> I haven't been able to find an algorithm for this, because I don't know
> what the computational problem is called.  Can anyone point me in the
> direction of something suitable?
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>   

-- 
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road                     Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA       Blog: http://www.clres.com/blog


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora



More information about the Corpora mailing list