[Corpora-List] Algorithm for hypernym chaining?
Eric Ringger
ringger at cs.byu.edu
Fri Jun 25 15:00:08 UTC 2010
Hi, Ken & Adam.
DFS is O(|V| + |E|). For a fully connected graph (or nearly so), O(|E|) =
O(|V|^2). Thus DFS is worst-case O(|V|^2), which is certainly in P (not in
NP or NP-Complete). If you start your DFS at what appears to be your "root"
node (100001740), then you will get what you wanted. Am I missing
something?
Regards,
--Eric
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Ken Litkowski
Sent: Friday, June 25, 2010 8:50 AM
To: Adam Funk
Cc: corpora mailing list
Subject: Re: [Corpora-List] Algorithm for hypernym chaining?
I think that graph algorithms (search for "algorithms for directed
graphs") should do the trick efficiently. The key to your problem is
that each of your entries constitutes an "edge" between "vertices". You
compile a list of such edges and the algorithms then go through a
depth-first search to construct the full digraph. Although these
algorithms are NP-complete, the number of vertices (e.g., in WordNet) is
sufficiently small that your task can be accomplished in at most a
matter of a few minutes. WordNet has the added advantage that there are
no cycles. (I have C++ code I'm willing to share, but it might not be
the most efficient for your purposes.)
Adam Funk wrote:
> Sorry if this is a bit off-topic, but at least it's for an NLP
> application. I have a file of pairs of WordNet synset numbers that
> represent hypernym-hyponym pairs, for example:
>
> 100001930,100001740
> 100002137,100001740
> 100002452,100001930
> 100002684,100001930
> 100003553,100002684
> 100003993,100003553
> 100004258,100003553
> 100004475,100004258
> 100005787,100004475
> 100005930,100004475
>
> and I want to produce all the longest possible chains of them. The
> output for that list is as follows:
>
> 100002137,100001740
> 100002452,100001930,100001740
> 100003993,100003553,100002684,100001930,100001740
> 100005787,100004475,100004258,100003553,100002684,100001930,100001740
> 100005930,100004475,100004258,100003553,100002684,100001930,100001740
>
> I have a program that does this correctly for small samples of the data
> I'm supposed to process, but I've estimated (using a log plot) that it
> will take about 10^595 seconds to run over the whole list (about 89000
> pairs), so I think I'm doing it wrong.
>
> I haven't been able to find an algorithm for this, because I don't know
> what the computational problem is called. Can anyone point me in the
> direction of something suitable?
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
--
Ken Litkowski TEL.: 301-482-0237
CL Research EMAIL: ken at clres.com
9208 Gue Road Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA Blog: http://www.clres.com/blog
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
More information about the Corpora
mailing list