[Corpora-List] looking for natural language questions on computer science publication domain

Lushan Han lushan1 at umbc.edu
Tue Aug 27 14:11:25 UTC 2013


Dear Corpora List,

We are developing a question-answering system on a publication dataset
combining data from DBLP, CiteSeerX and ArnetMiner. Our system is now able
to interpret many interesting questions, from simple ones like “who
published papers on the CIKM conference in 2009” or “give me papers in the
subject decision trees”, to complicated ones like “give me the institutions
of the authors with whom Lushan Han at UMBC has co-authored” or “list
papers that are cited by papers in the conference SIGMOD in the year 2012”.

However, we need a dataset containing user queries to evaluate our system
and set its parameters. We expect the queries to be in the form of natural
language questions. We have written some ourselves, but we still need more
questions and, especially, more paraphrases. A question can be expressed in
many different ways, which a QA system has to deal with. For example, the
citation relation can be queried using “give me paper y that cites the
paper x” or “give me paper y that references the paper x” or “give me the
citations of paper x” or “give me the references of publication y”.
Moreover, we can also ask “who cites the paper x”, in which case the citation
is no longer a direct relation between papers.

Does anyone know of any existing datasets of this kind? Any would be useful,
because it could add more variation, in either content or expression, to the
questions in our dataset. Your help would be highly appreciated.


Thanks,

Lushan Han