[Corpora-List] corpus of mathematical equations

Dom Widdows widdows at google.com
Thu Jan 10 16:34:53 UTC 2008


I believe that arXiv.org still asks for TeX source.
Be careful if you think about spidering their site, though: they take a
*very* dim view of blind robots (http://arxiv.org/RobotsBeware.html).
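
On Jason's point (quoted below) that TeX markup reveals the recursive
subconstituents of a formula: here is a minimal sketch of what that
structure looks like if you only follow brace groups. It is an
illustration, not a real TeX parser. The function name parse_groups is
made up for this example, and it assumes plain math-mode input with no
macros, comments, or environments.

```python
def parse_groups(tex):
    """Turn a TeX string into a nested list, one sublist per {...} group.

    Hypothetical helper for illustration only: it tracks brace nesting and
    control sequences, and attaches no meaning to any token.
    """
    root = []
    stack = [root]          # stack[-1] is the group currently being filled
    i = 0
    while i < len(tex):
        ch = tex[i]
        if ch == "{":
            group = []
            stack[-1].append(group)
            stack.append(group)
            i += 1
        elif ch == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
            i += 1
        elif ch == "\\":
            # Control word: backslash followed by letters (e.g. \frac).
            j = i + 1
            while j < len(tex) and tex[j].isalpha():
                j += 1
            # Control symbol: backslash followed by one non-letter (e.g. \,).
            if j == i + 1:
                j = min(j + 1, len(tex))
            stack[-1].append(tex[i:j])
            i = j
        elif ch.isspace():
            i += 1
        else:
            stack[-1].append(ch)
            i += 1
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return root


if __name__ == "__main__":
    print(parse_groups(r"\frac{a+b}{c^{2}}"))
```

For \frac{a+b}{c^{2}} this gives ['\\frac', ['a', '+', 'b'], ['c', '^', ['2']]]:
the nesting is recovered, but nothing says that the first group is a
numerator or that ^ means exponentiation, which is the "no semantics"
caveat.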

-Dominic

On Jan 10, 2008 11:25 AM, Jason Eisner <jason at cs.jhu.edu> wrote:

> Here is a small corpus of automatically generated formal mathematical
> proofs paired with their "verbalizations" into English (I believe):
>   http://www.cs.cornell.edu/Info/Projects/NuPrl/html/nlp/
>
> Also, you might be able to get a corpus of papers that contain TeX
> equations, if the TeX markup language itself constitutes sufficient
> markup for your purposes.  (It reveals the recursive subconstituents
> of a formula, although it doesn't attach any semantics to them.  So
> it's certainly a lot more informative than a scanned image of an
> equation!)  For example, the digital library at arXiv.org used to ask
> authors to submit their original TeX / LaTeX / AMSTeX files when
> adding a paper.
>
> -cheers, jason
>
> On Jan 10, 2008 9:07 AM, Mary Hearne <mhearne at computing.dcu.ie> wrote:
> > Hi all,
> >
> > on behalf of my colleague, Dónal Fitzpatrick:
> >
> > Do you know of any kind of corpus of mathematical equations where the
> > constituent parts are tagged in any meaningful way?  I am uncertain as to:
> > 1.  how the parts of an equation could be tagged
> > or
> > 2.  whether this has been done before.
> >
> > If you would like to contact him directly, Dónal's e-mail address is
> > dfitzpat at computing.dcu.ie.
> >
> > Best regards,
> > Mary Hearne
> >
> > _______________________________________________
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
>