[An-lang] Austronesian corpus linguistics

Byron W. Bender bender at hawaii.edu
Mon May 9 22:14:16 UTC 2005


Keira
  Just a note in response to one point in your message, the difficulty of
searching PDFs. For those who pay the tariff for Adobe Acrobat 7.0, it has a
rather powerful search engine that will lay out all occurrences of a word in
a PDF corpus in concordance fashion, giving some of the context, with a
hyperlink to each occurrence in the text, so that if you click on a
concordance item, the document will scroll to the actual page. Like other
Adobe products, it's not cheap, but an educational version is available at
university bookstores to students and faculty. I forget exactly what I paid at
the UH Bookstore, but I'm pretty sure it was less than $300, maybe less than
$200 for the educational version.
  There may be another bonus. Adobe Acrobat 7.0 will save PDFs in a wide
variety of other formats, including XML, which I gather is the choice of
archivists nowadays. I haven't had time to experiment with XML myself, but I
thought I should at least point out this possibility.
Byron

Byron W. Bender, Editor, Oceanic Linguistics (808)956-8374
Department of Linguistics, University of Hawai'i         fax  -9166
1890 East-West Road, Honolulu, HI 96822-2318
http://www2.hawaii.edu/~bender/


-----Original Message-----
From: an-lang-bounces at anu.edu.au [mailto:an-lang-bounces at anu.edu.au] On
Behalf Of Keira Ballantyne
Sent: Monday, May 09, 2005 9:18 AM
To: an-lang at anu.edu.au
Subject: [An-lang] Austronesian corpus linguistics

Dear AN-Langers,

For my recently-completed dissertation project, I compiled a small corpus
(approx. 7 000 words) of Yapese texts. The corpus includes both written
narrative and spoken text, and is available on the web at
http://www2.hawaii.edu/~ballanty/corpusintro.html

My texts are currently presented as PDF files, which preserve the
"viewability" of interlinear glosses but are somewhat lacking in terms of
searchability, as well as being problematic from an archival point of view.

I am currently looking at various ways of tagging this corpus so that it
will be more accessible to other scholars who would be interested in using
it as a research tool, and I'd like to gauge the interest of the research
community. What sort of searchability in a corpus would be useful to your
research?

I would also be interested to hear from other researchers who have done
similar work: researchers who have compiled or are currently compiling corpora
of Austronesian languages. I'm particularly interested in hearing from scholars
who have constructed tagged and/or interlinearized databases and who would
be interested in sharing their experience and knowledge.

Regards,

Keira Ballantyne
Department of Linguistics
University of Hawai'i at Manoa
ballanty at hawaii.edu
keiraballantyne at gmail.com

_______________________________________________
An-lang mailing list
An-lang at anu.edu.au
http://mailman.anu.edu.au/mailman/listinfo/an-lang
