[An-lang] Austronesian corpus linguistics

Simon Greenhill ghill at ihug.co.nz
Mon May 9 23:08:33 UTC 2005


Morning all,

There are several ways to search PDF files, but which ones will work
depends on factors such as what the PDF was generated from (text or
images) and any digital rights management restrictions in place.

For a quick personal solution, you could look into Google Desktop
Search, a free search engine for your local files that will index
PDFs. You can download it for PCs here:
http://desktop.google.com/

Apple users can probably use the new Spotlight feature released with
Mac OS X Tiger. More info: http://www.apple.com/macosx/features/spotlight/

If you're looking for a more portable solution, it's usually fairly
easy to set up a content indexing system and attach it to a webpage
using something like SWISH-E (http://swish-e.org/index.html). Failing
that, you could use a simple application like pdftohtml
(http://sourceforge.net/projects/pdftohtml/) to generate HTML copies of
the PDFs.
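
To give a rough idea of what a content index does, here is a minimal
Python sketch. It is not how SWISH-E works internally; it only
illustrates the general idea of mapping words to the documents that
contain them. The file names and sample text are made up, and the
sketch assumes the text has already been extracted from the PDFs (for
example from the HTML copies mentioned above).

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    # Intersect the document sets of all query words (AND search).
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical documents standing in for text extracted from PDFs.
docs = {
    "text1.pdf": "The canoe sailed to the island",
    "text2.pdf": "The island had a village",
}
index = build_index(docs)
print(search(index, "island"))
print(search(index, "canoe island"))
```

A real indexer would add things like stemming, stop words and ranked
results, but the word-to-document lookup is the core of it.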

--Simon

Keira Ballantyne wrote:
> Dear AN-Langers,
>
> For my recently-completed dissertation project, I compiled a small
> corpus (approx. 7 000 words) of Yapese texts. The corpus includes both
> written narrative and spoken text, and is available on the web at
> http://www2.hawaii.edu/~ballanty/corpusintro.html
>
> My texts are currently presented as pdf files, which preserve the
> "viewability" of interlinear glosses but are somewhat lacking in terms
> of searchability, as well as being problematic from an archival point
> of view.
>
> I am currently looking at various ways of tagging this corpus so that
> it will be more accessible to other scholars who would be interested
> in using it as a research tool, and I'd like to gauge the interest of
> the research community. What sort of searchability in a corpus would
> be useful to your research?
>
> I would also be interested to hear from other researchers who have
> done similar work: researchers who have compiled or are currently
> compiling corpora of Austronesian languages. I'm particularly interested in
> hearing from scholars who have constructed tagged and/or
> interlinearized databases and who would be interested in sharing their
> experience and knowledge.
>
> Regards,
>
> Keira Ballantyne
> Department of Linguistics
> University of Hawai'i at Manoa
> ballanty at hawaii.edu
> keiraballantyne at gmail.com
>
> _______________________________________________
> An-lang mailing list
> An-lang at anu.edu.au
> http://mailman.anu.edu.au/mailman/listinfo/an-lang
>
>


