[Corpora-List] querying corpora

Albretch Mueller lbrtchx at gmail.com
Fri Feb 29 16:18:47 UTC 2008


> ... the folks over at the ALLC (Association for Literary and Linguistic
> Computing) and ACH (Association for Computers and the Humanities) are
> doing.
~
 I did go to ach.org and allc.org and bookmarked them to look through
later, but the searches I have done on specific topics have turned up
very little on the sort of things I mentioned.
~
 This may be a blind spot of mine, since I come from an exact-science
background, but what can you actually do with the knowledge that, say,
"e" and "the" are the most frequent letter and word in English?
~
>  >  Are there any text corpora out there including phonemes also?
>
>  Not sure what you mean here.  Are you referring to transcriptions of
>  speech, which might include more or less free variation at the phonemic
>  level (the two pronunciations of 'roof' and 'route'), dialectal variation
>  at the phonemic level (such as whether 'pin' and 'pen' are homophones), or
>  phonemes which cannot be inferred from a pronunciation dictionary (e.g.
>  the present and past tense pronunciations of 'read')?
~
 I actually mean all of these cases. If you ask a corpus "give me all
words pronounced exactly like 'right'", it should return:
~
 "right" (adj.), "Wright" (English last name, as in the Wright
Brothers), "rite" (noun), "write" (verb)
~
 along with the texts in which they appear and their offsets in those
texts.
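~
 To make the kind of query I mean concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration only: the
tiny PRON table is a made-up stand-in for a real pronunciation
dictionary, and build_index and sounds_like are hypothetical names, not
part of any existing corpus tool. A dictionary lookup like this covers
the homophone case only; it cannot tell the present tense of 'read'
from the past tense, which would need a phonemic annotation on each
token of the corpus itself.

```python
import re
from collections import defaultdict

# Hypothetical toy pronunciation table (ARPAbet-like strings).
# A real system would load a full pronunciation dictionary instead.
PRON = {
    "right": "R AY T",
    "wright": "R AY T",
    "rite": "R AY T",
    "write": "R AY T",
    "wing": "W IH NG",
}

def build_index(texts):
    """Map each pronunciation to (text_id, char_offset, surface_form) hits."""
    index = defaultdict(list)
    for text_id, text in texts.items():
        for match in re.finditer(r"[A-Za-z']+", text):
            pron = PRON.get(match.group().lower())
            if pron is not None:
                index[pron].append((text_id, match.start(), match.group()))
    return index

def sounds_like(word, index):
    """Return every indexed token pronounced like `word`."""
    pron = PRON.get(word.lower())
    return index.get(pron, []) if pron else []

# Two made-up "texts" standing in for a corpus.
texts = {
    "t1": "The Wright brothers were right.",
    "t2": "Write about the rite.",
}
index = build_index(texts)
hits = sounds_like("right", index)
for text_id, offset, form in hits:
    print(text_id, offset, form)
```

 Indexing by pronunciation rather than by spelling is the design choice
that makes the "sounds like" question a single lookup.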
~
 Or, e.g., you could study all the instances of the word "wing" in a
text corpus, together with their contextual usage patterns, and come
to the conclusion that a phrase like:
~
 "right wing, left wing, chicken wing, ... I am political!"
~
 could be meant as a pun
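~
 The first step of such a study, listing every occurrence of "wing"
with its neighbouring words, could be sketched as below. This is only a
keyword-in-context listing under my own assumptions (the kwic function
name, the crude tokenizer and the one-word window are all made up for
the example); it does not detect puns, it just gathers the contexts a
person or a later program would compare.

```python
import re

def kwic(text, keyword, width=2):
    """List (left_context, keyword, right_context) for each occurrence."""
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            rows.append((left, tok, right))
    return rows

sample = "right wing, left wing, chicken wing, ... I am political!"
rows = kwic(sample, "wing", width=1)
for left, tok, right in rows:
    print(" ".join(left), "[" + tok + "]", " ".join(right))
```

 Seeing "right" and "left" next to "chicken" as left-hand neighbours of
the same word is the kind of contrast in usage that hints at a pun.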
~
 I am not a linguist myself, though I count semiotics/linguistics
among my true loves and have done quite a bit of reading and coding on
these subjects. IMHO, linguistics hasn't gone far since the time
Aristotle, as a way to explain poetry somewhat measurably, said in his
"Poetika" that "the spring of life ..." (referring to youth).
~
 Now that I mention this, I have another question. Have linguists or
literary scholars written up a wish list of the features they expect
from a corpus?
~
 Thanks
 lbrtchx

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


