[Corpora-List] querying corpora

Michael Maxwell maxwell at umiacs.umd.edu
Fri Feb 29 14:52:35 UTC 2008


>  I was wondering about the kinds of queries you may run on open
> corpora out there
> ...
>  Could you, say, run a query asking a corpus to give you the result
> about how many times, where in a sentence (both, as a distribution of
> the number of words, the POS elements used in them and the texts as a
> whole) did Shakespeare use words related to "love" (which you should
> be also able to query even with a certain level of "measurable
> relatedness") modified by an adverb and containing also an adjective
> within the sentence?

In addition to the responses you get from this list, you might look into
what the folks over at the ALLC (Association for Literary and Linguistic
Computing) and ACH (Association for Computers and the Humanities) are
doing.  That strikes me as the sort of topic they would be interested in.
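
For what it's worth, the shape of the query in the quoted question can be
sketched over a toy corpus.  This is only an illustration under loud
assumptions: the sentences and their POS tags are made up by hand (not
Shakespeare), LOVE_WORDS is a hypothetical stand-in for a real
"relatedness" measure, and "modified by an adverb" is approximated as
"has an adverb immediately next to it", which a real parser would do better.

```python
from collections import Counter

# Hypothetical hand-tagged toy corpus: each sentence is a list of (word, POS) pairs.
SENTENCES = [
    [("she", "PRON"), ("truly", "ADV"), ("loved", "VERB"),
     ("the", "DET"), ("gentle", "ADJ"), ("prince", "NOUN")],
    [("he", "PRON"), ("loved", "VERB"), ("her", "PRON")],
    [("sweet", "ADJ"), ("love", "NOUN"), ("fades", "VERB"), ("slowly", "ADV")],
    [("they", "PRON"), ("love", "VERB"), ("dearly", "ADV"),
     ("their", "DET"), ("old", "ADJ"), ("home", "NOUN")],
]

# Assumption: a fixed word list stands in for a graded relatedness measure.
LOVE_WORDS = {"love", "loves", "loved", "loving"}


def find_hits(sentences):
    """Return (sentence index, word position, sentence length) for each
    love-word that has an adjacent adverb, in a sentence containing an adjective."""
    hits = []
    for s_idx, sent in enumerate(sentences):
        # The sentence as a whole must contain at least one adjective.
        if not any(tag == "ADJ" for _, tag in sent):
            continue
        for i, (word, _) in enumerate(sent):
            if word.lower() not in LOVE_WORDS:
                continue
            # Adjacency is a crude proxy for adverbial modification.
            neighbours = sent[max(i - 1, 0):i] + sent[i + 1:i + 2]
            if any(tag == "ADV" for _, tag in neighbours):
                hits.append((s_idx, i, len(sent)))
    return hits


hits = find_hits(SENTENCES)
# Distribution of the hit positions within their sentences.
position_counts = Counter(pos for _, pos, _ in hits)
```

With a real tagged corpus, the same filter would run over its sentences,
and the position counts could be broken down by text or by sentence length.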

>  Are there any text corpora out there including phonemes also?

Not sure what you mean here.  Are you referring to transcriptions of
speech, which might capture more or less free variation at the phonemic
level (the two pronunciations each of 'roof' and 'route'), dialectal
variation at the phonemic level (such as whether 'pin' and 'pen' are
homophones), or phonemes which cannot be inferred from a pronunciation
dictionary alone (e.g. the present and past tense pronunciations of 'read')?
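
To make the last case concrete, here is a minimal sketch of why a
dictionary lookup alone leaves some phonemes undetermined.  The entries in
PRON_DICT are a hand-built toy in ARPAbet-style notation, not taken from
any particular dictionary, and the function name is hypothetical.

```python
# Toy pronunciation dictionary (an assumption for illustration): each word
# maps to the list of phoneme sequences it can have.
PRON_DICT = {
    "read": [("R", "IY", "D"), ("R", "EH", "D")],    # present vs. past tense
    "roof": [("R", "UW", "F"), ("R", "UH", "F")],    # free variation
    "route": [("R", "UW", "T"), ("R", "AW", "T")],   # free variation
    "pin": [("P", "IH", "N")],
    "pen": [("P", "EH", "N")],
}


def ambiguous_words(text, pron_dict):
    """Return the words in text that have more than one listed pronunciation,
    in order of first appearance."""
    found = []
    for word in text.lower().split():
        if len(pron_dict.get(word, [])) > 1 and word not in found:
            found.append(word)
    return found


ambiguous = ambiguous_words("I read the route with a pen", PRON_DICT)
# 'read' and 'route' are flagged: the text alone does not say which
# pronunciation was intended, so only a transcription of speech would.
```

Note that the 'pin'/'pen' case does not show up this way at all: each word
has one entry here, and whether they sound alike depends on the speaker's
dialect, not on the dictionary.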

   Mike Maxwell
   CASL/ U MD


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


