[Lingtyp] Examplology: imtvault.org

Kofi Yakpo kofi at hku.hk
Thu Mar 24 10:28:48 UTC 2022

Thanks Sebastian,

Another stupendous initiative of our favourite linguistics press.

Dr Kofi Yakpo • Associate Professor
Chair of Linguistics <http://www.linguistics.hku.hk/> • University of Hong Kong <http://arts.hku.hk/>
My publications @ zenodo <https://zenodo.org/search?page=1&size=20&q=yakpo&sort=-publication_date> • My Page <http://hub.hku.hk/cris/rp/rp01715>

Just published:
Creole prosodic systems are areal, not simple
Social entrenchment influences the amount of areal borrowing
Unidirectional multilingual convergence
Two types of language contact involving English Creoles

On Thu, Mar 24, 2022 at 3:15 PM Sebastian Nordhoff <sebastian.nordhoff at glottotopia.de> wrote:

> Dear list members,
> There has been some discussion about "hit", "kill", "John", "Mary", and
> other usual suspects. Over the past months, we have worked on a corpus
> of all examples found in Language Science Press books. This corpus is
> now available in a beta version at imtvault.org. It contains 40648
> interlinear examples from 124 different languages and can be filtered
> along various criteria. For instance, we can search for John, Mary, or
> Peter.
> https://imtvault.org/?q=John: 266 hits
> https://imtvault.org/?q=Mary: 223 hits
> https://imtvault.org/?q=Peter: 232 hits
> We can look into the popularity of certain verbs:
> https://imtvault.org/?q=hit: 399 hits
> https://imtvault.org/?q=kill: 440 hits
> https://imtvault.org/?q=love: 181 hits
> https://imtvault.org/?q=kiss: 26 hits
> https://imtvault.org/?q=carry: 235 hits
> We have also retrieved semantic categories, so you get
> https://imtvault.org/?parententities[0]=Crop
> which gives you examples about tobacco, rice, barley, wheat and so on.
> Other categories which might be interesting:
> https://imtvault.org/?parententities[0]=Weapon: 89 hits
> https://imtvault.org/?parententities[0]=Hazard: 205 hits
> You can also filter for grammatical categories. Of the examples in the
> corpus, 2808 contain a plural morpheme, while 2116 contain a singular
> morpheme. Accusative (1937) is more frequent than genitive (1601),
> dative (1309) or nominative (1232).
> The content of the corpus is obviously skewed by the following factors:
> 1) The coverage of the input books. Australia for instance is severely
> underrepresented.
> 2) The length of the input books. "A grammar of Japhug" is 1600 pages,
> so you are likely to get a lot of Japhug grammatical categories.
> 3) The source code of the books. We extract the examples from the TeX
> files used to generate the PDF, and assume certain markup conventions.
> If a book author does not follow these conventions, we are not able to
> retrieve the examples.
> All this means that the corpus, despite its size, is still
> opportunistic. But it may trigger some interesting ideas, which can be
> pursued further with a more systematic approach. We are also working on
> making the data available for machine queries so that you can import
> the corpus into R or similar and run your own statistics.
> There are still some rough edges here and there, but we will be working
> on ironing them out. If you have any suggestions or feature requests,
> feel free to contact me.
> Best wishes
> Sebastian (also on behalf of Thomas Krämer)
> PS: If you are wondering about the high frequency of Greek philosophers,
> they are all from our translation of Wackernagel's "On a law of
> Indo-European word order".
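
The search links quoted above follow two visible patterns: a free-text query (?q=...) and a semantic-category filter (?parententities[0]=...). As a minimal sketch, assuming only those two patterns as they appear in the message, the links could be generated as follows. The function names search_url and category_url are hypothetical helpers introduced for illustration, not part of any imtvault.org interface, and the sketch only builds link strings; it does not fetch results or rely on any machine-readable API.

```python
from urllib.parse import urlencode

# Base address as quoted in the announcement.
BASE_URL = "https://imtvault.org/"


def search_url(term: str) -> str:
    """Build a free-text search link of the form ?q=<term> (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'q': term})}"


def category_url(category: str) -> str:
    """Build a semantic-category link of the form ?parententities[0]=<category>.

    safe="[]" keeps the square brackets unescaped, matching the links
    quoted in the message (hypothetical helper).
    """
    query = urlencode({"parententities[0]": category}, safe="[]")
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Names, verbs, and categories mentioned in the announcement.
    for term in ["John", "Mary", "Peter", "hit", "kill"]:
        print(search_url(term))
    for category in ["Crop", "Weapon", "Hazard"]:
        print(category_url(category))
```

Opening the printed links in a browser should show the same result pages as the links in the message; the hit counts themselves may have changed since the announcement.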