[Lingtyp] Examplology: imtvault.org
Kofi Yakpo
kofi at hku.hk
Thu Mar 24 10:28:48 UTC 2022
Thanks Sebastian,
Another stupendous initiative of our favourite linguistics press.
best,
kofi
————
Dr Kofi Yakpo • Associate Professor
Chair of Linguistics <http://www.linguistics.hku.hk/> • University of Hong
Kong <http://arts.hku.hk/>
My publications @ zenodo
<https://zenodo.org/search?page=1&size=20&q=yakpo&sort=-publication_date> •
My Page <http://hub.hku.hk/cris/rp/rp01715>
Just published:
Creole prosodic systems are areal, not simple
<https://doi.org/10.3389/fpsyg.2021.690593>
Social entrenchment influences the amount of areal borrowing
<https://journals.sagepub.com/doi/full/10.1177/13670069211019126>
Unidirectional multilingual convergence
<https://doi.org/10.1177/13670069211019126>
Two types of language contact involving English Creoles
<https://www.cambridge.org/core/journals/english-today/article/abs/two-types-of-language-contact-involving-english-creoles/DD2FC19B55E041440F3BFC5235234968>
On Thu, Mar 24, 2022 at 3:15 PM Sebastian Nordhoff <sebastian.nordhoff at glottotopia.de> wrote:
> Dear list members,
> there has been some discussion about "hit", "kill", "John", "Mary", and
> other usual suspects. Over the past months, we have worked on a corpus
> of all examples found in Language Science Press books. This corpus is
> now available in a beta version at imtvault.org. It contains 40648
> interlinear examples from 124 different languages and can be filtered
> along various criteria. For instance, we can search for John, Mary, or
> Peter.
>
> https://imtvault.org/?q=John: 266 hits
> https://imtvault.org/?q=Mary: 223 hits
> https://imtvault.org/?q=Peter: 232 hits
>
> We can look into the popularity of certain verbs:
>
> https://imtvault.org/?q=hit: 399 hits
> https://imtvault.org/?q=kill: 440 hits
> https://imtvault.org/?q=love: 181 hits
> https://imtvault.org/?q=kiss: 26 hits
> https://imtvault.org/?q=carry: 235 hits
>
> We have also retrieved semantic categories, so you get
> https://imtvault.org/?parententities[0]=Crop
> which gives you examples about tobacco, rice, barley, wheat and so on.
>
> Other categories which might be interesting:
> https://imtvault.org/?parententities[0]=Weapon: 89 hits
> https://imtvault.org/?parententities[0]=Hazard: 205 hits
>
> You can also filter for grammatical categories. In the examples in the
> corpus, 2808 have a plural morpheme in them, while 2116 have a singular
> morpheme. Accusative (1937) is more popular than genitive (1601), dative
> (1309) or nominative (1232).
>
> The content of the corpus is obviously skewed by the following criteria:
> 1) The coverage of the input books. Australia for instance is severely
> underrepresented.
> 2) The length of the input books. "A grammar of Japhug" is 1600 pages,
> so you are likely to get a lot of Japhug grammatical categories.
> 3) The source code of the books. We extract the examples from the tex
> files used to generate the pdf, and assume certain conventions. If a
> book author does not follow these conventions, we are not able to
> retrieve the examples.
>
> All this means that the corpus, despite its size, is still
> opportunistic. But it can maybe trigger some interesting ideas, which
> can be pursued further by a more systematic approach. We are also
> working on making the data available for machine queries so that you can
> import the corpus into R or similar and run your own statistics.
>
> There are still some rough edges here and there, but we will be working
> on ironing them out. If you have any suggestions or feature requests,
> feel free to contact me.
>
> Best wishes
> Sebastian (also on behalf of Thomas Krämer)
>
> PS: If you are wondering about the high frequency of Greek philosophers,
> they are all from our translation of Wackernagel's "On a law of
> Indo-European word order"
>
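The search links quoted above all follow the same pattern, so they can be generated rather than typed by hand. The following is a minimal sketch, not an official client: it only builds URLs and does not contact the site. The parameter names `q` and `parententities[0]` are taken from the links in the quoted message; the helper names `search_url` and `category_url` are made up for illustration, and nothing is assumed about what the site returns.

```python
from urllib.parse import urlencode

# Base address as given in the quoted announcement.
BASE_URL = "https://imtvault.org/"


def search_url(term: str) -> str:
    """Build a free-text search link of the form ?q=<term>."""
    return BASE_URL + "?" + urlencode({"q": term})


def category_url(category: str) -> str:
    """Build a semantic-category link of the form ?parententities[0]=<category>.

    safe="[]" keeps the square brackets unescaped so the result matches
    the links in the announcement.
    """
    return BASE_URL + "?" + urlencode({"parententities[0]": category}, safe="[]")


if __name__ == "__main__":
    # Search terms and categories mentioned in the announcement.
    for term in ["John", "Mary", "Peter", "hit", "kill", "love", "kiss", "carry"]:
        print(search_url(term))
    for category in ["Crop", "Weapon", "Hazard"]:
        print(category_url(category))
```

Terms containing spaces or non-ASCII characters are percent-encoded by `urlencode`; whether the site interprets such queries as expected has not been checked here.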