<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>Thanks, this is great, Sebastian!</div>


<div> </div>


<div>Best,</div>


<div> </div>


<div>Andi</div>


<div> 

<div> 

<div name="quote" style="margin:10px 5px 5px 10px; padding: 10px 0 10px 10px; border-left:2px solid #C3D9E5; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">

<div style="margin:0 0 10px 0;"><b>Sent:</b> Thursday, 24 March 2022 at 08:14<br/>

<b>From:</b> "Sebastian Nordhoff" &lt;sebastian.nordhoff@glottotopia.de&gt;<br/>

<b>To:</b> "lingtyp@listserv.linguistlist.org" &lt;lingtyp@listserv.linguistlist.org&gt;<br/>

<b>Subject:</b> [Lingtyp] Examplology: imtvault.org</div>


<div name="quoted-content">Dear list members,<br/>

There has been some discussion about "hit", "kill", "John", "Mary", and<br/>

other usual suspects. Over the past months, we have worked on a corpus<br/>

of all examples found in Language Science Press books. This corpus is<br/>

now available in a beta version at imtvault.org. It contains 40648<br/>

interlinear examples from 124 different languages and can be filtered<br/>

along various criteria. For instance, we can search for John, Mary, or<br/>

Peter.<br/>

<br/>

<a href="https://imtvault.org/?q=John" target="_blank">https://imtvault.org/?q=John</a>: 266 hits<br/>

<a href="https://imtvault.org/?q=Mary" target="_blank">https://imtvault.org/?q=Mary</a>: 223 hits<br/>

<a href="https://imtvault.org/?q=Peter" target="_blank">https://imtvault.org/?q=Peter</a>: 232 hits<br/>

<br/>

We can look into the popularity of certain verbs:<br/>

<br/>

<a href="https://imtvault.org/?q=hit" target="_blank">https://imtvault.org/?q=hit</a>: 399 hits<br/>

<a href="https://imtvault.org/?q=kill" target="_blank">https://imtvault.org/?q=kill</a>: 440 hits<br/>

<a href="https://imtvault.org/?q=love" target="_blank">https://imtvault.org/?q=love</a>: 181 hits<br/>

<a href="https://imtvault.org/?q=kiss" target="_blank">https://imtvault.org/?q=kiss</a>: 26 hits<br/>

<a href="https://imtvault.org/?q=carry" target="_blank">https://imtvault.org/?q=carry</a>: 235 hits<br/>

<br/>

We have also retrieved semantic categories, so a query such as<br/>

<a href="https://imtvault.org/?parententities[0]=Crop" target="_blank">https://imtvault.org/?parententities[0]=Crop</a><br/>

gives you examples about tobacco, rice, barley, wheat and so on.<br/>

<br/>

Other categories which might be interesting:<br/>

<a href="https://imtvault.org/?parententities[0]=Weapon" target="_blank">https://imtvault.org/?parententities[0]=Weapon</a>: 89 hits<br/>

<a href="https://imtvault.org/?parententities[0]=Hazard" target="_blank">https://imtvault.org/?parententities[0]=Hazard</a>: 205 hits<br/>

<br/>
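As a rough sketch, the search and category URLs shown above could be assembled programmatically. The helper names search_url and category_url are hypothetical, and the sketch only mirrors the URL patterns visible in this message; it assumes nothing further about the site.<br/>

```python
from urllib.parse import urlencode

# Base address taken from the links in this message.
BASE_URL = "https://imtvault.org/"


def search_url(term: str) -> str:
    """Build a free-text search URL of the form ?q=TERM (hypothetical helper)."""
    return BASE_URL + "?" + urlencode({"q": term})


def category_url(category: str, index: int = 0) -> str:
    """Build a semantic-category URL of the form ?parententities[0]=CATEGORY.

    The square brackets are kept literal (safe="[]") so the result matches
    the URLs quoted in the message. Hypothetical helper.
    """
    key = f"parententities[{index}]"
    return BASE_URL + "?" + urlencode({key: category}, safe="[]")


if __name__ == "__main__":
    for name in ("John", "Mary", "Peter"):
        print(search_url(name))
    for cat in ("Crop", "Weapon", "Hazard"):
        print(category_url(cat))
```

<br/>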

You can also filter for grammatical categories. Of the examples in the<br/>

corpus, 2808 contain a plural morpheme, while 2116 contain a singular<br/>

morpheme. Accusative (1937) is more frequent than genitive (1601), dative<br/>

(1309) or nominative (1232).<br/>

<br/>

The content of the corpus is obviously skewed by the following factors:<br/>

1) The coverage of the input books. Australia for instance is severely<br/>

underrepresented.<br/>

2) The length of the input books. "A grammar of Japhug" is 1600 pages,<br/>

so you are likely to get a lot of Japhug grammatical categories.<br/>

3) The source code of the books. We extract the examples from the TeX<br/>

files used to generate the PDF, and assume certain conventions. If a<br/>

book author does not follow these conventions, we are not able to<br/>

retrieve the examples.<br/>

<br/>
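To illustrate point 3, here is a minimal sketch of convention-based extraction. The message does not say which conventions are assumed, so this sketch assumes gb4e-style markup (\gll source line \\ gloss line \\ \glt translation); the pattern, the function name extract_examples and the sample text are all made up for illustration and are not the actual imtvault.org pipeline.<br/>

```python
import re

# Assumed convention (not confirmed by the message): gb4e-style examples
#   \gll SOURCE \\ GLOSS \\ \glt TRANSLATION
GLL_PATTERN = re.compile(
    r"\\gll\s+(?P<source>.*?)\\\\\s*"      # source line up to the first \\
    r"(?P<gloss>.*?)\\\\\s*"               # gloss line up to the second \\
    r"\\glt\s+(?P<translation>[^\n]*)",    # translation: rest of the \glt line
    re.DOTALL,
)


def extract_examples(tex: str) -> list[dict[str, str]]:
    """Return one dict per example that follows the assumed convention."""
    return [
        # Collapse runs of whitespace so line breaks in the TeX do not matter.
        {key: " ".join(value.split()) for key, value in m.groupdict().items()}
        for m in GLL_PATTERN.finditer(tex)
    ]


# Made-up sample input, for illustration only.
SAMPLE = r"""
\ea
\gll puella cantat \\
     girl sing.3SG \\
\glt `The girl sings.'
\z
"""

if __name__ == "__main__":
    print(extract_examples(SAMPLE))
```

An example typeset with different markup simply produces no match, which is the kind of gap point 3 describes.<br/>

<br/>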

All this means that the corpus, despite its size, is still<br/>

opportunistic. But it may spark some interesting ideas, which<br/>

can then be pursued with a more systematic approach. We are also<br/>

working on making the data available for machine queries so that you can<br/>

import the corpus into R or similar and run your own statistics.<br/>

<br/>

There are still some rough edges here and there, but we will be working<br/>

on ironing them out. If you have any suggestions or feature requests,<br/>

feel free to contact me.<br/>

<br/>

Best wishes<br/>

Sebastian (also on behalf of Thomas Krämer)<br/>

<br/>

PS: If you are wondering about the high frequency of Greek philosophers,<br/>

they are all from our translation of Wackernagel's "On a law of<br/>

Indo-European word order"<br/>

<br/>

_______________________________________________<br/>

Lingtyp mailing list<br/>

Lingtyp@listserv.linguistlist.org<br/>

<a href="http://listserv.linguistlist.org/mailman/listinfo/lingtyp" target="_blank">http://listserv.linguistlist.org/mailman/listinfo/lingtyp</a></div>

</div>

</div>

</div></div></body></html>