[Lingtyp] Deriving Tns-Asp-Mood orders and Greenberg’s Universal 20 with n-grams

Stela Manova stela.manova at univie.ac.at
Wed Jul 14 13:58:08 UTC 2021


Dear LINGTYP readers,

Some time ago, after an email exchange with Mattis List on this list about 
how Google services process natural language, I was asked to comment on 
recent NLP research, especially on NLP without grammar. I declined for a 
number of reasons, personal and professional alike. However, for the past 
year and a half I have had the chance to experiment with n-grams and am 
now ready to demonstrate, based on sequences of elements considered 
important in linguistics such as the attested and unattested TAM orders 
and Greenberg's Universal 20 and its exceptions, how NLP without grammar 
works. Since this research is directly relevant to linguistic typology, 
I would be grateful to receive input from typologists. The abstract of the 
paper follows:


  The linear order of elements in prominent linguistic sequences:

Deriving Tns-Asp-Mood orders and Greenberg’s Universal 20 with n-grams


Stela MANOVA

stela.manova at univie.ac.at


Current NLP research uses neither linguistically annotated corpora nor 
the traditional pipeline of linguistic modules, which raises questions 
about the future of linguistics. Linguists who have tried to crack the 
secrets of deep learning NLP models, including BERT (a bidirectional 
transformer-based ML technique employed for Google Search), have had as 
their ultimate goal to show that deep nets make linguistic 
generalizations. I opted for an alternative approach. To check whether 
it is possible to process natural language without grammar, I developed 
a very simple model, the End-to-end N-Gram Model (EteNGraM), that 
elaborates on the standard n-gram model. EteNGraM, at a very basic level, 
imitates current NLP research by handling semantic relations without 
semantics. As in NLP, I pre-trained the model with the orders of the 
TAM markers in the verbal domain, fine-tuned it, and then applied it to 
derive Greenberg's Universal 20 and its exceptions in the nominal 
domain. Although EteNGraM is ridiculously simple and uses only bigrams 
and trigrams, it successfully derives the attested and unattested 
patterns in Cinque (2005) "Deriving Greenberg's Universal 20 and Its 
Exceptions", Linguistic Inquiry 36, and Cinque (2014) "Again on Tense, 
Aspect, Mood Morpheme Order and the 'Mirror Principle'", in Functional 
Structure from Top to Toe: The Cartography of Syntactic Structures 9. 
EteNGraM also makes fine-grained predictions about preferred and 
dispreferred patterns across languages and reveals novel aspects of the 
organization of the verbal and nominal domains. To explain 
EteNGraM's highly efficient performance, I address issues such as: 
complexity of data versus complexity of analysis; structure building by 
linear sequences of elements and by hierarchical syntactic trees; and 
how linguists can contribute to NLP research.
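The abstract does not spell out EteNGraM's internals, so the following is only a minimal sketch of the general idea of deriving attested versus unattested orders from bigrams: a candidate order is accepted if every adjacent pair of markers in it is attested in the training orders. All function names and the two training orders below are my own illustrative assumptions, not the paper's actual data or implementation:

```python
from itertools import permutations

def extract_bigrams(sequences):
    """Collect the set of adjacent pairs seen in the training orders."""
    bigrams = set()
    for seq in sequences:
        bigrams.update(zip(seq, seq[1:]))
    return bigrams

def is_derivable(candidate, bigrams):
    """Accept a candidate order iff all its adjacent pairs are attested."""
    return all(pair in bigrams for pair in zip(candidate, candidate[1:]))

# Hypothetical training data: two mirror-image orderings of Mood, Tense,
# Aspect relative to the verb (illustrative only).
attested = [("Mood", "Tense", "Aspect", "V"),
            ("V", "Aspect", "Tense", "Mood")]

bigrams = extract_bigrams(attested)

# Of the 24 permutations, only those whose every adjacent pair is an
# attested bigram are derived as possible orders.
for cand in permutations(["Mood", "Tense", "Aspect", "V"]):
    if is_derivable(cand, bigrams):
        print(" ".join(cand))
```

On this toy data, only the two training orders survive the filter; the point of the sketch is merely that a purely linear, grammar-free check over bigrams can already partition the permutation space into "derivable" and "underivable" orders.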


The full text is available at: https://ling.auf.net/lingbuzz/006082


Many people believe that numbers serve primarily for counting -- money 
and things that can be bought with money. Yet, against any logic, the 
noun money is uncountable in, e.g., modern English; and, as usual in 
linguistics, the problem becomes even more complicated if one looks at 
languages other than English: in my mother tongue, Bulgarian, money 
is a plurale tantum. I hope that my math-oriented research reveals the 
fascinating world of numbers in a more convincing way. :)


Best wishes,


Stela
