Lemmatization on French mor tiers
Brian Macwhinney
macw at andrew.cmu.edu
Fri Nov 24 14:01:40 UTC 2023
Dear Nicola,
This is the first I have heard about this problem, but I can definitely see how it could arise with French MOR. As I wrote in earlier postings to chibolts, and as described at https://talkbank.org/morgrams/, we have been transitioning to Universal Dependencies (UD) taggers for the CHILDES languages other than English. For French, this means replacing the grammar that Christophe Parisse and I created with a UD grammar. I have used the new grammar to re-tag all French corpora in CHILDES, and the UD grammar does not have the problem you described. I am attaching a sample file. You can use the new grammar by installing Batchalign from https://github.com/talkbank, but this involves installing Anaconda. We will eventually make this easier, perhaps through a web service.
— Brian MacWhinney
Teresa Heinz Professor of Cognitive Psychology,
Language Technologies and Modern Languages, CMU
--
You received this message because you are subscribed to the Google Groups "chibolts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/DF94E4F2-1011-47D5-A8DF-6A023913C358%40andrew.cmu.edu.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.cha
Type: text/chat
Size: 1189 bytes
Desc: not available
URL: <http://listserv.linguistlist.org/pipermail/chibolts/attachments/20231124/ecba9de9/attachment.bin>
-------------- next part --------------
> On Nov 23, 2023, at 6:59 PM, Nicola Phillips <nicolakphillips at gmail.com> wrote:
>
> Hello,
>
> I was reading a chapter by Treffers-Daller today (citation below) and wondered if a solution had ever been found for the problem described as follows:
>
> "the French mor tier distinguishes different subcategories of verb forms (infinitival, participial, progressive and other forms) in the information to the left of the pipe separator (|) which separates the syntactic category information from the word itself.... This means that FREQ counts these different forms of trouver as different types rather than as different tokens of the type trouver, which results in inflated indices of lexical richness. This problem also exists, but to a lesser extent, for other syntactic categories such as pronouns. Using switches such as s"*-% %", which tell CLAN to ignore form variants, does not solve the problem, because these switches only look at information after the pipe separator"
>
> Thanks so much for your input!
> Nicky Phillips
>
> Treffers-Daller, J. (2009). Language Dominance and Lexical Diversity: How Bilinguals and L2 Learners Differ in their Knowledge and Use of French Lexical and Functional Items. In B. Richards, M. H. Daller, D. D. Malvern, P. Meara, J. Milton, & J. Treffers-Daller (Eds.), Vocabulary Studies in First and Second Language Acquisition: The Interface Between Theory and Application (pp. 74-90). Palgrave Macmillan UK. https://doi.org/10.1057/9780230242258_5
>
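The counting problem in the quoted Treffers-Daller passage can be made concrete with a minimal Python sketch. This is not CLAN's FREQ. It simply counts types two ways over a list of %mor-style tokens: once keyed on the whole token, where the subcategory left of the pipe makes each verb form a separate type, and once keyed only on the stem after the pipe. The token strings are hypothetical and may not match the exact French MOR codes. The rule that suffix codes follow the stem after "-" or "&" is also an assumption made for illustration.

```python
from collections import Counter

# Hypothetical %mor tokens, for illustration only; real French MOR codes may
# differ. The verb subcategory (inf, pp, prog) sits to the left of the pipe.
MOR_TOKENS = [
    "v:inf|trouver",
    "v:pp|trouver",
    "v:prog|trouver",
    "v|trouver",
    "n|maison",
    "n|maison",
]


def full_token_key(token: str) -> str:
    """Key on the whole token, so each verb subcategory counts as its own type."""
    return token


def lemma_key(token: str) -> str:
    """Key on the stem after the pipe, dropping any suffix codes.

    Assumption: suffix codes are attached to the stem with '-' or '&'.
    """
    _, _, rest = token.partition("|")
    for sep in ("-", "&"):
        rest = rest.split(sep, 1)[0]
    return rest


def type_token_ratio(tokens, key):
    """Return (number of types, number of tokens, type/token ratio) under `key`."""
    counts = Counter(key(t) for t in tokens)
    n_types = len(counts)
    n_tokens = sum(counts.values())
    return n_types, n_tokens, n_types / n_tokens


if __name__ == "__main__":
    for name, key in (("full token", full_token_key), ("lemma", lemma_key)):
        types, tokens, ttr = type_token_ratio(MOR_TOKENS, key)
        print(f"{name}: {types} types / {tokens} tokens, TTR = {ttr:.2f}")
```

With these tokens, the full-token count treats the four forms of trouver as four types (5 types in all), while the stem-based count collapses them into one (2 types in all). This is the inflation of lexical-richness indices the passage describes.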