French MOR

Mon Jan 16 18:54:20 UTC 2006

Dear Christophe and Info-CHILDES,

      OK, I see now more clearly how this is organized.  Thanks for
the clarifications. The distance between the French and Spanish/Italian
systems is far less than I saw at first.  And the coverage
for French is far greater than I was estimating.  To test this, I ran
the French MOR on the Champaud data and found only 363 "missing"
forms.  Usually, we have twice that number for a new corpus.
Moreover, virtually none of these "missing" forms are truly missing.
Rather, they involve a few spelling errors and issues with the proper
treatment of abbreviations (qu'il, s'adapte, t'apporte, d'aller).
The proper representation of compounds is another tricky
thing.  In any case, the distance to be traveled to get the
French data fully analyzed by MOR and POST should not be anywhere
near as far as I originally thought.
     Hopefully we will be able to review some of these details next  
week in Paris.  Many thanks for the clarifications.

--Brian MacWhinney

On Jan 16, 2006, at 6:36 AM, Christophe Parisse wrote:

> Well, I beg to differ!
>
> First, I used c-rules because there are many regular words in French, and
> it was easier to use c-rules than not to use them.
>
> For example, there are 25144 "words" in the v.cut file. Out of these
> "words", 11147 are the roots of 1st group verbs (the most frequent regular
> verbs in French) and 240 are 2nd group verbs. Each of these roots can be
> analysed into something like 45 different forms, thanks to c-rules
> (something like 510615 different forms if written out as full forms). Now,
> there are something like 13757 words in v.cut which ARE full forms, but
> these correspond to (only!) 305 irregular verbs, which have something like
> 45 different forms each and are much too irregular for c-rules to be of
> much use.
>
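
The counts quoted above can be checked with a little arithmetic. The following is a minimal Python sketch; the variable names are illustrative, and 45 forms per verb is the approximate figure given in the message, so the expanded total is only an estimate.

```python
# Figures quoted in the message above.
total_entries = 25144         # "words" in v.cut
first_group_roots = 11147     # roots of 1st group verbs
second_group_roots = 240      # roots of 2nd group verbs
irregular_full_forms = 13757  # entries that are full forms
irregular_verbs = 305         # irregular verbs those full forms belong to
forms_per_verb = 45           # "something like 45", so approximate

regular_roots = first_group_roots + second_group_roots

# The three counts account for every entry in v.cut.
assert regular_roots + irregular_full_forms == total_entries

# Estimated number of full forms covered by the regular roots via c-rules.
expanded_regular = regular_roots * forms_per_verb

# Average number of listed full forms per irregular verb.
avg_irregular = irregular_full_forms / irregular_verbs

print(expanded_regular)          # 512415
print(round(avg_irregular, 1))   # 45.1
```

The product comes out within about half a percent of the 510615 figure quoted above, which is consistent with "something like 45" forms per root.
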
> For nouns and adjectives, I did the same, that is, automatically
> generating the plurals with 's' and the feminine forms of adjectives
> with "e".
>
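
As a rough illustration of the kind of regular expansion described above, here is a minimal Python sketch. It is not the c-rule syntax used by MOR: the function names are hypothetical, the analysis strings simply imitate the notation of the French examples in this message, and words whose ending blocks the plain 's' or "e" suffix are ignored.

```python
def regular_noun_forms(stem, gender):
    """Map each surface form of a regular noun to a MOR-style analysis.

    Hypothetical helper: the plural is the stem plus 's', and the regular
    suffix is marked with '-' as in the examples in this message.
    """
    base = f"n|{stem}&_{gender}"
    return {stem: base, stem + "s": base + "-_PL"}


def regular_adjective_surface_forms(stem):
    """List the four surface forms of a regular adjective.

    Hypothetical helper: feminine adds 'e', plural adds 's'.
    """
    return [stem, stem + "e", stem + "s", stem + "es"]


print(regular_noun_forms("elephant", "MASC"))
print(regular_adjective_surface_forms("petit"))
```

With this scheme a single stored root yields all of its regular forms, which is why the root count can stay far below the full-form count.
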
> Second, and most important, MOR for French does the SAME thing as MOR for
> English.  Just two examples:
>
> ENGLISH
>
> @Begin
> *CHI:   plays
> %mor:	v|play-3S^n|play-PL
> *CHI:   playing
> %mor:	part|play-PROG
> *CHI:   oxen
> %mor:	n|ox&PL
> *CHI:   geese
> %mor:	n|goose&PL
> *CHI:   problems
> %mor:	n|problem-PL
> @End
>
> FRENCH:
>
> @Begin
> *CHI:	jouent
> %mor:	v|jouer-SUBJV:PRES&_3PV^v|jouer&PRES&_3PV
> *CHI:	remises
> %mor:	v:pp|remettre&_FEM&_PL^n|remise&_FEM-_PL^v|remiser-SUBJV:PRES&_2SV^v|remiser&PRES&_2SV
> *CHI:	allumees
> %mor:	v:pp|allumer&_FEM&_PL
> *CHI:	jouant
> %mor:	v:prog|jouer
> *CHI:	chevaux
> %mor:	n|cheval&_MASC&_PL
> *CHI:	elephants
> %mor:	n|elephant&_MASC-_PL
> *CHI:	fille
> %mor:	n|fille&_FEM
> @End
>
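
The %mor lines in the two examples above share one layout: alternative readings are separated by "^", each reading is a category and a stem joined by "|", and markers are attached with "-" or "&". A minimal Python sketch of splitting such a line follows. The function name and the dictionary layout are hypothetical, and the sketch assumes the stem itself contains no "-" or "&", which would not hold for hyphenated compounds.

```python
import re


def parse_mor(analysis):
    """Split one %mor word analysis into its alternative readings.

    Returns a list of dicts with the category, the stem, and a list of
    (separator, marker) pairs, where the separator is '-' or '&'.
    """
    readings = []
    for alternative in analysis.split("^"):
        pos, rest = alternative.split("|", 1)
        # Keep the separators so '-' and '&' markers stay distinguishable.
        parts = re.split(r"([-&])", rest)
        stem = parts[0]
        markers = list(zip(parts[1::2], parts[2::2]))
        readings.append({"pos": pos, "stem": stem, "markers": markers})
    return readings


for reading in parse_mor("v|jouer-SUBJV:PRES&_3PV^v|jouer&PRES&_3PV"):
    print(reading)
```

Keeping the separator with each marker makes it easy to list every entry coded with "&" where "-" might have been intended.
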
> However, I confess that I made an error when generating the list of
> exceptions, because I coded some regular words using the "&" sign
> instead of "-". This is especially true for feminine forms, which are all
> coded with "&" whereas many are regular. But I can check this and change
> the signs if necessary, either in the full-form file or by coding a new
> rule.
> Also, some verbs of the 3rd group could be considered regular. These
> could be changed too, but there could be disagreement about the list of
> regular 3rd group verbs.
> Finally, one could choose a different notation for infinitives and
> participles. I coded them in the main category, instead of
> using -INF, -PROG, etc. This could be easily changed if necessary.
>
> One final remark. There are around 32,000 roots in MOR for French, which
> correspond to close to 600,000 full forms. It seems to me this is far from
> incomplete.
>
> Christophe Parisse
>
>> -----Original Message-----
>> From: info-childes at mail.talkbank.org
>> [mailto:info-childes at mail.talkbank.org] On behalf of Brian MacWhinney
>> Sent: Sunday, January 15, 2006 17:07
>> To: info-childes at mail.talkbank.org
>> Subject: French MOR
>>
>>
>> Dear Colleagues,
>>     Work on the application of MOR to the French corpora in CHILDES
>> has lagged a bit, despite the availability of a fairly complete
>> lexicon provided by Christophe Parisse.  In part, this is because the
>> Parisse French MOR system was constructed to use full form entries,
>> rather than the system of arules and crules used for other
>> languages.  It would be possible to either continue constructing
>> French MOR in this full-form format or to shift to using the analytic
>> framework.  Before beginning on this work, I wanted to check to see
>> if anyone in the CHILDES community had done any work extending the
>> current French MOR grammar.  I want to make sure we are not about to
>> reinvent the wheel.  Many thanks.
>>
>> --Brian MacWhinney, CMU
>>
>>
>