Capital letters in compounds

Brian MacWhinney macw at CMU.EDU
Thu May 7 22:37:03 UTC 2009


Dear Annabelle,
    Leonid's answer regarding forms like Champs_Elysées was,  
unfortunately, not quite right.  Right now, the form Champs+Elysées is  
not acceptable, although Champs_Elysées is.  The same holds for
Harvard_Square, which is acceptable, and Harvard+Square, which is not.
Of course, you can also write Harvard Square and Champs Elysées, and
they will work fine.  The reason for avoiding the plus is
that, between 1990 and 2000, people used the plus for
virtually everything.  When we began running MOR
analyses on the corpora, the biggest headache we had was
undoing all that overuse of the plus.  So, we reserved the plus
for true compounds, such as black+board.  In a year or so, I plan to  
remove the use of the plus in forms like black+board.  This is going  
to be possible in English because we now have such an extensive list
of possible compounds that we can write a program to remove the plus
from all of them and rely on the compound list instead.  But that is for the
future.
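
    To give a rough idea of what such a cleanup program might look like,
here is a minimal Python sketch.  It is only an illustration, not the
actual tool: the names KNOWN_COMPOUNDS and strip_plus are hypothetical,
and the three-word set stands in for the real compound list.  It joins a
plus form only when the joined word is in the list and leaves every
other plus form untouched.

```python
import re

# Hypothetical stand-in for the compound list; the real one is far larger.
KNOWN_COMPOUNDS = {"blackboard", "birthday", "playground"}

# One or more word parts joined by "+", e.g. black+board.
PLUS_FORM = re.compile(r"\b\w+(?:\+\w+)+\b")


def strip_plus(line, compounds=KNOWN_COMPOUNDS):
    """Rewrite word+word as wordword, but only for listed compounds."""
    def join_if_known(match):
        joined = match.group(0).replace("+", "")
        # Unlisted forms keep their plus so they can be reviewed by hand.
        return joined if joined.lower() in compounds else match.group(0)

    return PLUS_FORM.sub(join_if_known, line)
```

Leaving unlisted forms alone is a deliberate choice in this sketch, since
those are the cases someone would still need to look at by hand.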
    Right now, if you create a new lowercase compound, CHECK will not
complain.  However, later on, MOR will complain because the new form
will not be in the lexicon.  Words with initial capitals, however, follow
different rules.  In English and French, we can count on initial  
capitalization to show us that a form is a proper noun.  This is no  
help in German or Chinese, but it works for English and French, so it  
makes sense to use this.  So, if you enter Champs_Elysées, you
tell MOR that this is a proper noun, and the presence of the underscore
helps the reader out.  What you lose is the idea that this is a compound.
Perhaps we can recover that notion some time in the future.  However,  
for now, with all the subtypes there are and all the competing  
constraints, I would actually recommend the form "Champs Elysées" as  
the best solution.  After all, nominal compounding for proper nouns easily
blends into nominal phrases, as in "West Thirty Second Boulevard".
Treating this as a single compound with adjectival structure and so on
seems not as good as treating it as a complex noun phrase.
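
    The rules above for a single form can be summarized in a small
Python sketch.  This is not how CHECK or MOR is implemented; the
function name classify_form and the labels it returns are hypothetical,
and it assumes English or French data, where an initial capital marks a
proper noun.

```python
def classify_form(form):
    """Label one transcribed form according to the rules described above."""
    if not form:
        raise ValueError("empty form")
    capitalized = form[0].isupper()
    if "+" in form:
        if capitalized:
            # e.g. Champs+Elysées, Harvard+Square
            return "not acceptable"
        # e.g. black+board; MOR still needs it to be in the lexicon
        return "compound"
    if capitalized:
        # e.g. Champs_Elysées, Harvard_Square, or plain Harvard
        return "proper noun"
    # Anything else is outside what this sketch covers.
    return "other"
```

For example, classify_form("Harvard_Square") gives "proper noun", while
classify_form("Harvard+Square") gives "not acceptable".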

--Brian MacWhinney