Capital letters in compounds
Brian MacWhinney
macw at CMU.EDU
Thu May 7 22:37:03 UTC 2009
Dear Annabelle,
Leonid's answer regarding forms like Champs_Elysées was,
unfortunately, not quite right. Right now, the form Champs+Elysées is
not acceptable, although Champs_Elysées is. The same goes for
Harvard_Square, which is acceptable, and Harvard+Square, which is not.
Of course, you can also write Harvard Square and Champs Elysées and
they will work fine. The reason for avoiding the plus is
that, between 1990 and 2000, people used the plus for
virtually everything. When we started running MOR
analyses on the corpora, our biggest headache was
undoing all that overuse of the plus. So, we reserved the plus
for true compounds, such as black+board. In a year or so, I plan to
remove the use of the plus in forms like black+board. This is going
to be possible in English because we now have an extensive list
of possible compounds, so we can write a program to remove all the
pluses and rely on that list instead. But that is for the
future.
Right now, if you create a new lowercase compound, CHECK will not
complain. Later on, though, MOR will complain because the form will not be
in the lexicon. Words with initial capitals follow
different rules. In English and French, we can count on initial
capitalization to show us that a form is a proper noun. This is no
help in German or Chinese, but it works for English and French, so it
makes sense to use this. So, if you enter Champs_Elysées, you
tell MOR that this is a proper noun, and the presence of the underscore helps
the reader out. What you lose is the idea that this is a compound.
Perhaps we can recover that notion some time in the future. However,
for now, with all the subtypes there are and all the competing
constraints, I would actually recommend the form "Champs Elysées" as
the best solution. After all, nominal compounding for proper nouns easily
blends into nominal phrases, as in "West Thirty Second Boulevard".
Treating this as a single compound with adjectival structure and such
seems not as good as treating it as a complex noun phrase.
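
As a rough illustration of the conventions described above, here is a small Python sketch. It is not the actual logic of CHECK or MOR; the function name, the labels, and the tiny example lexicon are all hypothetical, and it only encodes the cases mentioned in this message.

```python
# Hypothetical sketch of the conventions described in this message.
# It does NOT reproduce the real CHECK or MOR programs.

# Stand-in for the compound list; the real lexicon is far larger.
EXAMPLE_COMPOUND_LEXICON = {"black+board"}


def classify_form(form, compound_lexicon=EXAMPLE_COMPOUND_LEXICON):
    """Label a single transcribed form according to the rules in the message."""
    if not form:
        raise ValueError("form must be a non-empty string")

    capitalized = form[0].isupper()

    if "+" in form:
        if capitalized:
            # e.g. Champs+Elysées, Harvard+Square: not acceptable
            return "rejected"
        if form in compound_lexicon:
            # e.g. black+board: a true compound known to the lexicon
            return "compound"
        # A new lowercase compound: accepted at first, but it would be
        # flagged later because it is missing from the lexicon
        return "compound-not-in-lexicon"

    if "_" in form and capitalized:
        # e.g. Champs_Elysées: treated as a proper noun
        return "proper-noun"

    # Anything else is outside what this message discusses
    return "unclassified"


if __name__ == "__main__":
    for example in ["Champs+Elysées", "Champs_Elysées", "black+board", "snow+cat"]:
        print(example, "->", classify_form(example))
```

Writing the name as separate words ("Champs Elysées") sidesteps this classification entirely, which is the form recommended above.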
--Brian MacWhinney