Capital letters in written L2 data?

Leonid Spektor spektor at andrew.cmu.edu
Wed May 20 18:16:45 UTC 2009


Riikka and Brian,

    You cannot use sf.cut to control capitalized words, because it will
create complications for other features in MOR. But it is easily possible
to add an option to MOR to convert capitalized words to lower case the
same way the LOWCASE command does. Or you can use the LOWCASE command, which
has more options to specify which words should be left capitalized and which
should be converted to lower case.
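For illustration only, here is a minimal Python sketch of the kind of conversion discussed in this thread: lowercase the first word of each CHAT main-tier utterance unless it is on a keep-list of words that must stay capitalized. It is not CLAN's LOWCASE implementation and does not reproduce its options. The function names and the KEEP_CAPITALIZED set are hypothetical. It also assumes one sentence per main tier, with the speaker code separated from the text by ":" and a tab.

```python
# Hypothetical keep-list: words that must stay capitalized (proper nouns, "I").
# In CLAN this job belongs to LOWCASE and its own options; this set is an assumption.
KEEP_CAPITALIZED = {"I", "English", "Finnish", "Finland"}


def lowercase_initial(text: str, keep=KEEP_CAPITALIZED) -> str:
    """Lowercase the first word of an utterance unless it is in `keep`."""
    first, sep, rest = text.partition(" ")
    if first and first not in keep:
        first = first.lower()
    return first + sep + rest


def lowercase_main_tier(line: str, keep=KEEP_CAPITALIZED) -> str:
    """Apply lowercase_initial to the text of a CHAT main tier ("*SPK:\\ttext")."""
    if not line.startswith("*") or ":\t" not in line:
        return line  # headers (@...) and dependent tiers (%...) are left unchanged
    speaker, text = line.split(":\t", 1)
    return f"{speaker}:\t{lowercase_initial(text, keep)}"


if __name__ == "__main__":
    for tier in ["*CHI:\tMy sister lives in Finland .", "*CHI:\tI like English .", "@Languages:\teng"]:
        print(lowercase_main_tier(tier))
```

A keep-list rather than a dictionary lookup keeps the sketch short. Real data would need a fuller list of proper nouns, which is the kind of control Leonid says the LOWCASE command's options provide.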

Leonid.


On 20-05-09 09:04, "Brian MacWhinney" <macw at cmu.edu> wrote:

> 
> Dear Riikka,
>      Let me check with Leonid on this one.  In theory, it should be
> possible to use the sf.cut file in MOR to control the linkage of
> capitals to proper nouns, but I think this may not work in the default
> case.
>     However, my guess is that you are only really having trouble with
> the capitalized words at the beginnings of sentences.  If you
> want MOR and everything to work right, you really should run LOWCASE
> with the +c option.  To further control this, you can use the +d
> option.  Once this is done, MOR will run more smoothly.  It would be
> great to have a Finnish MOR too, but I suppose the bulk of your L2
> data is in English anyway.
> 
> --Brian MacWhinney
> 
> On May 20, 2009, at 2:14 PM, Riikka wrote:
> 
>> 
>> Dear all,
>> 
>> We're using a somewhat modified form of CHAT to transcribe Finnish/
>> English L2 written data (modified for coding purposes and because the
>> system was originally developed for spoken language data).
>> 
>> Although we cannot use MOR in CLAN for Finnish L2 data, we're going
>> to try to use it for English L2 data.
>> 
>> The problem is that in our transcribed data set we've retained upper
>> case letters exactly as they were used in the original hand-written
>> data.  Of course, MOR interprets all words with the initial letter in
>> upper case as proper nouns.  I was wondering: is there a clever way to
>> make MOR ignore at least the sentence-initial upper case letters? Or
>> do we just have to prepare another data set, with upper case letters
>> edited out?
>> 
>> Best,
>> Riikka from Jyvaskyla, Finland



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "chibolts" group.
To post to this group, send email to chibolts at googlegroups.com
To unsubscribe from this group, send email to chibolts+unsubscribe at googlegroups.com
For more options, visit this group at http://groups.google.com/group/chibolts?hl=en
-~----------~----~----~----~------~----~------~--~---


