omitting fillers
Brian MacWhinney
macw at cmu.edu
Tue Sep 28 20:57:15 UTC 2010
Dear ChiBolts,
I realize that I should have explained further why it is a good idea to omit fillers from the %mor line. Much of this has to do with how MOR, POST, and GRASP function. Although they can set up sequential dependencies that include fillers, they work more accurately without this additional "junk" in the middle of a syntactic chain or lexical environment. For example, the coding of a word as an adjective by MOR or as a modifier by GRASP can be determined by the fact that a noun follows or a determiner precedes. However, if the string is actually determiner + filler + adjective + filler + noun, then the power of the constituent structure environment is weakened and noise is added to the system. Simply coding pauses as um@fp doesn't help, because these forms will still end up on the %mor line.

Of course, for the study of disfluencies, one wants to identify all the filled pauses. So, you could have &um@fp. Or you could settle on a list of standard filled pauses and search as needed for things like &um. Right now, the corpora use the second solution, which should work.
-- Brian MacWhinney
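
As an illustration of the second solution described in the message (agree on a list of standard filled pauses, then search for forms like &um), here is a minimal Python sketch. It is not part of MOR, POST, GRASP, or CLAN. The filler list, the function name split_fillers, and the simple whitespace tokenization are all assumptions made for this example; real CHAT main-tier lines carry more markup than this handles.

```python
import re

# Hypothetical list of standard filled pauses; a project would settle on its own.
FILLED_PAUSES = {"um", "uh", "er"}

# Matches an &-prefixed token such as "&um", capturing the form after the ampersand.
AMP_TOKEN = re.compile(r"^&(\w+)$")


def split_fillers(utterance):
    """Return (kept_tokens, fillers) for a whitespace-tokenized utterance.

    kept_tokens is the word sequence with filled pauses removed, i.e. the
    cleaner environment a tagger would see; fillers is what a disfluency
    study would count.
    """
    kept, fillers = [], []
    for token in utterance.split():
        match = AMP_TOKEN.match(token)
        if match and match.group(1).lower() in FILLED_PAUSES:
            fillers.append(token)
        else:
            kept.append(token)
    return kept, fillers


kept, fillers = split_fillers("the &um big &uh dog")
print(kept)     # ['the', 'big', 'dog']
print(fillers)  # ['&um', '&uh']
```

With the fillers set aside, the remaining sequence is determiner + adjective + noun, which is the uninterrupted environment the message argues for, while the filler list stays available for disfluency counts.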