counting MLUm
Brian MacWhinney
macwhinn at hku.hk
Mon Jun 4 02:36:16 UTC 2001
On 6/4/01 12:44 AM, "Simon Huang" <pyh at hknet.com> wrote:
>
> I am currently a research student in Hong Kong, and I have encountered a
> technical problem in using CLAN and would like to obtain opinions from you.
>
> When I use the MLU program to calculate MLU for a set of data, I found that
> the program was able to count morphemes when the delimiter # is used in the
> main tier.
Simon, I must assume that you are talking about the use of # inside a word
to mark prefixes and not the use of # surrounded by spaces to mark pauses.
Of course, on the main line, you want to recognize not only the # for
prefixes, but also the dash - for suffixes, and the plus sign + for
compounds, and so on.
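To make the in-word delimiter idea concrete, here is a minimal Python sketch (not CLAN's code) that counts morphemes in a main-line utterance by splitting each word on #, - and +, the markers listed above. The function names and the example utterance are hypothetical. Whether CLAN's MLU treats every one of these characters as a morpheme boundary depends on its delimiter settings, so the set is a parameter here. A # standing alone between spaces is skipped as a pause.

```python
import re

# Assumed delimiter set, taken from the markers named in the reply:
# '#' for prefixes, '-' for suffixes, '+' for compounds.
MAIN_LINE_DELIMITERS = "#-+"
PUNCTUATION = {".", "?", "!"}


def count_word_morphemes(word: str, delimiters: str = MAIN_LINE_DELIMITERS) -> int:
    """Count the pieces left after splitting a word on the delimiter characters."""
    if not delimiters:
        return 1  # no delimiters: every word is one unit (a word count)
    pattern = "[" + re.escape(delimiters) + "]"
    pieces = [p for p in re.split(pattern, word) if p]
    return max(len(pieces), 1)


def count_utterance_morphemes(utterance: str, delimiters: str = MAIN_LINE_DELIMITERS) -> int:
    """Sum morpheme counts over the words of one main-line utterance."""
    total = 0
    for token in utterance.split():
        # A token made only of '#' marks a pause, not a prefix; skip it,
        # along with utterance-final punctuation.
        if set(token) == {"#"} or token in PUNCTUATION:
            continue
        total += count_word_morphemes(token, delimiters)
    return total


if __name__ == "__main__":
    print(count_utterance_morphemes("I want # more cookie-s ."))      # 5 (morphemes)
    print(count_utterance_morphemes("I want # more cookie-s .", ""))  # 4 (words)
```

Passing an empty delimiter string turns the same count into a word count, which is the main-line analogue of the MLUm/MLUw distinction discussed in this thread.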
> However, if the morphological information is only encoded in the
> MOR tier, I found that what I obtained is MLUw instead of MLUm. So, if
> the morphological information is only encoded in the mor tier rather than
> the main tier, is there any way to perform an MLUm count using CLAN? (Note:
> I'm now working on English data.)
>
Let's use the sample2.cha file in the LIB directory as our test file.
If you use a command such as
mlu +t%mor sample2.cha
you will get 19 morphemes for MOT. This is MLUm.
However, if you remove the morpheme delimiters (-, #, and ~) from the set that MLU recognizes, you get MLUw with this command:
mlu -b-#~ +t%mor sample2.cha
You can combine characters with the -b or +b option to control which delimiters are counted as morpheme boundaries. If you get something else, please tell me.
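The difference between the two commands above can be sketched in Python as MLU computed over %mor tier lines with a configurable delimiter set. Keeping -, # and ~ as boundaries yields MLUm, and emptying the set (roughly what -b-#~ does) yields MLUw. This is a hypothetical illustration, not CLAN's implementation: the function names are invented, the %mor lines below are made-up examples rather than lines from sample2.cha, and the delimiter set is an assumption based on the -b-#~ command above.

```python
import re

# Assumed %mor boundary markers: '-' suffix, '#' prefix, '~' clitic.
# '&' (fusional marking, e.g. be&3S) is NOT treated as a boundary here.
MOR_DELIMITERS = "-#~"


def mor_morphemes(mor_line: str, delimiters: str = MOR_DELIMITERS) -> int:
    """Count morphemes on one %mor line; items without '|' (punctuation,
    the '%mor:' label) are skipped."""
    pattern = "[" + re.escape(delimiters) + "]" if delimiters else None
    count = 0
    for item in mor_line.split():
        if "|" not in item:
            continue
        if pattern is None:
            count += 1  # no delimiters: each item counts once (a word)
        else:
            count += len([p for p in re.split(pattern, item) if p])
    return count


def mlu(mor_lines: list[str], delimiters: str = MOR_DELIMITERS) -> float:
    """Mean number of units per utterance over the given %mor lines."""
    if not mor_lines:
        raise ValueError("no utterances to average")
    counts = [mor_morphemes(line, delimiters) for line in mor_lines]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Invented example lines for one speaker.
    lines = [
        "%mor:\tpro|I v|want-PAST n|cookie-PL .",
        "%mor:\tpro|it~v|be&3S adj|big .",
    ]
    print("MLUm:", mlu(lines))      # MLUm: 4.0
    print("MLUw:", mlu(lines, ""))  # MLUw: 2.5
```

In this sketch the delimiter string plays roughly the role that -b and +b adjust in CLAN: removing characters from it merges morphemes back into whole words, so the mean drops from morphemes per utterance to words per utterance.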
--Brian MacWhinney
More information about the Chibolts mailing list