counting MLUm

Mon Jun 4 02:36:16 UTC 2001

On 6/4/01 12:44 AM, "Simon Huang" <pyh at hknet.com> wrote:

>
> I am currently a research student in Hong Kong, and I have encountered a
> technical problem in using CLAN and would like to obtain opinions from you.
>
> When I used the MLU program to calculate MLU for a set of data, I found that
> the program was able to count morphemes when the delimiter # was used in the
> main tier.

Simon, I must assume that you are talking about the use of # inside a word
to mark prefixes and not the use of # surrounded by spaces to mark pauses.
Of course, on the main line, you want to recognize not only the # for
prefixes, but also the dash - for suffixes, and the plus sign + for
compounds, and so on.

> However, if the morphological information is encoded only in the
> MOR tier, I found that what I obtained is MLUw instead of MLUm.  So, if
> the morphological information is only encoded in the mor tier rather than
> the main tier, is there any way to perform an MLUm count using CLAN? (Note:
> I'm now working on English data.)
>

Let's use the sample2.cha file in the LIB directory as our test file.

If you use a command such as

mlu +t%mor sample2.cha

you will get 19 morphemes for MOT.  This is MLUm.

However, if you omit all the delimiters, you get MLUw with this command:

mlu -b-#~ +t%mor sample2.cha

You can combine characters after the -b or +b switch to control which
delimiters are counted.  If you get different results, please tell me.
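
To make the difference between the two counts concrete, here is a rough
Python sketch of the idea.  It is only an illustration, not how CLAN's MLU
is implemented: it assumes each delimiter character inside a word adds one
morpheme, and the function name count_units and the %mor-style codes are
hypothetical, not taken from sample2.cha.

```python
def count_units(mor_words, delimiters="-#~"):
    """Count units in a list of %mor-style word codes.

    Each word counts as one unit, plus one more for every delimiter
    character it contains. An empty delimiter set gives a plain word count.
    """
    total = 0
    for word in mor_words:
        total += 1 + sum(1 for ch in word if ch in delimiters)
    return total


# Hypothetical %mor-style codes, made up for this illustration.
utterance = ["pro|you", "v|want-PAST", "n|cookie-PL"]

morphemes = count_units(utterance)             # delimiters counted (MLUm-like)
words = count_units(utterance, delimiters="")  # delimiters ignored (MLUw-like)
print(morphemes, words)  # prints: 5 3
```

Dropping characters from the delimiter set plays the role that -b plays in
the second command above: with no delimiters left, every word counts once.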

--Brian MacWhinney