MLU counts

Wed Feb 18 16:19:25 UTC 2004

Dear Yonata,
  Great to know you are working to accomplish just this.  I agree that the
ultimate goal of MLU is comparability within a language and
cross-linguistically.  However, I am just wondering whether it may be best
to consider these comparisons as two separate goals.  The great thing about
computer technology like CLAN is that, with the right set of dashes,
ampersands, and pluses (and perhaps liberal use of CHSTRING), you can
compute both sets of MLUs on a data set in a matter of minutes or, at most,
hours.  It may be that the two different MLUs (the rich language-internal one
and the leaner cross-linguistic one) work equivalently for language-internal
comparisons and predictions, but my guess is that the language-internal MLU
will be best for language-internal purposes.
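As a rough illustration of computing both measures in one pass over the same
data, here is a minimal Python sketch.  It is not CLAN's MLU command.  It
assumes a simplified token format loosely modeled on the CHAT %mor tier, where
"-" marks a suffix and "&" a fusional feature.  The ENGLISH_COMPARABLE set of
affix labels is hypothetical.  It stands in for whatever inventory a researcher
would choose for the leaner cross-linguistic count.

```python
import re

# Assumed token format, loosely modeled on the CHAT %mor tier:
# "stem-SUF" marks a suffix and "stem&FEAT" a fusional feature.
# This is a simplified illustration, not CLAN's own MLU algorithm.
AFFIX = re.compile(r"[-&]([^-&]+)")

# Hypothetical set of affix labels treated as comparable to English
# inflections for the leaner cross-linguistic count.
ENGLISH_COMPARABLE = {"PL", "PAST", "3S", "PRESP", "PASTP"}


def morpheme_count(token, keep=None):
    """Return 1 for the stem plus the number of affixes counted.

    If keep is None, every affix counts (rich, language-internal MLU);
    otherwise only affixes whose label is in keep are counted.
    """
    affixes = AFFIX.findall(token)
    if keep is not None:
        affixes = [a for a in affixes if a in keep]
    return 1 + len(affixes)


def mlu(utterances, keep=None):
    """Mean length of utterance in morphemes; 0.0 for an empty sample."""
    if not utterances:
        return 0.0
    total = sum(morpheme_count(tok, keep) for utt in utterances for tok in utt)
    return total / len(utterances)


# Two hypothetical utterances, already tokenized.
sample = [
    ["n|dog-PL-ACC", "v|see-PAST"],
    ["pro|I", "v|go&PAST"],
]

rich = mlu(sample)                      # counts PL, ACC, PAST, PAST
lean = mlu(sample, ENGLISH_COMPARABLE)  # drops the case suffix ACC
print(f"rich MLU = {rich:.2f}, lean MLU = {lean:.2f}")
```

On these two made-up utterances the rich count gives 4.00 and the lean count
3.50.  The only difference is that the case suffix ACC is excluded from the
lean count.  Both figures come from the same data, and only the affix filter
changes.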
  I think your second point is that we need to recalibrate the MLU of each
morphologically rich language for comparison with the English standard.
I agree completely with that.  There is no single algorithm that will work
across all languages for that task.

--Brian MacWhinney