[Corpora-List] automatic search for orthographic recurring patterns
Shlomo Argamon
argamon at iit.edu
Wed Dec 8 15:42:00 UTC 2004
See our paper in COLING-04:
Shlomo Argamon, Navot Akiva, Amihood Amir, and Oren Kapah.
Efficient Unsupervised Recursive Word Segmentation Using Minimum
Description Length.
Proceedings of The 20th International Conference on Computational
Linguistics (COLING), August 2004.
Available at http://lingcog.iit.edu/pub.xml
-Shlomo-
MARC FRYD wrote:
> Hi,
> Perhaps someone on the List will be able to help me with the following
> datamining problem:
>
> Given a corpus of isolated lexical units or collocations, I would like
> to determine recurring orthographic patterns, whether initial, e.g.
> "CARPO" (carpogenic, carpogenous, carpolite), final, e.g. "IONALISM"
> (sensationalism, functionalism, etc.), or internal, e.g. "CHRON"
> (synchrony, synchronize, etc.).
> The output should be arranged so as to show the productivity of
> each pattern.
> Important constraint: the various patterns will *not* be fed in
> initially but should be extracted as a result of the algorithm.
> I'll post a summary if I get several replies.
> Regards to all list members.
> Marc Fryd
>
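
For a quick baseline on the problem as stated in the quoted message, here is a minimal sketch in Python. It is a naive substring count, not the MDL segmentation method of the paper cited above. The function name recurring_patterns and the parameters min_len and min_words are made up for illustration, and "productivity" is assumed here to mean the number of distinct words containing a pattern.

```python
from collections import defaultdict


def recurring_patterns(words, min_len=4, min_words=2):
    """Rank substrings by the number of distinct words that contain them.

    Assumption: productivity = number of distinct words containing the pattern.
    Returns a list of (pattern, productivity, positions, words) tuples.
    """
    vocab = {w.lower() for w in words}
    containing = defaultdict(set)  # substring -> words that contain it
    positions = defaultdict(set)   # substring -> initial / internal / final

    for word in vocab:
        n = len(word)
        for start in range(n):
            for end in range(start + min_len, n + 1):
                sub = word[start:end]
                containing[sub].add(word)
                if start == 0:
                    positions[sub].add("initial")
                if end == n:
                    positions[sub].add("final")
                if start > 0 and end < n:
                    positions[sub].add("internal")

    # Group patterns that occur in exactly the same set of words.
    by_word_set = defaultdict(list)
    for sub, found_in in containing.items():
        if len(found_in) >= min_words:
            by_word_set[frozenset(found_in)].append(sub)

    results = []
    for found_in, subs in by_word_set.items():
        for sub in subs:
            # Skip a pattern that is merely part of a longer pattern
            # shared by the very same words.
            if any(sub != other and sub in other for other in subs):
                continue
            results.append(
                (sub, len(found_in), sorted(positions[sub]), sorted(found_in))
            )

    # Most productive first; ties broken by length, then alphabetically.
    results.sort(key=lambda r: (-r[1], -len(r[0]), r[0]))
    return results


if __name__ == "__main__":
    sample = [
        "carpogenic", "carpogenous", "carpolite",
        "sensationalism", "functionalism",
        "synchrony", "synchronize",
    ]
    for pattern, count, where, members in recurring_patterns(sample):
        print(pattern.upper(), count, ",".join(where), " ".join(members))
```

No patterns are supplied in advance; they fall out of the counts. Because the filter keeps only the longest string shared by a given set of words, the two -ism words above yield "tionalism" rather than "ionalism", and "chron" is absorbed into "synchron" unless the corpus also contains words such as "chronic". The enumeration is quadratic in word length, which should be acceptable for isolated lexical units but not for running text.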