[Corpora-List] Re. Concordancer for Chinese (Summary of reply)
Mike Scott
mike at lexically.net
Mon Oct 7 11:00:47 UTC 2002
As I understand it from Chinese CL linguists such as Scott Piao,
determining word boundaries in Chinese (and some other languages) is a
highly complex matter. The strategy I am using in WordSmith Tools version 4
is threefold:
a) assume that text in such languages has been pre-processed to insert
suitable word-boundary markers;
b) where this has not been done, allow the user to specify a list of common
sequences, which WordSmith uses in pre-processing to insert suitable
word-boundary markers;
c) failing this, equate "word" and "character".
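
To make the three-step fallback concrete, here is a minimal sketch in Python.
It is not WordSmith's code. The function name `segment`, the parameters
`sequences` and `marker`, the choice of a space as the boundary marker, and
the greedy longest-match rule used for step (b) are all assumptions made for
illustration only.

```python
def segment(text, sequences=None, marker=" "):
    """Split text into "words" using a three-step fallback (illustrative only)."""
    # (a) The text already carries word-boundary markers: split on them.
    #     Assumption: a single space is the marker.
    if marker in text:
        return [tok for tok in text.split(marker) if tok]

    # (b) No markers, but the user supplied common sequences.
    #     Assumption: greedy longest match, scanning left to right.
    if sequences:
        by_length = sorted(set(sequences), key=len, reverse=True)
        tokens, i = [], 0
        while i < len(text):
            match = next(
                (s for s in by_length if s and text.startswith(s, i)), None
            )
            if match is None:
                match = text[i]  # a character in no listed sequence stands alone
            tokens.append(match)
            i += len(match)
        return tokens

    # (c) Last resort: equate "word" and "character".
    return list(text)


print(segment("我 喜欢 语言学"))                        # step (a)
print(segment("我喜欢语言学", ["喜欢", "语言学"]))      # step (b)
print(segment("我喜欢"))                                # step (c)
```

With step (c) a concordance search can only match character strings, so the
quality of the result under (a) or (b) depends entirely on the markers or the
sequence list that the user provides.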
Cheers -- Mike
At 17:15 07/10/2002 +0800, Linda Lin wrote:
>Dear All
>
>Thanks for your information about the concordancers for Chinese language. I
>have a question regarding the use of these concordancers. Do you think the
>recommended concordancers such as MonoConc Pro can only recognize individual
>characters, not actual "words" i.e. strings of characters, or they can in
>fact process actual "words"?
>
Mike Scott
Applied English Language Studies Unit
University of Liverpool
Liverpool L69 3BX, UK.
mike.scott at liv.ac.uk
http://www.lexically.net
http://www.liv.ac.uk/~ms2928