[Corpora-List] Re. Concordancer for Chinese (Summary of reply)

Mike Scott mike at lexically.net
Mon Oct 7 11:00:47 UTC 2002


As I understand it from Chinese CL linguists such as Scott Piao,
determining word boundaries in Chinese (and some other languages) is a
highly complex matter. The strategy I am using in WordSmith Tools version 4
is threefold:

a) assume that text in such languages has been pre-processed to insert
suitable word-boundary markers;
b) where this has not been done, allow the user to specify a list of common
sequences for pre-processing by WordSmith (which then inserts suitable
word-boundary markers);
c) failing this, equate "word" and "character".
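
As a rough illustration of that fallback order, here is a minimal Python
sketch. It is not WordSmith's actual code. The function name `tokenize`, the
`marker` and `known_sequences` parameters, and the greedy longest-match rule
used in step (b) are all assumptions made for the example.

```python
def tokenize(text, marker=None, known_sequences=None):
    """Illustrative three-step fallback; all names here are hypothetical."""
    # (a) The text already carries word-boundary markers: split on them.
    if marker and marker in text:
        return [tok for tok in text.split(marker) if tok]

    # (b) The user supplied a list of common sequences. Greedy longest
    #     match is an assumption; the strategy only says such a list is
    #     used to insert boundaries.
    if known_sequences:
        vocab = set(known_sequences)
        longest = max(len(seq) for seq in vocab)
        tokens, i = [], 0
        while i < len(text):
            if text[i].isspace():
                i += 1
                continue
            # Try the longest candidate first; fall back to one character.
            for size in range(min(longest, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in vocab:
                    tokens.append(piece)
                    i += size
                    break
        return tokens

    # (c) No markers and no list: equate "word" and "character".
    return [ch for ch in text if not ch.isspace()]
```

Called as `tokenize(text, marker="/")` it follows step (a), with a
`known_sequences` list it follows step (b), and with neither it falls back to
one token per character as in step (c).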

Cheers -- Mike

At 17:15 07/10/2002 +0800, Linda Lin wrote:
>Dear All
>
>Thanks for your information about the concordancers for Chinese language. I
>have a question regarding the use of these concordancers. Do you think the
>recommended concordancers such as MonoConc Pro can only recognize individual
>characters, not actual "words" i.e. strings of characters,  or they can in
>fact process actual "words"?
>

Mike Scott

Applied English Language Studies Unit
University of Liverpool
Liverpool L69 3BX, UK.

mike.scott at liv.ac.uk
http://www.lexically.net
http://www.liv.ac.uk/~ms2928


