12.817, Qs: Ancient Chinese Taboo Words, Tokenization Ref
The LINGUIST Network
linguist at linguistlist.org
Fri Mar 23 21:48:07 UTC 2001
LINGUIST List: Vol-12-817. Fri Mar 23 2001. ISSN: 1068-4875.
Subject: 12.817, Qs: Ancient Chinese Taboo Words, Tokenization Ref
Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
Andrew Carnie, U. of Arizona <carnie at linguistlist.org>
Reviews (reviews at linguistlist.org):
Simin Karimi, U. of Arizona
Terence Langendoen, U. of Arizona
Editors (linguist at linguistlist.org):
Karen Milligan, WSU          Naomi Ogasawara, EMU
Lydia Grebenyova, EMU        Jody Huellmantel, WSU
James Yuells, WSU            Michael Appleby, EMU
Marie Klopfenstein, WSU      Ljuba Veselinova, Stockholm U.
Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
Home Page: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Karen Milligan <karen at linguistlist.org>
==========================================================================
We'd like to remind readers that the responses to queries are usually
best posted to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list. This policy was
instituted to help control the huge volume of mail on LINGUIST; so we
would appreciate your cooperating with it whenever it seems appropriate.
=================================Directory=================================
1)
Date: Wed, 21 Mar 2001 14:31:29 +0100
From: "Gabriele Bugada" <noctes at hotmail.com>
Subject: Ancient Chinese taboo words
2)
Date: Fri, 23 Mar 2001 13:04:43 -0800
From: Maite Taboada <maite at mindfuleye.com>
Subject: tokenization reference
-------------------------------- Message 1 -------------------------------
Date: Wed, 21 Mar 2001 14:31:29 +0100
From: "Gabriele Bugada" <noctes at hotmail.com>
Subject: Ancient Chinese taboo words
I am an Italian student taking a course in Sociolinguistics. I need
some information about words which in ancient Chinese dialects were
considered taboo not just for their common-use meaning, but because
their pronunciation contained taboo words, esp. ones with sexual
meaning. E.g., I heard that there was a taboo word which meant an
animal but whose pronunciation was 'composed' of sounds meaning penis
and homosexual. I would like to know if this is true, which word (and
which animal it denoted) was involved, and if other examples are known.
Can anyone help me?
Thank you in advance.
-------------------------------- Message 2 -------------------------------
Date: Fri, 23 Mar 2001 13:04:43 -0800
From: Maite Taboada <maite at mindfuleye.com>
Subject: tokenization reference
I'm looking for references on how to do tokenization from scratch
(separating a stream into words, numbers, and punctuation marks). I don't
want to have to explain the whole process, so I thought I'd just say
"we use a standard procedure, such as the one described in X".
Can anyone help me find appropriate references?
Thanks a lot,
- Maite
____
Maite Taboada, Senior Computational Linguist
MindfulEye.com Systems Inc.
http://www.MindfulEye.com
---------------------------------------------------------------------------