12.817, Qs: Ancient Chinese Taboo Words, Tokenization Ref

The LINGUIST Network linguist at linguistlist.org
Fri Mar 23 21:48:07 UTC 2001


LINGUIST List:  Vol-12-817. Fri Mar 23 2001. ISSN: 1068-4875.

Subject: 12.817, Qs: Ancient Chinese Taboo Words, Tokenization Ref

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Lydia Grebenyova, EMU		Jody Huellmantel, WSU
	James Yuells, WSU		Michael Appleby, EMU
	Marie Klopfenstein, WSU		Ljuba Veselinova, Stockholm U.

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.


Editor for this issue: Karen Milligan <karen at linguistlist.org>
 ==========================================================================

We'd like to remind readers that the responses to queries are usually
best posted to the individual asking the question. That individual is
then strongly encouraged to post a summary to the list. This policy was
instituted to help control the huge volume of mail on LINGUIST; so we
would appreciate your cooperating with it whenever it seems appropriate.

=================================Directory=================================

1)
Date:  Wed, 21 Mar 2001 14:31:29 +0100
From:  "Gabriele Bugada" <noctes at hotmail.com>
Subject:  Ancient Chinese taboo words

2)
Date:  Fri, 23 Mar 2001 13:04:43 -0800
From:  Maite Taboada <maite at mindfuleye.com>
Subject:  tokenization reference

-------------------------------- Message 1 -------------------------------

Date:  Wed, 21 Mar 2001 14:31:29 +0100
From:  "Gabriele Bugada" <noctes at hotmail.com>
Subject:  Ancient Chinese taboo words

I am an Italian student taking a course in Sociolinguistics.  I need
some information about words which in ancient Chinese dialects were
considered taboo not just for their common-use meaning, but because
their pronunciation contained taboo words, esp. with sexual
meaning. E.g., I heard that there was a taboo word which meant an
animal but whose pronunciation was 'composed' of sounds meaning penis
and homosexual. I would like to know if this is true, which word (and
which animal it denoted) was meant, and if other examples are known.
Can anyone help me?

Thank you in advance.


-------------------------------- Message 2 -------------------------------

Date:  Fri, 23 Mar 2001 13:04:43 -0800
From:  Maite Taboada <maite at mindfuleye.com>
Subject:  tokenization reference

I'm looking for references on how to do tokenization from scratch
(separating a stream into words, numbers, punctuation marks). I don't
want to have to explain the whole process, so I thought I'd just say
"we use a standard procedure, such as the one described in X".

Can anyone help me find appropriate references?

Thanks a lot,

- Maite

____
Maite Taboada, Senior Computational Linguist
MindfulEye.com Systems Inc.
http://www.MindfulEye.com

---------------------------------------------------------------------------
LINGUIST List: Vol-12-817


