[Corpora-List] Gross language detection

Jose Maria Gomez Hidalgo jmgomez at dinar.esi.uem.es
Wed Jan 8 16:31:33 UTC 2003


Sorry, everyone. By "gross language" I meant swear words.

The paper [1] was:

[1] Ellen Spertus. "Smokey: Automatic Recognition of Hostile 
Messages," Innovative Applications of Artificial Intelligence (IAAI) '97. 
Also presented at the Eighth Annual Meeting of the Society for Text and 
Discourse, July 31, 1998.
Available at: http://www.mills.edu/ACAD_INFO/MCS/SPERTUS/Smokey/smokey.ps


At 11:27 08/01/2003 +0100, you wrote:
>Dear all
>
>As a part of a classified ads posting system, a group of natural language 
>processing students supervised by me have to develop a gross language 
>detection system for the Spanish language. I do not know if there is any 
>work in this area (except maybe [1]).
>
>Do you have any ideas on how to do this?
>
>It seems rather heuristic, but my basic idea is:
>
>1. Build a dictionary of forbidden words (f**k, etc.).
>2. Develop a set of regular expressions that detect variations of the 
>forbidden words (e.g. if "xyzt" is a forbidden word, then we have to 
>detect "XyZt", "X_Y_Z_T", or small letter changes used in slang, such as 
>a "k" instead of a "c").
>
>Thank you for your help
>
>         Jose Maria
>
>
>_______________________________________________________________________________
>
>Jose Maria Gomez Hidalgo
>Departamento de Inteligencia Artificial
>Universidad Europea de Madrid
>28670 - Villaviciosa de Odon - MADRID
>(+34) 912115670
>jmgomez at dinar.esi.uem.es
>http://www.esi.uem.es/~jmgomez/
>_______________________________________________________________________________
>
>La legislación española ampara el secreto de las comunicaciones. Este 
>correo electrónico es estrictamente confidencial y va dirigido 
>exclusivamente a su destinatario/a. Si no es Ud., le rogamos que no 
>difunda ni copie la transmisión y nos lo notifique cuanto antes.
>
>Spanish law guarantees privacy in electronic communications. This 
>electronic transmission is strictly confidential and intended solely for 
>the addressee. If you are not the intended addressee, you are kindly 
>requested not to disclose nor to copy this transmission and to notify us 
>as soon as possible.
>
>
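The two-step idea in the quoted message (a dictionary of forbidden words, 
plus regular expressions for their variants) could be sketched roughly as 
follows in Python. This is only a minimal illustration under assumptions 
that are not in the original message: the substitution table, the separator 
rule, the word list ("xyzt" is the placeholder from the message, "caca" is 
a mild stand-in) and the function names build_pattern and find_forbidden 
are all hypothetical.

```python
import re

# Hypothetical substitution table (an assumption, not from the message):
# each letter maps to the characters that may replace it in slang spellings.
SUBSTITUTIONS = {
    "c": "ck",
    "k": "kc",
    "a": "a4@",
    "o": "o0",
    "i": "i1!",
    "e": "e3",
    "s": "s5$",
}

# Any run of non-alphanumeric characters may separate the letters,
# as in "X_Y_Z_T". This also allows spaces, which can cause false positives.
SEPARATORS = r"[\W_]*"


def build_pattern(word):
    """Compile a regex that matches simple obfuscated variants of `word`."""
    parts = []
    for ch in word.lower():
        variants = SUBSTITUTIONS.get(ch, ch)
        parts.append("[" + re.escape(variants) + "]")
    body = SEPARATORS.join(parts)
    # The lookarounds reject matches that start or end next to a letter or
    # digit, so a forbidden word embedded in a longer word is not flagged.
    return re.compile(r"(?<![^\W_])" + body + r"(?![^\W_])", re.IGNORECASE)


def find_forbidden(text, patterns):
    """Return the dictionary words whose variants occur in `text`."""
    return [word for word, pattern in patterns.items() if pattern.search(text)]


# Step 1: the dictionary of forbidden words (placeholders only).
FORBIDDEN = ["xyzt", "caca"]
# Step 2: one variant-matching regular expression per dictionary entry.
PATTERNS = {word: build_pattern(word) for word in FORBIDDEN}

if __name__ == "__main__":
    for ad in ["Vendo coche", "X_Y_Z_T barato", "kaka total"]:
        print(ad, "->", find_forbidden(ad, PATTERNS))
```

A purely pattern-based filter like this will both miss creative spellings 
and flag innocent text, so the table and the separator rule would need 
tuning against real Spanish ads.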





